Degree program (CdS): INFORMATICA E NETWORKING
| HIGH PERFORMANCE COMPUTING | INF/01 | LECTURES | 72 |
Parallel Processing Methodologies
Shared-Memory Parallel Architectures
Distributed-Memory Architectures (basics)
Graphical Processing Units (basics)
Interconnection Networks for Parallel Machines (basics)
Cost Models and Methodologies for Performance Evaluation
Students are expected to demonstrate a solid knowledge of:
1. fundamental concepts and techniques in parallel computation structuring and design, including parallelization methodologies and paradigms, parallel programming models, their implementation, and related cost models;
2. architectures of high-performance computing systems, including shared-memory multiprocessors, distributed-memory multicomputers, clusters, and others. To this end, students will become familiar with structural models, static and dynamic support to computation and programming models, performance evaluation, the capability for building complex and heterogeneous applications and/or enabling platforms, and technological features and trends (in particular multi-/many-core technology and high-performance networks).
Written and oral exam.
The student must demonstrate the ability to properly correlate the various issues studied in the course in order to deal with problem-solving tasks in the definition and design of parallel programs and their run-time support on parallel architectures. The written part will also assess the ability to present a clear report of the solved problem.
- Final oral exam
- Final written exam
- Periodic written tests
Two midterm tests are offered: if both are passed with a sufficient grade, they replace the written part of the exam in the January-February session.
Ability to model, analyze, and implement parallel processing systems at any level: processes, their run-time support, and firmware (including knowledge of the behavior of HPC-enabling platforms).
At the end of the course, students will be able to:
1. evaluate different architectural designs of parallel and distributed systems by applying performance modeling techniques;
2. design parallel applications using a structured methodology supported by cost models, and evaluate the performance achieved on different target execution environments.
During the course, optional homework assignments will be given to the students (typically on a weekly basis) so that they can check their understanding of the lectures and identify possible gaps to be filled during question time.
The students will achieve a clear understanding of the behavior of parallel machines and their most critical design issues. Furthermore, they will be able to design parallel applications, identify possible overheads, and address them by applying a solid methodology.
The needed background in Computer Architecture includes basic concepts in system level structuring, modularity and parallelism design principles, hardware and firmware machine level, assembler machine level, processes and their run-time support, communication, operating systems functionalities, input-output, memory hierarchies and caching, instruction-level parallelism, optimizing compilers. This required background is contained in the following recommended reading:
M. Vanneschi, Structured Computer Architecture Background. Appendix of the High Performance Computing course textbook.
The prerequisites of the course are the following:
- Computer Architectures (a basic course of a Bachelor's degree program);
- Algorithms and basic data structures.
Delivery: face to face
- attending lectures
- preparation of oral/written report
- participation in discussions
- individual study
- Task-based learning/problem-based learning/inquiry-based learning
The course deals with two interrelated issues in high-performance computing: i) fundamental concepts and techniques in parallel computation structuring and design, including parallelization methodologies and paradigms, parallel programming models, their implementation, and related cost models; ii) architectures of high-performance computing systems, including shared-memory multiprocessors, distributed-memory multicomputers, GPUs, clusters, and others. Both issues are studied in terms of structural models, static and dynamic support to computation and programming models, performance evaluation, capability for building complex and heterogeneous applications and/or enabling platforms, also through examples of application cases. Technological features and trends are studied, in particular multi-/many-core technology and high-performance networks.
Course outline: the course is structured into two parts:
• Structuring and Design Methodology for Parallel Applications: structured parallelism at the application and process levels, cost models, impact of communications, parallel computations as queueing systems/queueing networks, parallel paradigms (Pipeline, Data-Flow, Farm, Function Partitioning, Data Parallel), parallel systems at the firmware level, instruction-level parallelism (Pipeline, Superscalar, Multithreaded CPUs), SIMD architectures and GPUs;
• Parallel Architectures: shared-memory multiprocessors (SMP and NUMA architectures), distributed-memory multicomputers (Clusters and MPP architectures), GPUs, run-time support to interprocess communication, interconnection networks, performance evaluation and multicore architectures.
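As an illustration of what a cost model for the stream-parallel paradigms named above can look like, the following is a minimal sketch and not the formulation used in the course textbook. It assumes ideal conditions: identical workers, and no communication, emitter, or collector overhead. The function and parameter names (`t_arrival`, `t_calc`, `n_workers`) are hypothetical and chosen only for this example.

```python
import math


def pipeline_service_time(stage_times):
    """Ideal service time of a pipeline: the slowest stage is the bottleneck."""
    return max(stage_times)


def farm_service_time(t_arrival, t_calc, n_workers):
    """Ideal service time of a farm with n identical workers.

    Assumption: overheads are ignored, so each worker contributes
    t_calc / n_workers, and the farm can never be faster than the
    interarrival time of its input stream.
    """
    return max(t_arrival, t_calc / n_workers)


def optimal_farm_degree(t_arrival, t_calc):
    """Smallest number of workers that removes the bottleneck (ideal case)."""
    return math.ceil(t_calc / t_arrival)


if __name__ == "__main__":
    # Hypothetical figures: one input element per time unit, 8 units of work each.
    print(pipeline_service_time([2, 5, 3]))
    print(optimal_farm_degree(1.0, 8.0))
    print(farm_service_time(1.0, 8.0, 4))
```

Under these assumptions, adding workers beyond the optimal degree does not improve the service time, because the input stream becomes the limiting factor.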
1. Review of level structuring, processing modules, firmware architecture, assembler machine, memory hierarchies and caching, process level and interprocess communication;
2. Methodology for structuring and programming high-performance parallel applications, basic cost models: metrics, elements of queueing theory and queueing networks, load balancing, static and dynamic optimizations;
3. Parallel paradigms: stream-parallel, data-parallel and their compositions;
4. Run-time supports of parallel programs and their optimization;
5. Shared memory multiprocessors: SMP and NUMA, cost models; interconnection networks and their evaluation, cache coherence;
6. Distributed memory architectures: multicomputers, clusters, distributed heterogeneous platforms.
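The performance metrics mentioned in the program can be made concrete with a short sketch. It uses the standard textbook definitions of speedup, efficiency, and the utilization factor of a queue; the notation and the function names are assumptions made for this example and may differ from those adopted in the course.

```python
def speedup(t_seq, t_par):
    """Ratio between sequential and parallel completion time."""
    return t_seq / t_par


def efficiency(t_seq, t_par, n):
    """Speedup normalized by the parallelism degree n (1.0 is ideal)."""
    return speedup(t_seq, t_par) / n


def utilization(t_service, t_arrival):
    """Utilization factor of a queue (often written rho).

    A value of 1 or more means requests arrive faster than they can be
    served, so the module is a bottleneck.
    """
    return t_service / t_arrival


if __name__ == "__main__":
    # Hypothetical measurements: 100 s sequential, 25 s on 8 processing nodes.
    print(speedup(100.0, 25.0))
    print(efficiency(100.0, 25.0, 8))
    print(utilization(2.0, 4.0))
```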
Textbook: M. Vanneschi, High Performance Computing: Parallel Processing Models and Architectures. Pisa University Press, 2014.
The exam consists of a written part and an oral part.