Degree programme (CdS): INFORMATICA E NETWORKING
Code: 534AA
Credits (CFU): 6
Period: First semester
Language: English
Modules | Sector(s) | Type | Hours | Teacher(s)
PIATTAFORME ABILITANTI DISTRIBUITE | INF/01 | Lectures | 48 |
By the end of the course, students will have developed a detailed understanding of the typical issues of very large scale distributed systems.
Final oral examination.
At the end of the course the students will be able to discuss, analyze and inspect very large scale distributed systems, and they will master tools, best practices and common procedures to design, implement and program such systems.
Laboratory Software report.
At the end of the course the student will know how to design, implement and program such systems, through an understanding of algorithms and suitable theoretical models.
Discussion of design and implementation choices of the laboratory software report during oral examination.
Software Engineering
Java Programming
Information Retrieval
Machine Learning
N.A.
Delivery: face to face
Attendance: not mandatory
Teaching methods: lectures and programming exercises
The course covers a set of topics related to distributed computing platforms. We will explore the issues that arise in very large scale distributed systems (for example, systems with hundreds or thousands of nodes) and consider how such systems can be designed and programmed. The most common issues and problems in the design and implementation of such systems will be investigated, and the state-of-the-art solutions available in industry-level distributed platforms will be discussed. Examples include Grid/Cloud platforms, MapReduce systems, and data management platforms such as search system infrastructures (e.g., Web search engine backends) and online social services (e.g., Amazon).
The objectives of this course are:
* to develop an understanding of the typical issues of very large scale distributed systems;
* to equip students with tools, best practices and common procedures to design, implement and program such systems, through an understanding of algorithms and suitable theoretical models.
The course includes a "conceptual" part and an "experimental" part. The conceptual part presents the knowledge shared by the design of several large scale platforms, such as big data processing solutions, query processing and data management. The lab part presents and uses a set of open source frameworks and tools that implement the concepts discussed during lectures and that will help students develop the final project of the course.
List of topics:
1) Introduction to large scale distributed systems.
2) Grid and cloud computing: concepts, techniques and solutions.
3) Large-scale data processing infrastructures:
- data organization and layout
- web search infrastructures
- efficient query processing
- efficient machine learning
4) Large-scale data management infrastructures:
- data representation and replication
- consistency models and Paxos
- availability and scalability
- fault tolerance and consensus
- I. Foster, C. Kesselman, “The Grid 2: Blueprint for a New Computing Infrastructure”, Morgan Kaufmann Publishers Inc., 2003. Chapters 4 and 21.
- NIST Cloud Computing Reference Architecture
- Map-Reduce and the New Software Stack
- Data-Intensive Text Processing with MapReduce
- O'Reilly, Designing Data-Intensive Applications
Oral Examination
Lab project and report
N.A.
http://didawiki.cli.di.unipi.it/doku.php/magistraleinformaticanetworking/cpa/start