Modules | Area | Type | Hours | Teacher(s)
PIATTAFORME ABILITANTI DISTRIBUITE (Distributed Enabling Platforms) | INF/01 | Lectures | 48 |
Students of this course will develop a detailed understanding of the typical issues of very large scale distributed systems, with a particular focus on data management systems.
Oral exam on course topics
At the end of the course the students will be able to discuss, analyze and inspect very large scale distributed systems, and will master tools, best practices and common procedures to design, implement and program such systems.
Laboratory software report on course topics
The students will be able to design, implement and program such systems, through an understanding of the relevant algorithms and suitable theoretical models.
During the oral examination, the decisions and design choices made in implementing a practical distributed system will be discussed and evaluated.
Software Engineering
Java Programming
Information Retrieval
Machine Learning
None
Delivery: face-to-face lectures
Attendance: not mandatory
Teaching methods: lectures, programming exercises
The course covers a set of topics related to distributed computing platforms. We will explore the issues that arise in very large scale distributed systems (for example, systems with hundreds or thousands of nodes) and consider how such systems can be designed and programmed. The most common issues and problems in the design and implementation of such systems will be investigated, and the state-of-the-art solutions available in industry-level distributed platforms will be discussed. Examples include Grid/Cloud platforms, MapReduce systems, and data management platforms such as search system infrastructures (e.g., Web search engine backends) and online social services (e.g., Amazon).
The objectives of this course are:
* to develop an understanding of the typical issues of very large scale distributed systems;
* to equip students with tools, best practices and common procedures to design, implement and program such systems, through an understanding of the relevant algorithms and suitable theoretical models.
The course includes a "conceptual" part and an "experimental" part. The conceptual part presents the knowledge common to the design of several large scale platforms, such as big data processing solutions, query processing and data management. The lab part presents and uses a set of open source frameworks and tools that implement the concepts discussed during lectures, and which will help the students in developing the final project of the course.
List of topics:
1) Introduction to large scale distributed systems.
2) Grid/Cloud computing: concepts, techniques and solutions.
3) Large-scale data processing infrastructures:
- data organization and layout
- web search infrastructures
- efficient query processing
- efficient machine learning
4) Large-scale data management infrastructures:
- data representation and replication
- consistency models and Paxos
- availability and scalability
- fault tolerance and consensus
Oral Examination
Lab project and report
Invited talks by personnel of ICT companies and researchers from the National Research Council of Italy.
http://didawiki.cli.di.unipi.it/doku.php/magistraleinformaticanetworking/cpa/start
http://pomino.isti.cnr.it/~khast