Data Mining

Code 309AA
Credits 9

Learning outcomes

This course provides a structured introduction to the key methods of data mining and to the design of knowledge discovery processes. Organizations and businesses are overwhelmed by the flood of data continuously collected in their data warehouses and sensed by all kinds of digital technologies: the web, social media, mobile devices, and the internet of things. Traditional statistical techniques may fail to make sense of such data because of its inherent complexity and size. Data mining, knowledge discovery, and statistical learning techniques emerged as an alternative approach. They aim to reveal patterns, rules, and models hidden in the data, and to support analysts in developing descriptive and predictive models for a range of challenging problems. The course covers the following topics:
• Fundamentals of data mining and of the knowledge discovery process from data.
• Design of data analysis processes.
• Exploratory statistical analysis for data understanding.
• Dimensionality reduction and Principal Component Analysis.
• Cluster analysis with centroid-based, hierarchical, and density-based methods.
• Predictive analytics and classification models, including decision trees, Bayesian, rule-based, and kernel-based classifiers, SVM, random forests, and other ensemble methods.
• Pattern mining and association rule discovery.
• Validation and interpretation of discovered patterns and models within statistical frameworks.
• Design and development of data mining processes using state-of-the-art technologies, including KNIME, Python, and R, in a final project that applies, and possibly modifies, the data mining tools and libraries covered in class.