Big Data Analytics

Code 599AA
Credits 6

Learning outcomes

In our digital society, every human activity is mediated by information technologies, and therefore leaves digital traces that can be stored in some repository: phone call records, transaction records, web search logs, movement trajectories, social media texts and tweets, and so on. Every minute, humans produce an avalanche of “big data”, consciously or not, that represents a novel, accurate digital proxy of social activities at global scale. Big data provide an unprecedented “social microscope”: a novel opportunity to understand the complexity of our societies, and a paradigm shift for the social sciences.
This course is an introduction to the emerging field of big data analytics and social mining, which acquires and analyzes big data from multiple sources in order to discover the patterns and models of human behavior that explain social phenomena. The focus is on what can be learnt from big data in different domains (mobility and transportation, urban planning, demographics, economics, social relationships, opinion and sentiment, etc.) and on the analytical and mining methods that can be used. The course also introduces scalable analytics based on the “map-reduce” paradigm.

1. Big data sources
- Open (linked) data, Web activity data, Social network data, Social media data, Mobile phone data, Navigation GPS data, Commercial transaction data, Tourism-related data, Crowdsourcing / crowdsensing.
2. Big data analytics and social mining methods: data preprocessing, exploratory data analysis, correlation analysis, feature selection, semantic enrichment, pattern discovery, classification and prediction, clustering and segmentation for:
- the discovery of individual social profiles
- the analysis of collective behavior
- the discovery of emotional content of text and sentiment analysis
3. Big data analytics domains
- Mobility and transportation
- Nowcasting of socio-economic indicators of progress, happiness, etc.
- Twitterology and nowcasting of social mood and trends
- Tourism
4. Ethical issues of big data analytics
- Privacy and personal data protection
- Privacy-preserving analytics
- Social responsibility of data scientists
5. Scalable data analytics
- Paradigms of NoSQL databases
- Data analysis processes with the “map-reduce” paradigm
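
As an illustration of the “map-reduce” paradigm named in the last topic, the following is a minimal single-machine sketch in Python, not the course's own material. It counts word occurrences in a handful of short texts. The function names (map_phase, shuffle, reduce_phase, map_reduce) and the sample posts are hypothetical, chosen only for this example; a real deployment would run the same three phases in parallel across many machines on a framework such as Hadoop or Spark.

```python
from collections import defaultdict


def map_phase(record):
    """Map step: emit one (word, 1) pair per word in a single input record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all values of one key into a single result."""
    return key, sum(values)


def map_reduce(records):
    """Run map, shuffle and reduce sequentially (illustrative, not distributed)."""
    mapped = [pair for record in records for pair in map_phase(record)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical sample input standing in for a large collection of texts.
posts = ["big data analytics", "social mining of big data"]
counts = map_reduce(posts)
print(counts)
```

Because each record is mapped independently and each key is reduced independently, both steps can be spread over many workers, which is what makes the paradigm scale.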