Text analytics

Code 635AA
Credits 6

Learning outcomes

The course covers text analytics systems and applications that address business problems by discovering and presenting knowledge otherwise locked in textual form. Students learn to recognize situations in which text analytics techniques can meet information processing needs, to identify the analytic task or process that best models the business problem, to select the most appropriate resources, methods and tools, and to collect text data and apply those methods to it. Several application contexts will be presented: information extraction, sentiment analysis (what is the nature of commentary on an issue), spam and fake post detection, quantification problems, summarization, etc.

Syllabus

- Disciplinary background: Natural Language Processing, Information Retrieval and Machine Learning.
- Mathematical background: Probability, Statistics and Algebra.
- Linguistic essentials: words, lemmas, morphology, PoS, syntax.
- Basic text processing: regular expressions, tokenisation.
- Data gathering: Twitter API, scraping.
- Basic modelling: collocations, language models.
- Libraries and tools: NLTK, Keras.
- Applications: Classification/Clustering, Sentiment Analysis/Opinion Mining, Information Extraction/Relation Extraction, Entity Linking, Spam Detection (mail spam and phishing, blog spam, review spam).
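
As a rough illustration of the "Basic text processing" and "Basic modelling" items above, the following minimal sketch tokenises a string with a regular expression and counts adjacent word pairs as a crude first step towards finding collocations. It is not course material: the function names, the token pattern and the sample sentence are assumptions made for this example, and it uses only the Python standard library rather than NLTK.

```python
import re
from collections import Counter

# Assumed token pattern: runs of lowercase letters/digits,
# optionally followed by an apostrophe suffix (e.g. "don't").
TOKEN_RE = re.compile(r"[a-z0-9]+(?:'[a-z]+)?")


def tokenise(text):
    """Lowercase the text and split it into word tokens with a regex."""
    return TOKEN_RE.findall(text.lower())


def bigram_counts(tokens):
    """Count adjacent token pairs; frequent pairs are collocation candidates."""
    return Counter(zip(tokens, tokens[1:]))


# Hypothetical sample text, chosen so that one pair clearly repeats.
sample = "New York is big. I love New York, and New York loves me."
tokens = tokenise(sample)
top_bigram, top_count = bigram_counts(tokens).most_common(1)[0]

print(tokens[:4])
print(top_bigram, top_count)
```

Raw frequency is only a starting point: it tends to favour pairs of very common words, which is why association measures are normally used to rank collocation candidates.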