Information retrieval

Code 289AA
Credits 6

Learning outcomes

In this course we will study, design and analyze (theoretically and experimentally) software tools for IR applications dealing with unstructured (raw), structured (DB-centric) or semi-structured (e.g. HTML, XML) data. We will concentrate mainly on the basic components of a modern Web search engine, examining in detail the algorithmic solutions currently adopted to implement its main software modules. We will also discuss their performance and computational limitations, and introduce measures for evaluating their efficiency and effectiveness. Finally, we will survey some algorithmic techniques that are frequently adopted in the design of IR tools managing large datasets.

- Search engines
- Crawling, Text analysis, Indexing, Ranking
- Storage of Web pages and (hyper-)link graph
- Results processing and visualization
- Other data types: XML, textual DBs
- Data processing for IR tools
- Data streaming
- Data sketching
- Data compression
- Data clustering (sketch)