Information Retrieval

Code 289AA
Credits 6

Learning outcomes

In this course we will study, design, and analyze (theoretically and experimentally) software tools for IR applications dealing with unstructured data (raw data), structured data (DB-centric), or semi-structured data (e.g. HTML, XML). We will concentrate mainly on the basic components of a modern Web search engine, examining in detail the algorithmic solutions currently adopted to implement its main software modules. We will also discuss their performance and computational limitations, and introduce measures for evaluating their efficiency and effectiveness. Finally, we will survey some algorithmic techniques frequently adopted in the design of IR tools that manage large datasets.
-Search engines
  -Crawling, Text analysis, Indexing, Ranking
  -Storage of Web pages and (hyper-)link graph
  -Results processing and visualization
  -Other data types: XML, textual DBs
-Data processing for IR tools
  -Data streaming
  -Data sketching
  -Data compression
  -Data clustering (sketch)