Thesis Code: 26004
Thesis Type: M.Sc. in Machine Learning, Data Science, Computer Science, Mathematics, Telecommunications, or similar
Research Area: AI, Data and Space

Requirements

  • M.Sc. in Machine Learning, Data Science, Computer Science, Mathematics, Telecommunications, or similar
  • Knowledge of Python
  • Software development skills
  • Knowledge of signals
  • Basic knowledge of natural language modelling and semantic embedding
  • Basic knowledge of retrieval

Description
Retrieval-Augmented Generation (RAG) is arguably one of the technologies with the most traction at the moment. Most RAG systems, however, struggle to reach their full potential because they rely on a fixed retrieval granularity, typically retrieving passages or chunks of uniform size. This approach often leads to mismatches between the information need and the retrieved evidence: broad questions like “What are these documents about?” demand high-level summaries, while specific factoid queries like “Where was John born?” require fine-grained snippets. The challenge is to design a retrieval system that dynamically adapts to the level of detail a query requires. This thesis asks: how can we model and predict query intent to select the appropriate retrieval granularity, and how does such an adaptive system affect answer accuracy, coherence, and efficiency compared to fixed-granularity RAG methods?
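
To make the idea concrete, here is a minimal Python sketch of granularity routing. It is only an illustration under stated assumptions, not the method the thesis will develop: the three granularity levels, the keyword cues, the function names predict_granularity and retrieve, and the toy index are all hypothetical. A real system would likely replace the keyword rules with a learned intent classifier and the word-overlap scoring with semantic embeddings.

```python
import re

# Hypothetical cue lists (assumptions for illustration only).
BROAD_CUES = ("about", "overview", "summarize", "summary", "main topics")
FACTOID_STARTS = ("who", "where", "when", "which", "how many")


def predict_granularity(query: str) -> str:
    """Map a query to one of three assumed granularity levels."""
    q = query.lower().strip()
    if any(cue in q for cue in BROAD_CUES):
        return "document_summary"  # broad intent: high-level summary
    if q.startswith(FACTOID_STARTS):
        return "sentence"          # factoid intent: fine-grained snippet
    return "passage"               # default: fixed-size chunk


def tokens(text: str) -> set:
    """Lowercased word set, used for a naive overlap score."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(query: str, index: dict) -> tuple:
    """Pick the granularity, then return the best-overlapping unit at that level."""
    level = predict_granularity(query)
    terms = tokens(query)
    best = max(index[level], key=lambda unit: len(terms & tokens(unit)))
    return level, best


# Toy index: the same invented content stored at three granularities.
index = {
    "document_summary": ["A short biography of John, covering his childhood and career."],
    "passage": ["John was born in Turin in 1980. He later studied engineering and moved abroad."],
    "sentence": ["John was born in Turin in 1980.", "He later studied engineering."],
}

for question in ("What are these documents about?", "Where was John born?"):
    print(question, "->", retrieve(question, index))
```

With the two example questions from the description, the router sends the first to the summary level and the second to the sentence level. Comparing this kind of adaptive routing against a single fixed level is the sort of evaluation the thesis question points to.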

Contacts: send a resume, with the list of exams attached, to lorenzo.bongiovanni@linksfoundation.com, specifying the thesis code and title.