Thesis Code: 18012

Thesis Type: Thesis in Computer Science, Data Engineering, Computer Engineering, Mathematical Engineering, Data Science

Research Area: Innovation Development

• Experience with Python and/or Java and/or Node.js
• Basic knowledge of modular development
• Beginner-level knowledge of (or willingness to quickly learn) deep learning and natural language processing
• Curiosity-driven mindset

Public procurement contracts are a rich source of the knowledge needed to gauge the effort involved in participating in public procurement calls. However, contracts are usually available in textual format, which makes it harder to extract structured information automatically and to use it in automated systems. This thesis will focus on extracting structured information from these documents, such as specific dates, unique identifiers (VAT ids, protocol numbers, telephone numbers), and named entities (places, people, business entities, products). The undergraduate will study and experiment with deep learning and natural language processing techniques, which are at the core of the Artificial Intelligence stack, in order to understand the intrinsic semantics of a document and to identify pivotal information found in the text and link it to an external database.
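As a rough illustration of the simplest part of this task, the sketch below pulls dates and VAT ids out of a sentence with regular expressions. It is only a rule-based baseline, not the deep learning approach the thesis targets. The function name `extract_fields`, the sample sentence, the day/month/year date layout, and the choice of the Italian VAT format (11 digits, optionally prefixed with "IT") are all assumptions made for the example.

```python
import re

# Illustrative patterns only; real contracts will need much broader coverage.
# Assumption: dates are written as day/month/year, e.g. 5/3/2018.
DATE_RE = re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{4})\b")
# Assumption: Italian VAT ids (11 digits, optionally prefixed with "IT").
VAT_RE = re.compile(r"\b(?:IT)?(\d{11})\b")


def extract_fields(text):
    """Return dates (normalised to ISO 8601) and VAT ids found in `text`."""
    dates = [
        f"{year}-{int(month):02d}-{int(day):02d}"
        for day, month, year in DATE_RE.findall(text)
    ]
    vat_ids = VAT_RE.findall(text)
    return {"dates": dates, "vat_ids": vat_ids}


# Hypothetical sentence, not taken from a real contract.
sample = "Contract signed on 5/3/2018 by ACME Srl, VAT IT01234567890."
result = extract_fields(sample)
```

Patterns like these cover identifiers with a fixed shape. Named entities such as people, places, and products have no such shape, which is where learned models come in.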

The thesis will be structured as follows:
• state-of-the-art analysis of text processing techniques
• problem formulation: objective function, data structures and resources to be used
• algorithm design and prototyping
• in-lab testing and verification with real data, and measurement of the performance of the approach.
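The proposal does not say how performance will be measured. One common choice for information extraction is precision, recall, and F1 over the set of extracted items, sketched below under that assumption. The function name `precision_recall_f1` and the example items are hypothetical.

```python
def precision_recall_f1(predicted, gold):
    """Compare extracted items against a hand-labelled gold set.

    Both arguments are iterables of hashable items, for example
    (field_type, value) tuples. An item counts as correct only on exact match.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical system output and gold annotations for one document.
predicted = [("date", "2018-03-05"), ("vat", "01234567890"), ("vat", "99999999999")]
gold = [("date", "2018-03-05"), ("vat", "01234567890"), ("date", "2018-04-01")]
precision, recall, f1 = precision_recall_f1(predicted, gold)
```

Exact matching is the strictest option. A real evaluation might also score partial matches, or report the metrics per field type.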

The thesis will be co-tutored with Synapta Srl, a spin-off of Politecnico di Torino. It will also be an opportunity to work with the Synapta team and experiment with real data. The undergraduate will benefit from being immersed in an existing start-up environment while applying the scientific experimental practices learned at ISMB. At the end of the thesis, the undergraduate will be familiar with deep learning and natural language processing techniques, and she/he will acquire an understanding of the public-procurement domain. As an additional benefit, she/he will become proficient with version control systems, continuous integration systems, and remote deployment and monitoring techniques.

Contact: send a resume, with the list of exams attached, to specifying the thesis code and title.