Research - CoLingLab

Active projects

CLEVER – Computational and Linguistic bEnchmarks for the study of VErb argument structuRe
The project studies Italian verb argument structure by integrating three kinds of evidence: linguistic acceptability judgments, psycholinguistic behavioral data, and neural language models trained and analyzed as computational “laboratories”.

FAIR – Future Artificial Intelligence Research
The project aims to advance methods, models, and technologies for building trustworthy, ethical, and human-centered AI systems. The CoLing Lab contributes to Spoke 1 (coordinated by the University of Pisa), Work Package 1.3: Human-centered machine learning and reasoning, focusing on evaluating the inferential abilities of Large Language Models, with particular attention to how they handle causal relations and reasoning.

Past projects

ABI2LE, Ability to Learning – Co-financed by the POR-CReO ERDF 2014–2020 programme (RS 2020 Calls, Action 1.1.5 sub-action a1), for a total funding amount of €119,000. The project aimed to develop a Service Layer of Artificial Intelligence and NLP services adaptable to different application domains.

Text2Query – A project partially funded by Regione Toscana (POR-FESR 2014-2020), aimed at developing natural language interfaces for query languages and Big Data via Deep Learning models.

MUSE – MUltimodal Semantic Extraction – A collaborative project with BNova s.r.l. (POR FSE 2014-2020 Asse A) focused on semantic multimodal analysis of text and images, leveraging Natural Language Processing and Computer Vision techniques.

Event Extraction for Fake News Detection – A project focused on the development of a fake news detection system based on graph-based representations of news events and actors, conducted in collaboration with the Computer Science and Artificial Intelligence Laboratory (CSAIL) at the Massachusetts Institute of Technology (MIT).

UBIMOL – UBIquitous Massive Open Learning – The project aimed to develop an e-learning platform enriched with innovative NLP technologies capable of delivering personalized courses. It involved the companies M.E.T.A. Srl, 01Sistemi Srl, VIDITRUST Srl, and PERSAFE Srl, in collaboration with the research partners ILC-CNR and the CoLing Lab (POR FESR 2014–2020).

Voci della Grande Guerra – A two-year project funded by the Special Mission for the Celebrations of the 100th Anniversary of World War I at the Presidenza del Consiglio dei Ministri of the Italian Government, aimed at building an annotated corpus of digital texts representative of the diverse ways of experiencing and describing the Italian war.

Word Combinations in Italian – A three-year project funded by the Italian Ministry of Research (PRIN 2010/2011). It focused on theoretical and descriptive analysis, computational models, lexicographic design, and dictionary creation. Within this framework, the CoLing Lab developed advanced computational linguistics methods for extracting distributional information from text corpora.

SEM – Il Chattadino – A two-year project funded by Regione Toscana, carried out in collaboration with IT companies, aimed at developing a chatbot for querying Public Administration services and documents (POR-CReO FESR 2014–2020 – Bandi RS 2017).

SEMPLICE – SEMantic instruments for PubLIc administrators and CitizEns – A two-year project funded by Regione Toscana, carried out in collaboration with IT companies, aimed at developing a chatbot for querying Public Administration services and documents (POR-CReO FESR 2014–2020 – Bandi RS 2017).

BLIND – Semantic representations in congenital blind subjects – A two-year project funded by the Italian Ministry of Research (PRIN 2008), conducted in collaboration with the University of Trento. The project aimed to carry out linguistic, computational, and neurocognitive analyses of semantic representations in individuals with congenital blindness.

LexIt – An online database developed at the CoLing Lab, containing information on the argument structure properties of Italian verbs automatically derived from corpora.

Distributional Memory (DM) – A general distributional semantic model developed in collaboration with Marco Baroni.

Paisà – Piattaforma per l’Apprendimento dell’Italiano Su corpora Annotati – A three-year project funded by the Italian Ministry of Research (FIRB 2007), conducted in collaboration with the University of Bologna, ILC-CNR, University of Trento, and Eurac (Bolzano). The project produced a large, freely available, richly annotated corpus of Italian, along with lexical databases automatically extracted from it.