Research - CoLingLab

Active projects

CLEVER – Computational and Linguistic bEnchmarks for the study of VErb argument structuRe
The CLEVER project studies Italian verb argument structure by integrating three kinds of evidence: linguistic acceptability judgments, psycholinguistic behavioral data, and neural language models trained and analyzed as computational “laboratories”.

Key outputs
– Training dataset: CLEVER will create a training dataset from Italian child-friendly, developmentally grounded input, building a cognitively plausible corpus that approximates the linguistic experience of young learners. To do this, it will employ child-directed speech alongside several child-friendly sources, with careful attention to developmental evidence.
– Evaluation benchmarks for Baby Language Models: CLEVER creates a controlled benchmark dataset of Italian sentences targeting core verb-argument phenomena, annotated with human judgments and eye-tracking/reading-time measures. These benchmarks are used to evaluate cognitively plausible models, and to test whether models encode argument-structure knowledge and predict human behavioral data.

Acknowledgments
PRIN 2022 Project Title “Computational and linguistic benchmarks for the study of verb argument structure” – CUP I53D23004050006 – Grant Assignment Decree No. 1016 adopted on 07/07/2023 by the Italian Ministry of University and Research (MUR).

FAIR – Future Artificial Intelligence Research
FAIR is a research project aimed at advancing methods, models, and technologies for building trustworthy, ethical, and human-centered AI systems. Within the project, the laboratory contributes to Spoke 1 (coordinated by the University of Pisa), Work Package 1.3: Human-centered machine learning and reasoning, focusing on evaluating Large Language Models’ inferential abilities, with particular attention to how they handle causal relations and reasoning.

Acknowledgments
PNRR – M4C2 – Investimento 1.3, Partenariato Esteso PE00000013 – “FAIR – Future Artificial Intelligence Research” – Spoke 1 “Human-centered AI”, funded by the European Commission under the NextGeneration EU programme.

Past projects

ABI2LE, Ability to Learning (2022-2023) – Co-financed by the POR-CReO ERDF 2014–2020 programme (RS 2020 Calls, Action 1.1.5 sub-action a1), for a total funding amount of €119,000. The project aimed to develop a Service Layer of Artificial Intelligence and NLP services adaptable to different application domains.

Distributional Memory (DM) – a general distributional semantic model, developed in collaboration with Marco Baroni.

LexIt – an online database developed at the CoLing Lab, containing information on the argument structure properties of Italian verbs, automatically derived from corpora.

Text2Query – Deep Learning models for Big Data analysis through Natural Language.

Event Extraction for Fake News Detection – a project focused on developing a fake news detection system based on a graph representation of news events and actors. The project is a collaboration with the Computer Science and Artificial Intelligence Lab of the Massachusetts Institute of Technology (CSAIL-MIT).

MUSE – MUltimodal Semantic Extraction – the goal of the project is the multimodal semantic analysis of texts and images, using Natural Language Processing and Computer Vision techniques. The project is a collaboration with the company BNova s.r.l. (POR FSE 2014-2020 Asse A).

UBIMOL – UBIquitous Massive Open Learning – the project aims to develop an e-learning platform enriched with innovative NLP technologies that can offer personalized courses. The project involves the companies M.E.T.A. Srl, 01Sistemi Srl, VIDITRUST Srl, PERSAFE Srl and the research partners ILC-CNR and CoLing Lab (POR FESR 2014 – 2020).

Voci della Grande Guerra – two-year project, funded by the Special Mission for the Celebrations of the 100th Anniversary of World War I at the Presidenza del Consiglio dei Ministri of the Italian Government, to build an annotated corpus of digital texts representative of the different ways to experience and describe the Italian war.

Word Combinations in Italian – Theoretical and descriptive analysis, computational models, lexicographic layout and creation of a dictionary – a 3-year project funded by the Italian Ministry of Research (PRIN 2010/2011), coordinated by Raffaele Simone (University of Rome 3). The CoLing Lab's goal within the project was to develop advanced computational linguistics methods for extracting distributional information from text corpora. The project ended in February 2016.

SEM – Il Chattadino – a 2-year project funded by Regione Toscana in collaboration with IT companies to develop a chatbot to query services and documents in the Public Administration (POR-CReO FESR 2014 – 2020 – Bandi RS 2017).

SEMPLICE – SEMantic instruments for PubLIc administrators and CitizEns – a 2-year project funded by Regione Toscana in collaboration with IT companies to develop NLP-based tools for knowledge management, information extraction and opinion mining for local public administrations.

BLIND – Semantic representations in congenital blind subjects – a 2-year project funded by the Italian Ministry of Research (PRIN 2008), in collaboration with Giovanna Marotta (University of Pisa, Project Director), Pietro Pietrini (University of Pisa), and Marco Baroni (University of Trento). The overall goal of the project was to conduct linguistic, computational and neuro-cognitive analyses of semantic representations in the congenitally blind.

Paisà – Piattaforma per l’Apprendimento dell’Italiano Su corpora Annotati – a 3-year project funded by the Italian Ministry of Research (FIRB 2007), in collaboration with the University of Bologna (Project Director Sergio Scalise), ILC-CNR, the University of Trento and Eurac (Bolzano). The project built a large, freely available, richly annotated corpus of Italian, designed to support the automatic acquisition of lexical databases.

Semawiki – 2-year project funded by the Fondazione Cassa di Risparmio di Pisa. The project has developed various computational tools and resources for Italian NLP, and was carried out in collaboration with the Department of Computer Science of the University of Pisa and ILC-CNR.