Publications | MilaNLP Lab @ Bocconi University

The “r” in “woman” stands for rights. Auditing LLMs in Uncovering Social Dynamics in Implicit Misogyny

November, 2025

Persistent societal biases like misogyny express themselves more often implicitly than through openly hostile language. However, …

Arianna Muti, Chris Emmery, Debora Nozza, Alberto Barrón-Cedeño, Tommaso Caselli

PDF Project

TrojanStego: Your Language Model Can Secretly Be A Steganographic Privacy Leaking Agent

November, 2025

As large language models (LLMs) become integrated into sensitive workflows, concerns grow over their potential to leak confidential …

Dominik Meier, Jan Philip Wahle, Paul Röttger, Terry Ruas, Bela Gipp

PDF Project

Principled Personas: Defining and Measuring the Intended Effects of Persona Prompting on Task Performance

November, 2025

Expert persona prompting—assigning roles such as expert in math to language models—is widely used for task improvement. However, prior …

Pedro Henrique Luz de Araujo, Paul Röttger, Dirk Hovy, Benjamin Roth

PDF Project

No for Some, Yes for Others: Persona Prompts and Other Sources of False Refusal in Language Models

November, 2025

Large language models (LLMs) are increasingly integrated into our daily lives and personalized. However, LLM personalization might also …

Flor Miriam Plaza-del-Arco, Paul Röttger, Nino Scherrer, Emanuele Borgonovo, Elmar Plischke, Dirk Hovy

PDF Project

Consistency is Key: Disentangling Label Variation in Natural Language Processing with Intra-Annotator Agreement

November, 2025

We commonly use agreement measures to assess the utility of judgements made by human annotators in Natural Language Processing (NLP) …

Gavin Abercrombie, Tanvi Dinkar, Amanda Cercas Curry, Verena Rieser, Dirk Hovy

PDF Project

Detoxify-IT: An Italian Parallel Dataset for Text Detoxification

October, 2025

Toxic language online poses growing challenges for content moderation. Detoxification, which rewrites toxic content into neutral form, …

Viola De Ruvo, Arianna Muti, Daryna Dementieva, Debora Nozza

PDF Project

Untangling Hate Speech Definitions: A Semantic Componential Analysis Across Cultures and Domains

October, 2025

Hate speech relies heavily on cultural influences, leading to varying individual interpretations. For that reason, we propose a …

Katerina Korre, Arianna Muti, Federico Ruggeri, Alberto Barrón-Cedeño

PDF Project

Blue-haired, misandriche, rabiata: Tracing the Connotation of ‘Feminist(s)’ Across Time, Languages and Domains

October, 2025

Understanding how words shift in meaning is crucial for analyzing societal attitudes. In this study, we investigate the contextual …

Arianna Muti, Sara Gemelli, Emanuele Moscato, Emilie Francis, Amanda Cercas Curry, Flor Miriam Plaza-del-Arco, Debora Nozza

PDF Project

Leveraging Media Frames to Improve Normative Diversity in News Recommendations

September, 2025

Click-based news recommender systems suggest users content that aligns with their existing history, limiting the diversity of articles …

Sourabh Dattawad, Agnese Daffara, Tanise Ceron

PDF Project

Co-DETECT: Collaborative Discovery of Edge Cases in Text Classification

September, 2025

We introduce Co-DETECT (Collaborative Discovery of Edge cases in TExt ClassificaTion), a novel mixed-initiative annotation framework …

Chenfei Xiong, Jingwei Ni, Yu Fan, Vilém Zouhar, Donya Rooein, Lorena Calvo-Bartolomé, Alexander Hoyle, Zhijing Jin, Mrinmaya Sachan, Markus Leippold, Dirk Hovy, Mennatallah El-Assady, Elliott Ash

PDF

Biased Tales: Cultural and Topic Bias in Generating Children’s Stories

September, 2025

Stories play a pivotal role in human communication, shaping beliefs and morals, particularly in children. As parents increasingly rely …

Donya Rooein, Vilém Zouhar, Debora Nozza, Dirk Hovy

PDF

IssueBench: Millions of Realistic Prompts for Measuring Issue Bias in LLM Writing Assistance

September, 2025

Large language models (LLMs) are helping millions of users write texts about diverse issues, and in doing so expose users to different …

Paul Röttger, Musashi Hinck, Valentin Hofmann, Kobi Hackenburg, Valentina Pyatkin, Faeze Brahman, Dirk Hovy

PDF Code Dataset Project Project

Measuring Gender Bias in Language Models in Farsi

August, 2025

As Natural Language Processing models become increasingly embedded in everyday life, ensuring that these systems can measure and …

Hamidreza Saffari, Mohammadamin Shafiei, Donya Rooein, Debora Nozza

PDF

Are Large Language Models for Education Reliable for All Languages?

August, 2025

Large language models (LLMs) are increasingly being adopted in educational settings. These applications expand beyond English, though …

Vansh Gupta, Sankalan Pal Chowdhury, Vilém Zouhar, Donya Rooein, Mrinmaya Sachan

PDF

The AI Gap: How Socioeconomic Status Affects Language Technology Interactions

July, 2025

Socioeconomic status (SES) fundamentally influences how people interact with each other and, more recently, with digital technologies …

Elisa Bassignana*, Amanda Cercas Curry*, Dirk Hovy

PDF Dataset Project

HateDay: Insights from a Global Hate Speech Dataset Representative of a Day on Twitter

July, 2025

To address the global challenge of online hate speech, prior research has developed detection models to flag such content on social …

Manuel Tonneau, Diyi Liu, Niyati Malhotra, Scott A. Hale, Samuel Fraiberger, Victor Orozco-Olvera, Paul Röttger

PDF Project

Educators' Perceptions of Large Language Models as Tutors: Comparing Human and AI Tutors in a Blind Text-only Setting

July, 2025

The rapid development of Large Language Models (LLMs) opens up the possibility of using them as personal tutors. This has led to the …

Sankalan Pal Chowdhury, Terry Jingchen Zhang, Donya Rooein, Dirk Hovy, Tanja Käser, Mrinmaya Sachan

PDF Project

Beyond Demographics: Fine-tuning Large Language Models to Predict Individuals' Subjective Text Perceptions

July, 2025

People naturally vary in their annotations for subjective questions and some of this variation is thought to be due to the …

Matthias Orlikowski, Jiaxin Pei, Paul Röttger, Philipp Cimiano, David Jurgens, Dirk Hovy

PDF Project Project

Socially Aware Language Technologies: Perspectives and Practices

June, 2025

Language technologies have advanced substantially, particularly with the introduction of large language models. However, these …

Diyi Yang, Dirk Hovy, David Jurgens, Barbara Plank

PDF

Can I Introduce My Boyfriend to My Grandmother? Evaluating Large Language Models Capabilities on Iranian Social Norm Classification

April, 2025

Creating globally inclusive AI systems demands datasets reflecting diverse social norms. Iran, with its unique cultural blend, offers …

Hamidreza Saffari, Mohammadamin Shafiei, Donya Rooein, Francesco Pierri, Debora Nozza

PDF

Toeing the Party Line: Election Manifestos as a Key to Understand Political Discourse on Twitter

March, 2025

Political discourse on Twitter is a moving target: politicians continuously make statements about their positions. It is therefore …

Maximilian Maurer, Tanise Ceron, Sebastian Padó, Gabriella Lapesa

PDF Project

MilaNLP@Multilingual Counterspeech Generation: Evaluating Translation and Background Knowledge Filtering

March, 2025

We describe our participation in the Multilingual Counterspeech Generation shared task, which aims to generate a counternarrative to …

Emanuele Moscato, Arianna Muti, Debora Nozza

PDF Project

MONICA: Monitoring Coverage and Attitudes of Italian Measures in Response to COVID-19

March, 2025

Modern social media have long been observed as a mirror for public discourse and opinions. Especially in the face of exceptional …

Fabio Pernisi, Giuseppe Attanasio, Debora Nozza

PDF

Scaling language model size yields diminishing returns for single-message political persuasion

March, 2025

Large language models can now generate political messages as persuasive as those written by humans, raising concerns about how far this …

Kobi Hackenburg, Ben M. Tappin, Paul Röttger, Scott A. Hale, Jonathan Bright, Helen Margetts

PDF Project

Around the World in 24 Hours: Probing LLM Knowledge of Time and Place

March, 2025

Reasoning over time and space is essential for understanding our world. However, the abilities of language models in this area are …

Carolin Holtermann, Paul Röttger, Anne Lauscher

PDF Project

Specializing Large Language Models to Simulate Survey Response Distributions for Global Populations

February, 2025

Large-scale surveys are essential tools for informing social science research and policy, but running surveys is costly and …

Yong Cao, Haijiang Liu, Arnav Arora, Isabelle Augenstein, Paul Röttger, Daniel Hershcovich

PDF Project

The PRISM Alignment Dataset: What Participatory, Representative and Individualised Human Feedback Reveals About the Subjective and Multicultural Alignment of Large Language Models

December, 2024

Human feedback is central to the alignment of Large Language Models (LLMs). However, open questions remain about methods (how), domains …

Hannah Rose Kirk, Alexander Whitefield, Paul Röttger, Andrew M. Bean, Katerina Margatina, Rafael Mosquera, Juan Manuel Ciro, Max Bartolo, Adina Williams, He He, Bertie Vidgen, Scott A. Hale

PDF Project

Twists, Humps, and Pebbles: Multilingual Speech Recognition Models Exhibit Gender Performance Gaps

November, 2024

Current automatic speech recognition (ASR) models are designed to be used across many languages and tasks without substantial changes. …

Giuseppe Attanasio, Beatrice Savoldi, Dennis Fucci, Dirk Hovy

PDF Project Project

Language is Scary when Over-Analyzed: Unpacking Implied Misogynistic Reasoning with Argumentation Theory-Driven Prompts

November, 2024

We propose misogyny detection as an Argumentative Reasoning task and we investigate the capacity of large language models (LLMs) to …

Arianna Muti, Federico Ruggeri, Khalid Al Khatib, Alberto Barrón-Cedeño, Tommaso Caselli

PDF

Metrics for What, Metrics for Whom: Assessing Actionability of Bias Evaluation Metrics in NLP

November, 2024

This paper introduces the concept of actionability in the context of bias measures in natural language processing (NLP). We define …

Pieter Delobelle, Giuseppe Attanasio, Debora Nozza, Su Lin Blodgett, Zeerak Talat

PDF Project

Generalizability of Media Frames: Corpus creation and analysis across countries

November, 2024

Political discourse on Twitter is a moving target: politicians continuously make statements about their positions. It is therefore …

Agnese Daffara, Sourabh Dattawad, Sebastian Padó, Tanise Ceron

PDF Project

Divine LLaMAs: Bias, Stereotypes, Stigmatization, and Emotion Representation of Religion in Large Language Models

September, 2024

Emotions play important epistemological and cognitive roles in our lives, revealing our values and guiding our actions. Previous work …

Flor Miriam Plaza-del-Arco, Amanda Cercas Curry, Susanna Paoli, Alba Curry, Dirk Hovy

PDF Project

Countering Hateful and Offensive Speech Online - Open Challenges

September, 2024

In today’s digital age, hate speech and offensive speech online pose a significant challenge to maintaining respectful and inclusive …

Flor Miriam Plaza-del-Arco, Debora Nozza, Marco Guerini, Jeffrey Sorensen, Marcos Zampieri

PDF Project Project Slides Source Document

Comparing Pre-trained Human Language Models: Is it Better with Human Context as Groups, Individual Traits, or Both?

August, 2024

Pre-trained language models consider the context of neighboring words and documents but lack any author context of the human generating …

Nikita Soni, Niranjan Balasubramanian, H. Andrew Schwartz, Dirk Hovy

PDF

Narratives at Conflict: Computational Analysis of News Framing in Multilingual Disinformation Campaigns

August, 2024

Any report frames issues to favor a particular interpretation by highlighting or excluding certain aspects of a story. Despite the …

Antonina Sinelnik, Dirk Hovy

PDF Project

Political Compass or Spinning Arrow? Towards More Meaningful Evaluations for Values and Opinions in Large Language Models

August, 2024

Much recent work seeks to evaluate values and opinions in large language models (LLMs) using multiple-choice surveys and …

Paul Röttger, Valentin Hofmann, Valentina Pyatkin, Musashi Hinck, Hannah Rose Kirk, Hinrich Schuetze, Dirk Hovy

PDF Project

My Answer is C: First-Token Probabilities Do Not Match Text Answers in Instruction-Tuned Language Models

August, 2024

The open-ended nature of language generation makes the evaluation of autoregressive large language models (LLMs) challenging. One …

Xinpeng Wang, Bolei Ma, Chengzhi Hu, Leon Weber-Genzel, Paul Röttger, Frauke Kreuter, Dirk Hovy, Barbara Plank

PDF Project Project

Compromesso! Italian Many-Shot Jailbreaks Undermine the Safety of Large Language Models

August, 2024

As diverse linguistic communities and users adopt large language models (LLMs), assessing their safety across languages becomes …

Fabio Pernisi, Dirk Hovy, Paul Röttger

PDF Project Project

From Languages to Geographies: Towards Evaluating Cultural Bias in Hate Speech Datasets

July, 2024

Perceptions of hate can vary greatly across cultural contexts. Hate speech (HS) datasets, however, have traditionally been developed by …

Manuel Tonneau, Diyi Liu, Samuel Fraiberger, Ralph Schroeder, Scott A. Hale, Paul Röttger

PDF Project

XSTest: A Test Suite for Identifying Exaggerated Safety Behaviors in Large Language Models

July, 2024

Without proper safeguards, large language models will readily follow malicious instructions and generate toxic content. This risk …

Paul Röttger, Hannah Rose Kirk, Bertie Vidgen, Giuseppe Attanasio, Federico Bianchi, Dirk Hovy

PDF Project Project

DADIT: A Dataset for Demographic Classification of Italian Twitter Users and a Comparison of Prediction Methods

May, 2024

Social scientists increasingly use demographically stratified social media data to study the attitudes, beliefs, and behavior of the …

Lorenzo Lupo, Paul Bose, Mahyar Habibi, Dirk Hovy, Carlo Schwarz

PDF Project

Beyond Flesch-Kincaid: Prompt-based Metrics Improve Difficulty Classification of Educational Texts

May, 2024

Using large language models (LLMs) for educational applications like dialogue-based teaching is a hot topic. Effective teaching, …

Donya Rooein, Paul Röttger, Anastassia Shaitarova, Dirk Hovy

PDF Project

Wisdom of Instruction-Tuned Language Model Crowds. Exploring Model Label Variation

May, 2024

Large Language Models (LLMs) exhibit remarkable text classification capabilities, excelling in zero- and few-shot learning (ZSL and …

Flor Miriam Plaza-del-Arco, Debora Nozza, Dirk Hovy

PDF Project

Safety-Tuned LLaMAs: Lessons From Improving the Safety of Large Language Models that Follow Instructions

May, 2024

Training large language models to follow instructions makes them perform better on a wide range of tasks, generally becoming more …

Federico Bianchi, Mirac Suzgun, Giuseppe Attanasio, Paul Röttger, Dan Jurafsky, Tatsunori Hashimoto, James Zou

PDF Project Project

SafetyPrompts: a Systematic Review of Open Datasets for Evaluating and Improving Large Language Model Safety

April, 2024

The last two years have seen a rapid growth in concerns around the safety of large language models (LLMs). Researchers and …

Paul Röttger, Fabio Pernisi, Bertie Vidgen, Dirk Hovy

PDF Project Project

Emotion Analysis in NLP: Trends, Gaps and Roadmap for Future Directions

March, 2024

Emotions are a central aspect of communication. Consequently, emotion analysis (EA) is a rapidly growing field in natural language …

Flor Miriam Plaza-del-Arco, Alba Curry, Amanda Cercas Curry, Dirk Hovy

PDF Project

Conversations as a Source for Teaching Scientific Concepts at Different Education Levels

March, 2024

Open conversations are one of the most engaging forms of teaching. However, creating those conversations in educational software is a …

Donya Rooein, Dirk Hovy

PDF Project

Angry Men, Sad Women: Large Language Models Reflect Gendered Stereotypes in Emotion Attribution

March, 2024

Large language models (LLMs) reflect societal norms and biases, especially about gender. While societal biases and stereotypes have …

Flor Miriam Plaza-del-Arco, Amanda Cercas Curry, Alba Curry, Gavin Abercrombie, Dirk Hovy

PDF Project

Explaining Speech Classification Models via Word-Level Audio Segments and Paralinguistic Features

March, 2024

Predictive models make mistakes and have biases. To combat both, we need to understand their predictions. Explainable AI (XAI) …

Eliana Pastor, Alkis Koudounas, Giuseppe Attanasio, Dirk Hovy, Elena Baralis

PDF Project Project

Subjective isms? On the Danger of Conflating Hate and Offence in Abusive Language Detection

March, 2024

Natural language processing research has begun to embrace the notion of annotator subjectivity, motivated by variations in labelling. …

Amanda Cercas Curry, Gavin Abercrombie, Zeerak Talat

PDF Project

Impoverished Language Technology: The Lack of (Social) Class in NLP

March, 2024

Since Labov’s (1964) foundational work on the social stratification of language, linguistics has dedicated concerted efforts …

Amanda Cercas Curry, Zeerak Talat, Dirk Hovy

PDF Project

Classist Tools: Social Class Correlates with Performance in NLP

March, 2024

Since the foundational work of William Labov on the social stratification of language (Labov, 1964), linguistics has made concentrated …

Amanda Cercas Curry, Giuseppe Attanasio, Zeerak Talat, Dirk Hovy

PDF Project

A Tale of Pronouns: Interpretability Informs Gender Bias Mitigation for Fairer Instruction-Tuned Machine Translation

December, 2023

Recent instruction fine-tuned models can solve multiple NLP tasks when prompted to do so, with machine translation (MT) being a …

Giuseppe Attanasio, Flor Miriam Plaza-del-Arco, Debora Nozza, Anne Lauscher

PDF Project

Mirages. On Anthropomorphism in Dialogue Systems

December, 2023

Automated dialogue or conversational systems are anthropomorphised by developers and personified by users. While a degree of …

Gavin Abercrombie, Amanda Cercas Curry, Tanvi Dinkar, Verena Rieser, Zeerak Talat

PDF Project

The Empty Signifier Problem: Towards Clearer Paradigms for Operationalising 'Alignment' in Large Language Models

November, 2023

In this paper, we address the concept of ‘alignment’ in large language models (LLMs) through the lens of post-structuralist …

Hannah Rose Kirk, Bertie Vidgen, Paul Röttger, Scott A. Hale

PDF Project Project

SimpleSafetyTests: a Test Suite for Identifying Critical Safety Risks in Large Language Models

November, 2023

The past year has seen rapid acceleration in the development of large language models (LLMs). For many tasks, there is now a wide range …

Bertie Vidgen, Hannah Rose Kirk, Rebecca Qian, Nino Scherrer, Anand Kannappan, Scott A. Hale, Paul Röttger

PDF Project Project

The Past, Present and Better Future of Feedback Learning in Large Language Models for Subjective Human Preferences and Values

October, 2023

Human feedback is increasingly used to steer the behaviours of Large Language Models (LLMs). However, it is unclear how to collect and …

Hannah Rose Kirk, Andrew M. Bean, Bertie Vidgen, Paul Röttger, Scott A. Hale

PDF Project Project

Wisdom of Instruction-Tuned Language Model Crowds: Exploring Model Label Variation

July, 2023

Large Language Models (LLMs) exhibit remarkable text classification capabilities, excelling in zero- and few-shot learning (ZSL and …

Flor Miriam Plaza-del-Arco, Debora Nozza, Dirk Hovy

PDF Project

What about “em”? How Commercial Machine Translation Fails to Handle (Neo-)Pronouns

July, 2023

As 3rd-person pronoun usage shifts to include novel forms, e.g., neopronouns, we need more research on identity-inclusive NLP. …

Anne Lauscher, Debora Nozza, Ehm Miltersen, Archie Crowley, Dirk Hovy

PDF Project

The State of Profanity Obfuscation in Natural Language Processing Scientific Publications

July, 2023

Work on hate speech has made considering rude and harmful examples in scientific publications inevitable. This situation raises various …

Debora Nozza, Dirk Hovy

PDF Code Project

The Ecological Fallacy in Annotation: Modeling Human Label Variation goes beyond Sociodemographics

July, 2023

Many NLP tasks exhibit human label variation, where different annotators give different labels to the same texts. This variation is …

Matthias Orlikowski, Paul Röttger, Philipp Cimiano, Dirk Hovy

PDF Project

Temporal and Second Language Influence on Intra-Annotator Agreement and Stability in Hate Speech Labelling

July, 2023

Much work in natural language processing (NLP) relies on human annotation. The majority of this implicitly assumes that annotator’s …

Gavin Abercrombie, Dirk Hovy, Vinodkumar Prabhakaran

PDF Project

Respectful or Toxic? Using Zero-Shot Learning with Language Models to Detect Hate Speech

July, 2023

Hate speech detection faces two significant challenges: 1) the limited availability of labeled data and 2) the high variability of hate …

Flor Miriam Plaza-del-Arco, Debora Nozza, Dirk Hovy

PDF Project

MilaNLP at SemEval-2023 Task 10: Ensembling Domain-Adapted and Regularized Pretrained Language Models for Robust Sexism Detection

July, 2023

We present the system proposed by the MilaNLP team for the Explainable Detection of Online Sexism (EDOS) shared task. We propose an …

Amanda Cercas Curry, Giuseppe Attanasio, Debora Nozza, Dirk Hovy

PDF Code Project

A Multi-dimensional study on Bias in Vision-Language models

July, 2023

In recent years, joint Vision-Language (VL) models have increased in popularity and capability. Very few studies have attempted to …

Gabriele Ruggeri, Debora Nozza

PDF

Leveraging Social Interactions to Detect Misinformation on Social Media

June, 2023

Detecting misinformation threads is crucial to guarantee a healthy environment on social media. We address the problem using the data …

Tommaso Fornaciari, Luca Luceri, Emilio Ferrara, Dirk Hovy

PDF

Computer says “No”: The Case Against Empathetic Conversational AI

June, 2023

Emotions are an integral part of human cognition and they guide not only our understanding of the world but also our actions within it. …

Alba Curry, Amanda Cercas Curry

PDF Project

A Cross-Lingual Study of Homotransphobia on Twitter

May, 2023

We present a cross-lingual study of homotransphobia on Twitter, examining the prevalence and forms of homotransphobic content in tweets …

Davide Locatelli, Greta Damo, Debora Nozza

PDF

Easily Accessible Text-to-Image Generation Amplifies Demographic Stereotypes at Large Scale

May, 2023

Machine learning models are now able to convert user-written text descriptions into naturalistic images. These models are available to …

Federico Bianchi, Pratyusha Kalluri, Esin Durmus, Faisal Ladhak, Myra Cheng, Debora Nozza, Tatsunori Hashimoto, Dan Jurafsky, James Zou, Aylin Caliskan

PDF

Proceedings of the First Workshop on Cross-Cultural Considerations in NLP (C3NLP)

May, 2023

Natural Language Processing has seen impressive gains in recent years. This research includes the demonstration by NLP models to have …

Sunipa Dev, Vinodkumar Prabhakaran, David Adelani, Dirk Hovy, Luciana Benotti

PDF

ferret: a Framework for Benchmarking Explainers on Transformers

May, 2023

As Transformers are increasingly relied upon to solve complex NLP problems, there is an increased need for their decisions to be …

Giuseppe Attanasio, Eliana Pastor, Chiara Di Bonaventura, Debora Nozza

PDF Code

Can Demographic Factors Improve Text Classification? Revisiting Demographic Adaptation in the Age of Transformers

May, 2023

Demographic factors (e.g., gender or age) shape our language. Previous work showed that incorporating demographic factors can …

Chia-Chien Hung, Anne Lauscher, Dirk Hovy, Simone Paolo Ponzetto, Goran Glavaš

PDF Project

Know Your Audience: Do LLMs Adapt to Different Age and Education Levels?

April, 2023

Large language models (LLMs) offer a range of new possibilities, including adapting the text to different audiences and their reading …

Donya Rooein, Amanda Cercas Curry, Dirk Hovy

PDF Project

Beyond Digital 'Echo Chambers': The Role of Viewpoint Diversity in Political Discussion

February, 2023

Increasingly taking place in online spaces, modern political conversations are typically perceived to be unproductively affirming - …

Rishav Hada, Amir Ebrahimi Fard, Sarah Shugars, Federico Bianchi, Patricia Rossini, Dirk Hovy, Rebekah Tromble, Nava Tintarev

PDF Project

Viewpoint: Artificial Intelligence Accidents Waiting to Happen?

January, 2023

Artificial Intelligence (AI) is at a crucial point in its development: stable enough to be used in production systems, and increasingly …

Federico Bianchi, Amanda Cercas Curry, Dirk Hovy

PDF Project

Twitter-Demographer: A Flow-based Tool to Enrich Twitter Data

December, 2022

Twitter data have become essential to Natural Language Processing (NLP) and social science research, driving various scientific …

Federico Bianchi, Vincenzo Cutrona, Dirk Hovy

PDF Code Project

It's Not Just Hate: A Multi-Dimensional Perspective on Detecting Harmful Speech Online

December, 2022

Well-annotated data is a prerequisite for good Natural Language Processing models. Too often, though, annotation decisions are governed …

Federico Bianchi, Stefanie Hills, Patricia Rossini, Dirk Hovy, Rebekah Tromble, Nava Tintarev

PDF Code Project

SocioProbe: What, When, and Where Language Models Learn about Sociodemographics

December, 2022

Pre-trained language models (PLMs) have outperformed other NLP models on a wide range of tasks. Opting for a more thorough …

Anne Lauscher, Federico Bianchi, Samuel R. Bowman, Dirk Hovy

PDF Project

Bridging Fairness and Environmental Sustainability in Natural Language Processing

December, 2022

Fairness and environmental impact are important research directions for the sustainable development of artificial intelligence. …

Marius Hessenthaler, Emma Strubell, Dirk Hovy, Anne Lauscher

PDF Project

Measuring Harmful Representations in Scandinavian Language Models

December, 2022

Scandinavian countries are perceived as role-models when it comes to gender equality. With the advent of pre-trained language models …

Samia Touileb, Debora Nozza

PDF

Data-Efficient Strategies for Expanding Hate Speech Detection into Under-Resourced Languages

October, 2022

Hate speech is a global phenomenon, but most hate speech datasets so far focus on English-language content. This hinders the …

Paul Röttger, Debora Nozza, Federico Bianchi, Dirk Hovy

PDF Project

Is It Worth the (Environmental) Cost? Limited Evidence for the Benefits of Diachronic Continuous Training

October, 2022

Language is constantly changing and evolving, leaving language models to quickly become outdated, both factually and linguistically. …

Giuseppe Attanasio, Debora Nozza, Federico Bianchi, Dirk Hovy

PDF Project

Welcome to the Modern World of Pronouns: Identity-Inclusive Natural Language Processing beyond Gender

October, 2022

The world of pronouns is changing – from a closed word class with few members to an open set of terms to reflect identities. However, …

Anne Lauscher, Archie Crowley, Dirk Hovy

PDF Project

Guiding the Release of Safer E2E Conversational AI through Value Sensitive Design

September, 2022

Over the last several years, end-to-end neural conversational agents have vastly improved their ability to carry unrestricted, …

A. Stevie Bergman, Gavin Abercrombie, Shannon Spruit, Dirk Hovy, Emily Dinan, Y-Lan Boureau, Verena Rieser

PDF Project

Multilingual HateCheck: Functional Tests for Multilingual Hate Speech Detection Models

July, 2022

Hate speech detection models are typically evaluated on held-out test sets. However, this risks painting an incomplete and potentially …

Paul Röttger, Haitham Seelawi, Debora Nozza, Zeerak Talat, Bertie Vidgen

PDF Code

HATE-ITA: Hate Speech Detection in Italian Social Media Text

July, 2022

Online hate speech is a dangerous phenomenon that can (and should) be promptly counteracted properly. While Natural Language Processing …

Debora Nozza, Federico Bianchi, Giuseppe Attanasio

PDF Code Poster Slides

Hard and Soft Evaluation of NLP models with BOOtSTrap SAmpling - BooStSa

May, 2022

Natural Language Processing (NLP)’s applied nature makes it necessary to select the most effective and robust models. Producing …

Tommaso Fornaciari, Alexandra Uma, Massimo Poesio, Dirk Hovy

PDF Code Project

MilaNLP at SemEval-2022 Task 5: Using Perceiver IO for Detecting Misogynous Memes with Text and Image Modalities

April, 2022

In this paper, we describe the system proposed by the MilaNLP team for the Multimedia Automatic Misogyny Identification (MAMI) …

Giuseppe Attanasio, Debora Nozza, Federico Bianchi

PDF Code Video

Language Invariant Properties in Natural Language Processing

April, 2022

Meaning is context-dependent, but many properties of language (should) remain the same even if we transform the context. For example, …

Federico Bianchi, Debora Nozza, Dirk Hovy

PDF Code

XLM-EMO: Multilingual Emotion Prediction in Social Media Text

April, 2022

Detecting emotion in text allows social and computational scientists to study how people behave and react to online events. However, …

Federico Bianchi, Debora Nozza, Dirk Hovy

PDF Code Project

Two Contrasting Data Annotation Paradigms for Subjective NLP Tasks

April, 2022

Labelled data is the foundation of most natural language processing tasks. However, labelling data is difficult and there often are …

Paul Röttger, Bertie Vidgen, Dirk Hovy, Janet B. Pierrehumbert

PDF Project

Pipelines for Social Bias Testing of Large Language Models

April, 2022

The maturity level of language models is now at a stage in which many companies rely on them to solve various tasks. However, while …

Debora Nozza, Federico Bianchi, Dirk Hovy

PDF Project Poster

Measuring Harmful Sentence Completion in Language Models for LGBTQIA+ Individuals

April, 2022

Current language technology is ubiquitous and directly influences individuals' lives worldwide. Given the recent trend in AI on …

Debora Nozza, Federico Bianchi, Anne Lauscher, Dirk Hovy

PDF Code Project

Benchmarking Post-Hoc Interpretability Approaches for Transformer-based Misogyny Detection

April, 2022

Transformer-based Natural Language Processing models have become the standard for hate speech detection. However, the unconscious use …

Giuseppe Attanasio, Debora Nozza, Eliana Pastor, Dirk Hovy

PDF Code Project Video

Fair and Argumentative Language Modeling for Computational Argumentation

April, 2022

Although much work in NLP has focused on measuring and mitigating stereotypical bias in semantic spaces, research addressing bias in …

Carolin Holtermann, Anne Lauscher, Simone Paolo Ponzetto

PDF Project

DS-TOD: Efficient Domain Specialization for Task Oriented Dialog

April, 2022

Recent work has shown that self-supervised dialog-specific pretraining on large conversational datasets yields substantial gains over …

Chia-Chien Hung, Anne Lauscher, Simone Paolo Ponzetto, Goran Glavaš

PDF

SAFETYKIT: First Aid for Measuring Safety in Open-domain Conversational Systems

March, 2022

The social impact of natural language processing and its applications has received increasing attention. In this position paper, we …

Emily Dinan, Gavin Abercrombie, A. Stevie Bergman, Shannon Spruit, Dirk Hovy, Y-Lan Boureau, Verena Rieser

PDF Project

Entropy-based Attention Regularization Frees Unintended Bias Mitigation from Lists

March, 2022

Natural Language Processing (NLP) models risk overfitting to specific terms in the training data, thereby reducing their performance, …

Giuseppe Attanasio, Debora Nozza, Dirk Hovy, Elena Baralis

PDF Code Project Video

Text Analysis in Python for Social Scientists – Prediction and Classification

January, 2022

Text contains a wealth of information about a wide variety of sociocultural constructs. Automated prediction methods can infer …

Dirk Hovy

PDF

Learning from Disagreement: A Survey

December, 2021

Many tasks in Natural Language Processing (NLP) and Computer Vision (CV) offer evidence that humans disagree, from objective tasks such …

Alexandra N Uma, Tommaso Fornaciari, Dirk Hovy, Silviu Paun, Barbara Plank, Massimo Poesio

PDF

Pre-training is a Hot Topic: Contextualized Document Embeddings Improve Topic Coherence

August, 2021

Topic models extract groups of words from documents, whose interpretation as a topic hopefully allows for a better understanding of the …

Federico Bianchi, Silvia Terragni, Dirk Hovy

PDF Code

On the Gap between Adoption and Understanding in NLP

August, 2021

There are some issues with current research trends in NLP that can hamper the free development of scientific research. We identify five …

Federico Bianchi, Dirk Hovy

PDF

Five sources of bias in natural language processing

August, 2021

Recently, there has been an increased interest in demographically grounded bias in natural language processing (NLP) applications. Much …

Dirk Hovy, Shrimai Prabhumoye

PDF Project

Exposing the limits of Zero-shot Cross-lingual Hate Speech Detection

August, 2021

Reducing and counter-acting hate speech on Social Media is a significant concern. Most of the proposed automatic methods are conducted …

Debora Nozza

PDF Project Poster Slides

'We will Reduce Taxes' - Identifying Election Pledges with Language Models

August, 2021

In an election campaign, political parties pledge to implement various projects, should they be elected. But do they follow …

Tommaso Fornaciari, Dirk Hovy, Elin Naurin, Julia Runeson, Robert Thomson, Pankaj Adhikari

PDF Project

The Importance of Modeling Social Factors of Language: Theory and Practice

June, 2021

Natural language processing (NLP) applications are now more powerful and ubiquitous than ever before. With rapidly developing (neural) …

Dirk Hovy, Diyi Yang

PDF Project

HONEST: Measuring Hurtful Sentence Completion in Language Models

June, 2021

Language models have revolutionized the field of NLP. However, language models capture and proliferate hurtful stereotypes, especially …

Debora Nozza, Federico Bianchi, Dirk Hovy

PDF Code Project Poster Slides Blog Post

Language in a (Search) Box: Grounding Language Learning in Real-World Human-Machine Interaction

June, 2021

We investigate grounded language learning through real-world data by modelling teacher-learner dynamics through the natural interactions occurring between users and search engines.

Federico Bianchi, Ciro Greco, Jacopo Tagliabue

PDF Tweet

MilaNLP @ WASSA: Does BERT Feel Sad When You Cry?

May, 2021

The paper describes the MilaNLP team’s submission (Bocconi University, Milan) in the WASSA 2021 Shared Task on Empathy Detection and …

Tommaso Fornaciari, Federico Bianchi, Debora Nozza, Dirk Hovy

PDF

FEEL-IT: Emotion and Sentiment Classification for the Italian Language

May, 2021

Sentiment analysis is a common task to understand people’s reactions online. Still, we often need more nuanced information: is …

Federico Bianchi, Debora Nozza, Dirk Hovy

PDF Code

Beyond Black & White: Leveraging Annotator Disagreement via Soft-Label Multi-Task Learning

May, 2021

Supervised learning assumes that a ground truth label exists. However, the reliability of this ground truth depends on human …

Tommaso Fornaciari, Alexandra Uma, Silviu Paun, Barbara Plank, Dirk Hovy, Massimo Poesio

PDF Video

Universal Joy: A Data Set and Results for Classifying Emotions Across Languages

April, 2021

While emotions are universal aspects of human psychology, they are expressed differently across different languages and cultures. We …

Sotiris Lamprinidis, Federico Bianchi, Daniel Hardt, Dirk Hovy

PDF Code

BERTective: Language Models and Contextual Information for Deception Detection

April, 2021

Spotting a lie is challenging but has an enormous potential impact on security as well as private and public safety. Several NLP …

Tommaso Fornaciari, Federico Bianchi, Dirk Hovy, Massimo Poesio

PDF Code Dataset

Cross-lingual Contextualized Topic Models with Zero-shot Learning

March, 2021

We introduce a novel topic modeling method that can make use of contextualized embeddings (e.g., BERT) to do zero-shot cross-lingual topic modeling.

Federico Bianchi, Silvia Terragni, Dirk Hovy, Debora Nozza, Elisabetta Fersini

PDF Code Slides Blog Post

Text Analysis in Python for Social Scientists – Discovery and Exploration

December, 2020

Text is everywhere, and it is a fantastic resource for social scientists. However, because it is so abundant, and because language is …

Dirk Hovy

PDF

“You Sound Just Like Your Father” Commercial Machine Translation Systems Include Stylistic Biases

July, 2020

The main goal of machine translation has been to convey the correct content. Stylistic considerations have been at best secondary. We …

Dirk Hovy, Federico Bianchi, Tommaso Fornaciari

PDF Video

Predictive Biases in Natural Language Processing Models: A Conceptual Framework and Overview

July, 2020

An increasing number of natural language processing papers address the effect of bias on predictions, introducing mitigation techniques …

Deven Santosh Shah, H. Andrew Schwartz, Dirk Hovy

PDF Video

Visualizing Regional Language Variation Across Europe on Twitter

March, 2020

Geotagged Twitter data allows us to investigate correlations of geographic language variation, both at an interlingual and intralingual …

Dirk Hovy, Afshin Rahimi, Timothy Baldwin, Julian Brooke

PDF DOI

What the [MASK]? Making Sense of Language-Specific BERT Models

March, 2020

Recently, Natural Language Processing (NLP) has witnessed impressive progress in many areas, due to the advent of novel, pretrained …

Debora Nozza, Federico Bianchi, Dirk Hovy

PDF Code Project Source Document

Helpful or Hierarchical? Predicting the Communicative Strategies of Chat Participants, and their Impact on Success

March, 2020

When interacting with each other, we motivate, advise, inform, show love or power towards our peers. However, the way we interact may …

Farzana Rashid, Tommaso Fornaciari, Dirk Hovy, Eduardo Blanco, Fernando Vega-Redondo

PDF

Fake opinion detection: how similar are crowdsourced datasets to real data?

January, 2020

Identifying deceptive online reviews is a challenging task for Natural Language Processing (NLP). Collecting corpora for the task is …

Tommaso Fornaciari, Letitia Cagnina, Paolo Rosso, Massimo Poesio

PDF DOI

A Case for Soft Loss Functions

January, 2020

Recently, Peterson et al. provided evidence of the benefits of using probabilistic soft labels generated from crowd annotations for …

Alexandra Uma, Tommaso Fornaciari, Dirk Hovy, Silviu Paun, Barbara Plank, Massimo Poesio

PDF

Identifying Linguistic Areas for Geolocation

November, 2019

Geolocating social media posts relies on the assumption that language carries sufficient geographic information. However, locations are …

Tommaso Fornaciari, Dirk Hovy

PDF

Hey Siri. Ok Google. Alexa: A topic modeling of user reviews for smart speakers

November, 2019

User reviews provide a significant source of information for companies to understand their market and audience. In order to discover …

Hanh Nguyen, Dirk Hovy

PDF

Geolocation with Attention-Based Multitask Learning Models

November, 2019

Geolocation, predicting the location of a post based on text and other information, has a huge potential for several social media …

Tommaso Fornaciari, Dirk Hovy

PDF

Dense Node Representation for Geolocation

November, 2019

Prior research has shown that geolocation can be substantially improved by including user network information. While effective, it …

Tommaso Fornaciari, Dirk Hovy

PDF

Women’s Syntactic Resilience and Men’s Grammatical Luck: Gender-Bias in Part-of-Speech Tagging and Dependency Parsing

July, 2019

Several linguistic studies have shown the prevalence of various lexical and grammatical patterns in texts authored by a person of a …

Aparna Garimella, Carmen Banea, Dirk Hovy, Rada Mihalcea

PDF

Peer networks and entrepreneurship: A Pan-African RCT

January, 2019

Can large-scale peer interaction foster entrepreneurship and innovation? We conducted an RCT involving almost 5,000 entrepreneurs from …

Fernando Vega-Redondo, Paolo Pin, Diego Ubfal, Cristiana Benedetti-Fasil, Charles Brummitt, Gaia Rubera, Dirk Hovy, Tommaso Fornaciari

PDF

Increasing In-Class Similarity by Retrofitting Embeddings with Demographic Information

November, 2018

Dirk Hovy, Tommaso Fornaciari

PDF

Predicting News Headline Popularity with Syntactic and Semantic Knowledge Using Multi-Task Learning

October, 2018

Newspapers need to attract readers with headlines, anticipating their readers’ preferences. These preferences rely on topical, …

Sotiris Lamprinidis, Daniel Hardt, Dirk Hovy

PDF

Comparing Bayesian Models of Annotation

October, 2018

The analysis of crowdsourced annotations in natural language processing is concerned with identifying (1) gold standard labels, (2) …

Silviu Paun, Bob Carpenter, Jon Chamberlain, Dirk Hovy, Udo Kruschwitz, Massimo Poesio

PDF

Capturing Regional Variation with Distributed Place Representations and Geographic Retrofitting

October, 2018

Dialects are one of the main drivers of language variation, a major challenge for natural language processing tools. In most languages, …

Dirk Hovy, Christoph Purschke

PDF

The Social and the Neural Network: How to Make Natural Language Processing about People again

June, 2018

Over the years, natural language processing has increasingly focused on tasks that can be solved by statistical models, but ignored the …

Dirk Hovy

PDF