Hate speech detection models are typically evaluated on held-out test sets. However, this risks painting an incomplete and potentially misleading picture of model performance because of increasingly well-documented systematic gaps and biases in hate …
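As a minimal sketch of what such targeted, functional-style evaluation could look like beyond a held-out test set: the `classify` function, the templates, and the expected labels below are all hypothetical placeholders, not taken from any published test suite.

```python
# Minimal sketch of template-based functional testing for a hate speech
# classifier. `classify` is a hypothetical stand-in for any model mapping
# text to a "hateful" / "non-hateful" label; templates and expected labels
# are illustrative only.

def classify(text: str) -> str:
    """Placeholder model; replace with a real classifier."""
    return "hateful" if "hate" in text.lower() else "non-hateful"

# Each functional test pairs a template with the label a well-behaved
# model should produce, instantiated across several identity terms.
TEMPLATES = [
    ("I hate {group}.", "hateful"),                                     # direct hate
    ("I love {group}.", "non-hateful"),                                 # positive statement
    ("'{group} are awful' is a hurtful thing to say.", "non-hateful"),  # quoted hate
]
GROUPS = ["women", "immigrants", "Muslims"]

for template, expected in TEMPLATES:
    for group in GROUPS:
        case = template.format(group=group)
        got = classify(case)
        status = "PASS" if got == expected else "FAIL"
        print(f"{status}: {case!r} -> {got} (expected {expected})")
```

Grouping cases by template makes failures interpretable: a model can score well in aggregate yet fail an entire functional class, such as quoted or negated hate.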
Transformer-based Natural Language Processing models have become the standard for hate speech detection. However, the uncritical use of these techniques for such a sensitive task comes with negative consequences. Various works have demonstrated that …
Current language technology is ubiquitous and directly influences individuals' lives worldwide. Given the recent trend in AI of training and continually releasing new, more powerful large language models (LLMs), there is a need to assess their biases …
Language models have matured to the point that many companies rely on them to solve various tasks. However, while research has shown how biased and harmful these models are, **systematic ways of integrating social bias tests into …
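One way such a bias test could be wired into a development pipeline is as an ordinary automated test that gates a release. The sketch below assumes a hypothetical `evaluate_bias` helper and an illustrative threshold, written in the style of a pytest check.

```python
# Minimal sketch of gating a model release on a social bias test.
# `evaluate_bias` is a hypothetical helper returning per-group false
# positive rates on a counterfactual test set; the threshold is
# illustrative, not a recommended value.

def evaluate_bias(model) -> dict:
    """Placeholder: return false-positive rate per identity group."""
    return {"group_a": 0.08, "group_b": 0.12}

MAX_FPR_GAP = 0.05  # illustrative release threshold

def test_fpr_gap_within_threshold():
    # Fails the CI run (and thus the release) if the gap between the
    # best- and worst-treated groups exceeds the threshold.
    rates = evaluate_bias(model=None)
    gap = max(rates.values()) - min(rates.values())
    assert gap <= MAX_FPR_GAP, f"FPR gap {gap:.2f} exceeds {MAX_FPR_GAP}"
```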
Natural Language Processing (NLP) models risk overfitting to specific terms in the training data, thereby reducing their performance, fairness, and generalizability. For example, neural hate speech detection models are strongly influenced by identity terms …
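To make the identity-term effect concrete, a minimal perturbation check might score the same innocuous sentence with different identity terms swapped in. `hate_score` below is a hypothetical stand-in for any model returning a probability of hatefulness; its biased behaviour is simulated for illustration.

```python
# Minimal sketch of an identity-term perturbation check: the same neutral
# sentence is scored with different identity terms substituted in. A model
# that has overfit to specific terms shows large score gaps even though
# every sentence is equally innocuous.

def hate_score(text: str) -> float:
    """Placeholder scorer; replace with a real model's P(hateful)."""
    return 0.9 if "gay" in text else 0.1  # simulated biased behaviour

SENTENCE = "I am proud to be {term}."
TERMS = ["gay", "straight", "Christian", "Muslim"]

scores = {t: hate_score(SENTENCE.format(term=t)) for t in TERMS}
for term, score in scores.items():
    print(f"{term:>10}: P(hateful) = {score:.2f}")

# A large gap signals term-level overfitting rather than genuine signal.
gap = max(scores.values()) - min(scores.values())
print(f"max score gap across identity terms: {gap:.2f}")
```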
Reducing and counteracting hate speech on social media is a significant concern. Most proposed automatic methods target English exclusively, and very few consistently labeled non-English resources exist. Learning to …
Language models have revolutionized the field of NLP. However, they capture and proliferate hurtful stereotypes, especially in text generation. Our results show that **4.3% of the time, language models complete a sentence with a hurtful …
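A rough sketch of how such completions can be measured with a masked language model: fill templated sentences and count how often the top-k completions fall in a list of hurtful words. The templates and tiny wordlist here are illustrative placeholders (the study itself relies on a full lexicon of harmful words); the code assumes the Hugging Face `transformers` library is installed.

```python
# Minimal sketch of measuring hurtful sentence completions with a masked
# language model. Templates and the wordlist are illustrative placeholders,
# not the actual evaluation set or lexicon.

from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")

TEMPLATES = [
    "The woman worked as a [MASK].",
    "The girl dreamed of being a [MASK].",
]
HURTFUL = {"prostitute", "slave", "whore"}  # illustrative, not a real lexicon

total, hurtful_hits = 0, 0
for template in TEMPLATES:
    # Score the top 10 completions for each template's masked slot.
    for completion in fill(template, top_k=10):
        total += 1
        if completion["token_str"].strip() in HURTFUL:
            hurtful_hits += 1

print(f"hurtful completions: {hurtful_hits}/{total} "
      f"({100 * hurtful_hits / total:.1f}%)")
```

The choice of `top_k` matters: scoring only the single most likely completion understates harm that sits just below the top rank, so evaluations of this kind typically examine the top-k completions per template.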