BERT

ferret: a Framework for Benchmarking Explainers on Transformers

As Transformers are increasingly relied upon to solve complex NLP problems, there is an increased need for their decisions to be humanly interpretable. While several explainable AI (XAI) techniques for interpreting the outputs of transformer-based …

Measuring Harmful Representations in Scandinavian Language Models

Scandinavian countries are perceived as role models when it comes to gender equality. With the advent of pre-trained language models and their widespread usage, we investigate to what extent gender-based harmful and toxic content exists in selected …

Is It Worth the (Environmental) Cost? Limited Evidence for the Benefits of Diachronic Continuous Training

Language is constantly changing and evolving, leaving language models to quickly become outdated, both factually and linguistically. Recent research proposes we continuously update our models using new data. Continuous training allows us to teach …

HATE-ITA: Hate Speech Detection in Italian Social Media Text

Online hate speech is a dangerous phenomenon that can (and should) be promptly counteracted properly. While Natural Language Processing supplies appropriate algorithms for trying to reach this objective, all research efforts are directed toward the …

Multilingual HateCheck: Functional Tests for Multilingual Hate Speech Detection Models

Hate speech detection models are typically evaluated on held-out test sets. However, this risks painting an incomplete and potentially misleading picture of model performance because of increasingly well-documented systematic gaps and biases in hate …

Benchmarking Post-Hoc Interpretability Approaches for Transformer-based Misogyny Detection

Transformer-based Natural Language Processing models have become the standard for hate speech detection. However, the unconscious use of these techniques for such a critical task comes with negative consequences. Various works have demonstrated that …

Measuring Harmful Sentence Completion in Language Models for LGBTQIA+ Individuals

Pipelines for Social Bias Testing of Large Language Models

XLM-EMO: Multilingual Emotion Prediction in Social Media Text

Exposing the limits of Zero-shot Cross-lingual Hate Speech Detection