Paul Röttger
Latest
The PRISM Alignment Dataset: What Participatory, Representative and Individualised Human Feedback Reveals About the Subjective and Multicultural Alignment of Large Language Models
Compromesso! Italian Many-Shot Jailbreaks Undermine the Safety of Large Language Models
My Answer is C: First-Token Probabilities Do Not Match Text Answers in Instruction-Tuned Language Models
Political Compass or Spinning Arrow? Towards More Meaningful Evaluations for Values and Opinions in Large Language Models
From Languages to Geographies: Towards Evaluating Cultural Bias in Hate Speech Datasets
XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models
Beyond Flesch-Kincaid: Prompt-based Metrics Improve Difficulty Classification of Educational Texts
Safety-Tuned LLaMAs: Lessons From Improving the Safety of Large Language Models that Follow Instructions
SafetyPrompts: a Systematic Review of Open Datasets for Evaluating and Improving Large Language Model Safety
The Empty Signifier Problem: Towards Clearer Paradigms for Operationalising 'Alignment' in Large Language Models
SimpleSafetyTests: a Test Suite for Identifying Critical Safety Risks in Large Language Models
The Past, Present and Better Future of Feedback Learning in Large Language Models for Subjective Human Preferences and Values
The Ecological Fallacy in Annotation: Modeling Human Label Variation goes beyond Sociodemographics
Data-Efficient Strategies for Expanding Hate Speech Detection into Under-Resourced Languages
Multilingual HateCheck: Functional Tests for Multilingual Hate Speech Detection Models
Two Contrasting Data Annotation Paradigms for Subjective NLP Tasks