Scott A. Hale
Latest
The PRISM Alignment Dataset: What Participatory, Representative and Individualised Human Feedback Reveals About the Subjective and Multicultural Alignment of Large Language Models
From Languages to Geographies: Towards Evaluating Cultural Bias in Hate Speech Datasets
The Empty Signifier Problem: Towards Clearer Paradigms for Operationalising 'Alignment' in Large Language Models
SimpleSafetyTests: a Test Suite for Identifying Critical Safety Risks in Large Language Models
The Past, Present and Better Future of Feedback Learning in Large Language Models for Subjective Human Preferences and Values