NLP

IssueBench: Millions of Realistic Prompts for Measuring Issue Bias in LLM Writing Assistance

Large language models (LLMs) are helping millions of users write texts about diverse issues, and in doing so expose users to different ideas and perspectives. This creates concerns about issue bias, where an LLM tends to present just one perspective …

Are Large Language Models for Education Reliable for All Languages?

Large language models (LLMs) are increasingly being adopted in educational settings. These applications expand beyond English, though current LLMs remain primarily English-centric. In this work, we ascertain whether their use in education settings in …

Measuring Gender Bias in Language Models in Farsi

As Natural Language Processing models become increasingly embedded in everyday life, ensuring that these systems can measure and mitigate bias is critical. While substantial work has been done to identify and mitigate gender bias in English, Farsi …

HateDay: Insights from a Global Hate Speech Dataset Representative of a Day on Twitter

To address the global challenge of online hate speech, prior research has developed detection models to flag such content on social media. However, due to systematic biases in evaluation datasets, the real-world effectiveness of these models remains …

Beyond Demographics: Fine-tuning Large Language Models to Predict Individuals' Subjective Text Perceptions

People naturally vary in their annotations for subjective questions and some of this variation is thought to be due to the person's sociodemographic characteristics. LLMs have also been used to label data, but recent work has shown that models …

Educators' Perceptions of Large Language Models as Tutors: Comparing Human and AI Tutors in a Blind Text-only Setting

The rapid development of Large Language Models (LLMs) opens up the possibility of using them as personal tutors. This has led to the development of several intelligent tutoring systems and learning assistants that use LLMs as back-ends with various …

Socially Aware Language Technologies: Perspectives and Practices

Language technologies have advanced substantially, particularly with the introduction of large language models. However, these advancements can exacerbate several issues that models have traditionally faced, including bias, evaluation, and risk. In …

Can I Introduce My Boyfriend to My Grandmother? Evaluating Large Language Models Capabilities on Iranian Social Norm Classification

Creating globally inclusive AI systems demands datasets reflecting diverse social norms. Iran, with its unique cultural blend, offers an ideal case study, with Farsi adding linguistic complexity. In this work, we introduce the Iranian Social Norms …

The PRISM Alignment Dataset: What Participatory, Representative and Individualised Human Feedback Reveals About the Subjective and Multicultural Alignment of Large Language Models

Human feedback is central to the alignment of Large Language Models (LLMs). However, open questions remain about methods (how), domains (where), people (who) and objectives (to what end) of feedback processes. To navigate these questions, we …

Twists, Humps, and Pebbles: Multilingual Speech Recognition Models Exhibit Gender Performance Gaps

Current automatic speech recognition (ASR) models are designed to be used across many languages and tasks without substantial changes. However, this broad language coverage hides performance gaps within languages, for example, across genders. Our …