NLP

Consistency is Key: Disentangling Label Variation in Natural Language Processing with Intra-Annotator Agreement

We commonly use agreement measures to assess the utility of judgements made by human annotators in Natural Language Processing (NLP) tasks. While inter-annotator agreement is frequently used as an indication of label reliability by measuring …
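To make the distinction concrete (an illustrative sketch with hypothetical toy labels, not this paper's method): inter-annotator agreement compares labels from different annotators on the same items, while intra-annotator agreement compares one annotator's labels across repeated annotation rounds. Both can be quantified with a chance-corrected measure such as Cohen's kappa, e.g. via scikit-learn:

```python
# Minimal sketch: contrasting inter-annotator agreement (two annotators,
# same items) with intra-annotator agreement (one annotator, same items
# labeled in two separate rounds), both measured with Cohen's kappa.
# The labels below are hypothetical toy data, not from the paper.
from sklearn.metrics import cohen_kappa_score

# Hypothetical binary labels for ten items.
annotator_a_round1 = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
annotator_b_round1 = [1, 0, 0, 1, 0, 1, 1, 0, 1, 1]
annotator_a_round2 = [1, 0, 1, 1, 0, 0, 0, 0, 1, 1]  # A relabels later

# Inter-annotator: agreement between two different annotators.
inter_kappa = cohen_kappa_score(annotator_a_round1, annotator_b_round1)

# Intra-annotator: consistency of one annotator with themselves over time.
intra_kappa = cohen_kappa_score(annotator_a_round1, annotator_a_round2)

print(f"inter-annotator kappa: {inter_kappa:.2f}")
print(f"intra-annotator kappa: {intra_kappa:.2f}")
```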

No for Some, Yes for Others: Persona Prompts and Other Sources of False Refusal in Language Models

Large language models (LLMs) are increasingly integrated into our daily lives and personalized. However, LLM personalization might also increase unintended side effects. Recent work suggests that persona prompting can lead models to falsely refuse …

TrojanStego: Your Language Model Can Secretly Be A Steganographic Privacy Leaking Agent

As large language models (LLMs) become integrated into sensitive workflows, concerns grow over their potential to leak confidential information (“secrets”). We propose TrojanStego, a novel threat model in which an adversary fine-tunes an LLM to embed …

IssueBench: Millions of Realistic Prompts for Measuring Issue Bias in LLM Writing Assistance

Large language models (LLMs) are helping millions of users write texts about diverse issues, and in doing so expose users to different ideas and perspectives. This creates concerns about issue bias, where an LLM tends to present just one perspective …

Are Large Language Models for Education Reliable for All Languages?

Large language models (LLMs) are increasingly being adopted in educational settings. These applications expand beyond English, though current LLMs remain primarily English-centric. In this work, we ascertain whether their use in educational settings in …

Measuring Gender Bias in Language Models in Farsi

As Natural Language Processing models become increasingly embedded in everyday life, ensuring that bias in these systems can be measured and mitigated is critical. While substantial work has been done to identify and mitigate gender bias in English, Farsi …

HateDay: Insights from a Global Hate Speech Dataset Representative of a Day on Twitter

To address the global challenge of online hate speech, prior research has developed detection models to flag such content on social media. However, due to systematic biases in evaluation datasets, the real-world effectiveness of these models remains …

Beyond Demographics: Fine-tuning Large Language Models to Predict Individuals' Subjective Text Perceptions

People naturally vary in their annotations of subjective questions, and some of this variation is thought to stem from annotators' sociodemographic characteristics. LLMs have also been used to label data, but recent work has shown that models …

Educators' Perceptions of Large Language Models as Tutors: Comparing Human and AI Tutors in a Blind Text-only Setting

The rapid development of Large Language Models (LLMs) opens up the possibility of using them as personal tutors. This has led to the development of several intelligent tutoring systems and learning assistants that use LLMs as back-ends with various …

Socially Aware Language Technologies: Perspectives and Practices

Language technologies have advanced substantially, particularly with the introduction of large language models. However, these advancements can exacerbate several issues that models have traditionally faced, including bias, evaluation, and risk. In …