ethics | MilaNLP Lab @ Bocconi University

Metrics for What, Metrics for Whom: Assessing Actionability of Bias Evaluation Metrics in NLP

This paper introduces the concept of actionability in the context of bias measures in natural language processing (NLP). We define actionability as the degree to which a measure’s results enable informed action and propose a set of desiderata for …

What about ''em''? How Commercial Machine Translation Fails to Handle (Neo-)Pronouns

As 3rd-person pronoun usage shifts to include novel forms, e.g., neopronouns, we need more research on identity-inclusive NLP. Exclusion is particularly harmful in one of the most popular NLP applications, machine translation (MT). Wrong pronoun …

What about ''em''? How Commercial Machine Translation Fails to Handle (Neo-)Pronouns

Welcome to the Modern World of Pronouns: Identity-Inclusive Natural Language Processing beyond Gender

The world of pronouns is changing – from a closed word class with few members to an open set of terms to reflect identities. However, Natural Language Processing (NLP) barely reflects this linguistic shift, resulting in the possible exclusion of …

Guiding the Release of Safer E2E Conversational AI through Value Sensitive Design

Over the last several years, end-to-end neural conversational agents have vastly improved their ability to carry unrestricted, open-domain conversations with humans. However, these models are often trained on large datasets from the Internet and, as …

Predictive Biases in Natural Language Processing Models: A Conceptual Framework and Overview

An increasing number of natural language processing papers address the effect of bias on predictions, introducing mitigation techniques at different parts of the standard NLP pipeline (data and models). However, these works have been conducted …

“You Sound Just Like Your Father” Commercial Machine Translation Systems Include Stylistic Biases

The main goal of machine translation has been to convey the correct content. Stylistic considerations have been at best secondary. We show that as a consequence, the output of three commercial machine translation systems (Bing, DeepL, Google) make …