NLP

Cross-lingual Contextualized Topic Models with Zero-shot Learning

We introduce a novel topic modeling method that can make use of contextualized embeddings (e.g., BERT) to perform zero-shot cross-lingual topic modeling.
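
The core intuition is that a multilingual contextualized encoder maps documents from different languages into a shared representation space, so a topic model fit on one language can assign topics to documents in another language without retraining. The snippet below is a minimal sketch of that zero-shot transfer idea only, not the paper's architecture: it assumes the sentence-transformers and scikit-learn packages and uses a multilingual checkpoint name purely as an example.

```python
# Minimal sketch (not the paper's model): zero-shot cross-lingual topic
# assignment via a shared multilingual embedding space.
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

# Multilingual sentence encoder; the checkpoint name is an assumption and
# any multilingual contextualized encoder could be substituted.
encoder = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

english_docs = [
    "The stock market fell sharply after the announcement.",
    "The team won the championship in the final minutes.",
    "New vaccine trials show promising results.",
]
italian_docs = [
    "La borsa è crollata dopo l'annuncio.",                   # finance
    "La squadra ha vinto il campionato all'ultimo minuto.",   # sports
]

# Fit "topics" (here: simple clusters) on English documents only.
en_emb = encoder.encode(english_docs)
topic_model = KMeans(n_clusters=3, random_state=0, n_init=10).fit(en_emb)

# Zero-shot step: assign topics to Italian documents that were never
# seen during training, relying only on the shared embedding space.
it_emb = encoder.encode(italian_docs)
print(topic_model.predict(it_emb))
```

In the actual method the clustering step is replaced by a proper topic model trained on the contextualized document representations; the sketch only shows why a shared multilingual space makes the zero-shot transfer possible.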

Text Analysis in Python for Social Scientists – Discovery and Exploration

Text is everywhere, and it is a fantastic resource for social scientists. However, because it is so abundant, and because language is so variable, it is often difficult to extract the information we want. There is a whole subfield of AI concerned …

Predictive Biases in Natural Language Processing Models: A Conceptual Framework and Overview

An increasing number of natural language processing papers address the effect of bias on predictions, introducing mitigation techniques at different parts of the standard NLP pipeline (data and models). However, these works have been conducted …

“You Sound Just Like Your Father” Commercial Machine Translation Systems Include Stylistic Biases

The main goal of machine translation has been to convey the correct content. Stylistic considerations have been at best secondary. We show that as a consequence, the output of three commercial machine translation systems (Bing, DeepL, Google) makes …

Visualizing Regional Language Variation Across Europe on Twitter

Geotagged Twitter data allows us to investigate correlations of geographic language variation, at both the interlingual and the intralingual level. Based on data-driven studies of such relationships, this paper investigates regional variation of language …

Helpful or Hierarchical? Predicting the Communicative Strategies of Chat Participants, and their Impact on Success

When interacting with each other, we motivate, advise, inform, show love or power towards our peers. However, the way we interact may also hold some indication of how successful we are, as people often try to help each other to achieve their goals. …

What the [MASK]? Making Sense of Language-Specific BERT Models

Recently, Natural Language Processing (NLP) has witnessed impressive progress in many areas, due to the advent of novel, pretrained contextual representation models. In particular, Devlin et al. (2019) proposed a model, called BERT (Bidirectional …
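
As a concrete illustration of what such pretrained models do, the hedged sketch below probes a BERT-style checkpoint through the Hugging Face transformers fill-mask pipeline; the checkpoint name is an assumption and stands in for any of the language-specific BERT variants the paper surveys.

```python
# Illustrative only: ask a pretrained BERT-style model to fill in a [MASK] token.
from transformers import pipeline

# The multilingual checkpoint here is an assumption; a language-specific
# BERT model could be substituted without changing the rest of the code.
fill_mask = pipeline("fill-mask", model="bert-base-multilingual-cased")

# The pipeline returns the most probable fillers for the masked position.
for prediction in fill_mask("Natural language processing is a [MASK] field."):
    print(f"{prediction['token_str']:>12}  {prediction['score']:.3f}")
```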

INTEGRATOR

Incorporating Demographic Factors into Natural Language Processing Models

MiMac

Mixed methods for analyzing political parties’ promises to voters during election campaigns

MONICA

MONItoring Coverage, Attitudes and Accessibility of Italian measures in response to COVID-19