dataset

Detoxify-IT: An Italian Parallel Dataset for Text Detoxification

Toxic language online poses growing challenges for content moderation. Detoxification, which rewrites toxic content into neutral form, offers a promising alternative but remains underexplored beyond English. We present Detoxify-IT, the first Italian …

MONICA: Monitoring Coverage and Attitudes of Italian Measures in Response to COVID-19

Modern social media have long been observed as a mirror for public discourse and opinions. Especially in the face of exceptional events, computational language tools are valuable for understanding public sentiment and reacting quickly. During the …

It's Not Just Hate: A Multi-Dimensional Perspective on Detecting Harmful Speech Online

Well-annotated data is a prerequisite for good Natural Language Processing models. Too often, though, annotation decisions are governed by optimizing time or annotator agreement. We make a case for nuanced efforts in an interdisciplinary setting for …

Twitter-Demographer: A Flow-based Tool to Enrich Twitter Data

Measuring Harmful Representations in Scandinavian Language Models

Two Contrasting Data Annotation Paradigms for Subjective NLP Tasks

XLM-EMO: Multilingual Emotion Prediction in Social Media Text

FEEL-IT: Emotion and Sentiment Classification for the Italian Language

BERTective: Language Models and Contextual Information for Deception Detection

Fake opinion detection: how similar are crowdsourced datasets to real data?