NLP

What the [MASK]? Making Sense of Language-Specific BERT Models

Recently, Natural Language Processing (NLP) has witnessed impressive progress in many areas, due to the advent of novel pretrained contextual representation models. In particular, Devlin et al. (2019) proposed a model called BERT (Bidirectional …

INTEGRATOR

Incorporating Demographic Factors into Natural Language Processing Models

MiMac

Mixed methods for analyzing political parties’ promises to voters during election campaigns

MONICA

MONItoring Coverage, Attitudes and Accessibility of Italian measures in response to COVID-19

Twitter Healthy Conversations

Devising Metrics for Assessing Echo Chambers, Incivility, and Intolerance on Twitter

A Case for Soft Loss Functions

Recently, Peterson et al. provided evidence of the benefits of using probabilistic soft labels generated from crowd annotations for training a computer vision model, showing that using such labels maximizes performance of the models over unseen data. …

Fake opinion detection: how similar are crowdsourced datasets to real data?

Identifying deceptive online reviews is a challenging task for Natural Language Processing (NLP). Collecting corpora for the task is difficult, because normally it is not possible to know whether reviews are genuine. A common workaround involves …

Dense Node Representation for Geolocation

Prior research has shown that geolocation can be substantially improved by including user network information. While effective, this approach suffers from the curse of dimensionality, since networks are usually represented as sparse adjacency matrices of …

Geolocation with Attention-Based Multitask Learning Models

Geolocation, the task of predicting the location of a post based on text and other information, has huge potential for several social media applications. Typically, the problem is modeled as either multi-class classification or regression. In the first case, …

Hey Siri. Ok Google. Alexa: A topic modeling of user reviews for smart speakers

User reviews provide a significant source of information for companies to understand their market and audience. In order to discover broad trends in this source, researchers have typically used topic models such as Latent Dirichlet Allocation (LDA). …