
Predicting the author's gender using computational stylistic methods

Date
Speaker
  1. George Mikros
Abstract

Online text production is growing rapidly through Web 2.0 media, enriching traditional text genres with new ones. Blogs produce more than 900,000 posts daily, while on microblogging services such as Twitter, approximately 5,700 tweets per second are sent from more than 231 million registered accounts. Automatic identification of an author’s characteristics (e.g. gender, age and personality) in such micro-texts has become the focus of intensive research, mainly because of its many possible applications, including forensics, online audience identification for targeted advertising, and (socio)linguistic analysis of gender identity issues.

This lecture will present the state of the art in automatic gender identification in social media texts, with an emphasis on modern computational stylistic methods that use shallow text features (n-grams) and machine learning algorithms. The data are part of the first Greek Social Media Texts Corpus, which was compiled at the University of Athens (Greece) for studying wider linguistic phenomena in this genre. A detailed comparison of the stylometric profiles of male and female users will be presented, based on computational analysis of their blog posts and tweets. Furthermore, experiments on predicting a user’s gender from tweets and blog posts will be presented, and the reported results will be linked to recently observed neuro-cognitive gender differences.
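
For readers unfamiliar with the general approach, the idea of combining shallow n-gram features with a machine learning classifier can be sketched as follows. This is only an illustrative sketch, not the method used in the lecture: the choice of character trigrams, the multinomial Naive Bayes classifier, the class name `NgramNaiveBayes`, and the synthetic training strings with placeholder labels "A" and "B" are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=3):
    """Return a Counter of overlapping character n-grams in the lowercased text."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


class NgramNaiveBayes:
    """Hypothetical multinomial Naive Bayes over character n-gram counts."""

    def __init__(self, n=3, alpha=1.0):
        self.n = n          # n-gram length (assumed value, not from the lecture)
        self.alpha = alpha  # Laplace smoothing constant

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.ngram_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.ngram_counts[label].update(char_ngrams(text, self.n))
        # Vocabulary: every n-gram seen in any training text.
        self.vocab = set()
        for counts in self.ngram_counts.values():
            self.vocab.update(counts)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        features = char_ngrams(text, self.n)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            counts = self.ngram_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            # Log prior plus smoothed log likelihood of each known n-gram.
            score = math.log(doc_count / total_docs)
            for gram, freq in features.items():
                if gram in self.vocab:
                    score += freq * math.log((counts[gram] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Synthetic toy data, invented purely for illustration.
train_texts = ["aaaa aaaa aaab", "aaab aaaa", "zzzz zzzy zzzz", "zzzy zzzz"]
train_labels = ["A", "A", "B", "B"]

model = NgramNaiveBayes(n=3).fit(train_texts, train_labels)
print(model.predict("aaaa aaab"))
print(model.predict("zzzz zzzy"))
```

In a real study the labels would come from an annotated corpus, and the features and classifier would be selected and evaluated on held-out data.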