Franciska de Jong
Text as social data
In Natural Language Processing (NLP), text mining is currently one of the most salient fields of study. The ever-growing volumes of text shared via social media platforms have made it possible to apply text mining to data sets rooted in real-life communicative behaviour. But the development of methodological frameworks that support interpreting the patterns found in language data in terms of social phenomena is still in its infancy, as is the contribution of text mining to the understanding of human behaviour. Based on three example studies, this talk will advocate a stronger collaboration between NLP and the social sciences. The first study explores how satisfaction expressed by current and former employees in online fora can predict a company’s earnings. The second explores how the emotional state of an interviewee correlates with intonation patterns in an interview. The third investigates to what extent language use can predict a speaker’s age and gender.