Across social media platforms, users (sub)consciously represent themselves in a way that is appropriate for their intended audience. This has unknown impacts on studies with unobtrusive designs based on digital (social) platforms, and on studies of contemporary social phenomena in online settings. A lack of appropriate methods to identify, control for, and mitigate the effects of self-representation (the propensity to express socially responding characteristics or self-censorship in digital settings) hinders the ability of researchers to confidently interpret and generalize their findings. This article proposes applying boosted regression modelling to fill this research gap. A case study of paid Amazon Mechanical Turk workers (n = 509) is presented, in which workers completed psychometric surveys and provided anonymized access to their Facebook timelines. Our research finds indicators of self-representation on Facebook, which inform suggestions for its mitigation. We validate the use of LIWC for Facebook personality studies, and find discrepancies with extant literature about the use of LIWC-only approaches in unobtrusive designs. Using survey data and LIWC sentiment categories as predictors, the boosted regression model classified the traits of the Five Factor personality model with an average accuracy of 74.6%. The contribution of this work is an accurate prediction of psychometric information based on short, informal text.