PHILADELPHIA, Sept. 29 (UPI) -- That person on Twitter claiming to make $300,000 a year working from home is probably lying. But researchers say a person's tweets can reveal their income.
Increasingly, data scientists have been highlighting the predictive value of social media. The field of linguistics is just one of many digging for insights in the treasure trove of mineable data that is the Twittersphere.
New analysis by a team of linguists and computer scientists at the University of Pennsylvania suggests a person's Twitter history can accurately pinpoint his or her income bracket. In scanning Twitter profiles for words indicating a person's occupation, researchers corralled 5,191 Twitter handles and more than 10 million tweets to study.
Their findings were published this week in the journal PLOS ONE.
"It's the largest dataset of its kind for this type of research," Daniel Preotiuc-Pietro, a post-doctoral researcher at Penn's Positive Psychology Center, explained in a press release. "The dataset enabled us to do something no one has really done before."
To organize their data, scientists used a job code system employed by economists in the United Kingdom, which divides occupation into nine classes, each with average income estimates.
From there, the analysts worked backwards from tweets to tweeter, building an algorithm to find words and phrases distinctive to each class. Their work offered a variety of insights and revealed previously unrecognized links between language use and income.
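The article does not describe the researchers' algorithm in detail, so the following is only a minimal sketch of the general idea, not their method. It assumes tweets are already grouped by occupational class and ranks words with a smoothed log-ratio of how often each appears inside a class versus outside it. The function name `distinctive_words`, the toy tweets and the scoring rule are all illustrative assumptions.

```python
import math
from collections import Counter


def distinctive_words(tweets_by_class, top_n=3):
    """Rank words by how much more often they occur in one class than in the rest.

    Illustrative scoring only (add-one smoothed log-ratio); this is an
    assumption, not the method used in the study.
    """
    # Word counts per class, using naive lowercase whitespace tokenization.
    counts = {
        label: Counter(word for tweet in tweets for word in tweet.lower().split())
        for label, tweets in tweets_by_class.items()
    }

    # Word counts across all classes combined.
    total = Counter()
    for class_counts in counts.values():
        total.update(class_counts)
    vocab_size = len(total)
    grand_total = sum(total.values())

    result = {}
    for label, class_counts in counts.items():
        n_in = sum(class_counts.values())
        n_out = grand_total - n_in
        scores = {}
        for word, count in class_counts.items():
            # Smoothed relative frequency inside and outside the class.
            p_in = (count + 1) / (n_in + vocab_size)
            p_out = (total[word] - count + 1) / (n_out + vocab_size)
            scores[word] = math.log(p_in / p_out)
        # Highest score first; ties are broken alphabetically.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        result[label] = ranked[:top_n]
    return result


# Hypothetical toy data: two of the nine occupational classes.
sample_tweets = {
    1: ["board meeting on budget policy", "budget vote and policy news"],
    9: ["lol so tired today", "lol cant wait for tonight"],
}

print(distinctive_words(sample_tweets, top_n=2))
```

A real analysis would need proper tokenization, far more data and a statistical model that maps these language features to the income estimates attached to each class.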
Some of these insights confirmed previous research showing that certain words and phrases reveal a person's age, gender and other social data, and that these cues can be pieced together to accurately predict a person's income.
The analysis also showed that lower-income tweeters tended to use more swear words, while higher-income tweeters were more likely to discuss politics and business.
Other patterns were more surprising: Lower-income users tended to be more optimistic, while higher-income tweeters expressed anger and fear more frequently.
The analysis also offered a wide-angle view of how people from different income levels use Twitter.
"Lower-income users or those of a lower socioeconomic status use Twitter more as a communication means among themselves," Preotiuc-Pietro said. "High-income people use it more to disseminate news, and they use it more professionally than personally."
Researchers say their work can serve as a benchmark for other scientists looking to glean socioeconomic insights from Twitter data. Previous studies have shown Twitter's potential for identifying users' health risks and predicting outbreaks of the flu.