Telling truth from fiction by the words you use

Newswise: Social media has supercharged the spread of information and misinformation, making it increasingly difficult to distinguish fact from fiction on platforms like Twitter.

One of the most prolific, widely shared, and highly analyzed Twitter accounts in recent years belonged to former US President Donald Trump. In the last year of his presidency, Trump tweeted, on average, more than 33 times a day. These tweets ranged from easily verifiable statements of fact to comments that were demonstrably false.

The sheer volume of Trump’s social media record and its extensive analysis by fact checkers allowed a team of researchers to make a unique comparison of his word choices when sharing true or false information.

The results of this study, published in the journal Psychological Science, show that Trump’s word choices differed in clear and predictable ways when he shared information he knew to be factually incorrect. The researchers then used this information to create a model to predict whether a single tweet was correct or incorrect. Similar custom linguistic models may eventually help detect lies in other real-world settings.

“We created a custom language model that could predict which statements by the former president were correct and which were potentially misleading,” said Sophie van der Zee, a researcher at Erasmus University in Rotterdam and first author of the paper. “His language was so consistent that in about three-quarters of the cases, our model was able to correctly predict whether Trump’s tweets were factual or not based solely on the use of his words.”

For their analysis, the researchers collected two separate data sets, each containing 3 months of presidential tweets sent by the @realDonaldTrump Twitter account. They then cross-referenced these data sets with a fact-checked data set from the Washington Post to determine whether each tweet was factually correct or incorrect.

To avoid data contamination, the researchers removed all tweets that did not reflect Trump’s own use of language (e.g., retweets and long quotes).

The first data set revealed large differences in language use between Trump’s correct and incorrect tweets. Van der Zee and her colleagues then used this information to create a model to predict whether an individual tweet was factually correct.

“Using this model, we were able to predict how sincere Trump was in three out of four tweets,” van der Zee said. “We also compared our new custom model to other similar detection models and found that it outperformed them by at least 5 percentage points.”

Given these results, the researchers speculate that their custom model could help distinguish fact from fiction in future communications from Trump. Similar models could also be built for other politicians whose statements are systematically fact-checked.

“Our paper also constitutes a warning to all people who share information online,” said van der Zee. “It was already known that information people post online can be used against them. We now show, using only publicly available data, that the words people use when sharing information online can reveal sensitive information about the sender, including an indication of their trustworthiness.”

# # #

Reference: van der Zee, S., Poppe, R., Havrileck, A., and Baillon, A. (2022). A Personal Model of Trumpery: Linguistic Deception Detection in a Real-World High-Stakes Setting. Psychological Science. Advance online publication.

