(This article is translated from Dutch, the original is here)
“During a flood or an earthquake, you could monitor social media to get extra information on what's happening. Where do people need help? What exactly is going on?” This way, Suzan Verberne of the LIACS institute for computer science wants to make better use of social media. “Can we use an analysis of the posts to help society? That is the central question of our project.”
The project is called Social Media Analytics, and it's got universities from all over the world working together: Leiden, but also universities from Brazil, Australia, Indonesia and Norway. “We had received a European grant that allowed the participating scientists to go on work visits to each other's universities, but I guess that's not happening”, Verberne sighs from behind her webcam.
The analysis starts by making a dataset. For instance: every single tweet posted to Twitter in the past week. “Next, you have to pick out the ones that are relevant for you. Most tweets do not carry the hashtags relating to your question, and several hashtags may apply. Right now, #coronavirus, #covid19, #corona, #covid-19 would still just be the start. You can also look for relevant words in the post, like anything that mentions the health authorities, for example. Of course, in a way the situation is turned upside-down at the moment: almost everything on Twitter is about corona now, and hashtags like #WorkingFromHome or #Toiletpaper are related to the crisis too.”
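The filtering step described above can be sketched in a few lines of Python. This is only an illustration, not the project's actual code: the hashtags are the ones mentioned in the article, while the keyword list and the function names `is_relevant` and `filter_posts` are assumptions made for the example.

```python
import re

# Hashtags taken from the article, lowercased. "#covid-19" is captured as
# "covid" because the pattern below stops at the hyphen.
RELEVANT_HASHTAGS = {
    "coronavirus", "covid19", "corona", "covid",
    "workingfromhome", "toiletpaper",
}

# Illustrative keywords (an assumption, not the researchers' list).
RELEVANT_KEYWORDS = {"health authorities", "rivm"}

HASHTAG_PATTERN = re.compile(r"#(\w+)")


def is_relevant(post: str) -> bool:
    """Return True if the post has a relevant hashtag or keyword."""
    text = post.lower()
    hashtags = set(HASHTAG_PATTERN.findall(text))
    if hashtags & RELEVANT_HASHTAGS:
        return True
    return any(keyword in text for keyword in RELEVANT_KEYWORDS)


def filter_posts(posts):
    """Keep only the posts that pass the relevance check."""
    return [post for post in posts if is_relevant(post)]
```

A real filter would need a much longer and regularly updated list, since, as Verberne notes, the set of crisis-related hashtags keeps growing.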
Mandatory vaccinations
The role of the Leiden scientists is to do a text and a network analysis on this set of filtered posts. The network analysis tells you who is in contact with whom, who spreads messages and who sees them. Verberne's specialty is text analysis: whether messages have a positive or a negative tone, for instance. “You could also use this approach to gauge how people think about political issues like mandatory measles vaccination.”
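As a rough idea of what the network side could look like, the sketch below counts how many distinct accounts passed on each author's messages. The input format (pairs of retweeter and original author) and the function names are assumptions for illustration; the article does not describe how the researchers build their network.

```python
from collections import defaultdict


def build_retweet_graph(retweets):
    """Map each original author to the set of accounts that retweeted them.

    `retweets` is an iterable of (retweeter, original_author) pairs,
    a hypothetical input format chosen for this example.
    """
    spread_by = defaultdict(set)
    for retweeter, author in retweets:
        spread_by[author].add(retweeter)
    return spread_by


def most_amplified(graph, n=3):
    """Return the n authors whose messages were spread by the most accounts."""
    return sorted(graph, key=lambda author: (-len(graph[author]), author))[:n]
```

Counting distinct retweeters is only one simple measure; who actually sees a message depends on follower relationships that this sketch does not model.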
In normal times, about six thousand tweets are sent every second, and in times of crisis this shoots up. Humans cannot read that fast, so the computer has to do it for us. “You take a bit of your data, say five hundred messages. You separate them by hand: this one is an eyewitness account, this one is a reaction to a news article, etc. By doing this, you train a piece of software called a classifier, which then does the same job for your entire set of relevant posts.”
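To make the idea of training a classifier on hand-labeled messages concrete, here is a minimal sketch using a Naive Bayes model with add-one smoothing. The choice of Naive Bayes, the class name and the label names are assumptions for this example; the article does not say which kind of classifier the researchers use.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Split a post into lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.label_counts = Counter()            # label -> number of posts
        self.vocabulary = set()

    def train(self, labeled_posts):
        """Learn word statistics from (text, label) pairs labeled by hand."""
        for text, label in labeled_posts:
            self.label_counts[label] += 1
            for word in tokenize(text):
                self.word_counts[label][word] += 1
                self.vocabulary.add(word)

    def predict(self, text):
        """Return the most probable label for an unseen post."""
        total_posts = sum(self.label_counts.values())
        vocab_size = len(self.vocabulary)
        # Words never seen during training carry no information; skip them.
        words = [w for w in tokenize(text) if w in self.vocabulary]
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            score = math.log(count / total_posts)
            total_words = sum(self.word_counts[label].values())
            for word in words:
                smoothed = self.word_counts[label][word] + 1
                score += math.log(smoothed / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

In practice the training set would be the five hundred or so hand-labeled messages, and the trained classifier would then be run over the full set of relevant posts.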
Verberne and her colleagues have done this before, for instance when looking at the users of health forums on the internet. “We could find experiences of being ill, or rare side-effects of medication. It's not perfect, but it works pretty well.”
Ideally, you want to say something about the quality and relevance of a post. An earlier study of posts on the Dutch Viva Forum, which is aimed at young women, indicated that properties of a post like average word length or use of punctuation can help you find the informative messages. But it is still hard: “What is a reliable source? I, myself, notice that I tend to trust a post if there are some statistics involved. But in times like these, doing back-of-the-envelope calculations can be a bad idea. You could also look at the amount of expressed trust in health authorities like the Dutch RIVM, and the quality of the information in someone's post. Is there a relationship between the two?”
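The two post properties named above, average word length and use of punctuation, are easy to compute. The sketch below shows one possible way; the function name and the exact definitions (for instance, counting every ASCII punctuation character) are assumptions, not the definitions used in the Viva Forum study.

```python
import re
import string


def post_features(text):
    """Compute simple surface features of a post."""
    words = re.findall(r"\w+", text)
    word_count = len(words)
    avg_word_length = (
        sum(len(word) for word in words) / word_count if word_count else 0.0
    )
    # Counts ASCII punctuation only; other marks would need extra handling.
    punctuation_count = sum(1 for ch in text if ch in string.punctuation)
    return {
        "word_count": word_count,
        "avg_word_length": avg_word_length,
        "punctuation_count": punctuation_count,
    }
```

Features like these could be fed to a classifier alongside the words themselves, but on their own they say nothing about whether a source is reliable.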
She stresses that after the analysis, the job is far from finished. The collaboration has media sociologists and news researchers in it as well. “With this type of data science, it is extremely important to have people with an overview of the field as a whole. Can we recognize posts with misinformation? This is an interesting question, because there's a large gray area between lying on purpose and the well-meant spreading of information that turns out to be wrong on reflection.”
The thing about Twitter is that it's far from the most popular social network. Out of every hundred people in the Netherlands, 99 are not on Twitter. It is, however, very useful for data scientists, because almost every post is public there. “Instagram? Really, we don't mine that one at all, and Facebook only in a very limited way.”
Besides Twitter, the researchers also look at web forums, and the comment sections on sites like Reddit and YouTube. They hope that their analysis will be of interest to lawmakers, government officials, and journalists as well as data scientists. So if you're ever in a crisis or want your opinion on political issues to be known to any of these groups of people: now you know where to post it.