What is the ground truth for credibility and truth in microblogs?
In our recent SocialCom 2013 paper, which won the Best Paper Award, we investigate how to evaluate methods for finding credible information on microblogs like Twitter. The journal version of the paper appears in the ASE Human Journal (ASEHuman_2013.pdf).
It has become increasingly important to filter and highlight information on sites like Twitter, which feature short messages from unknown sources that arrive and spread quickly through the network. On the positive side, these messages provide fast access to information, much faster than any other news resource. On the negative side, incorrect information can propagate just as easily, due to the lack of supporting information and to social herding effects. Hence, there is a great need for methods that identify credible and correct information on such sites.
While there has been a great deal of work on developing methods to find such information, all of these methods have limited impact if they are not evaluated against a gold standard of what constitutes correct and true results. Establishing such a ground truth is actually a very hard problem and has not been studied much in the literature. This is the problem we tackle in this work.
Survey methods are often used in the literature as a way to get “true” information. Surveys are a direct way to obtain truth and credibility judgments, and they are to a large degree unbiased. But they have limitations as well: survey participants often do not know the sender of a message and are unfamiliar with its topic. In fact, in our paper we run two surveys on the same set of messages with small variations and get very different results.
In-network behavior like retweeting is a second way to assess the truth of messages. This method overcomes the limitations described above: those in the network know the topic of the tweet and its sender better, and they act on the information available at the time. However, this type of signal is noisy at best; there are many reasons to retweet, from social influence to entertainment.
In essence, both methods are flawed: each gives a noisy indicator of truth and credibility. Using either signal alone to train a classifier is therefore like trying to predict noise. We illustrate this by showing that classifiers trained on either signal do not perform much better than the baseline, i.e. prediction by chance. The interesting observation, however, is that the errors made by the two methods are likely uncorrelated. Hence, by carefully combining them we can get much better classifiers.
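To see why combining helps, here is an illustrative toy simulation (our own sketch, not from the paper) of two labelers with independent errors: keeping only the examples on which both labelers agree yields labels that are substantially more accurate than either labeler alone.

```python
# Toy simulation: two noisy labelers with independent errors, combined by
# keeping only the examples on which they agree. The error rates below are
# assumptions for illustration, not numbers from the paper.
import random

random.seed(0)
N = 100_000
ERR_A, ERR_B = 0.3, 0.3  # assumed error rate of each labeler

truth = [random.random() < 0.5 for _ in range(N)]
# Each labeler flips the true label independently with its error probability.
label_a = [t != (random.random() < ERR_A) for t in truth]
label_b = [t != (random.random() < ERR_B) for t in truth]

acc_a = sum(a == t for a, t in zip(label_a, truth)) / N
acc_b = sum(b == t for b, t in zip(label_b, truth)) / N

# Keep only the examples where the two labelers agree.
agree = [(a, t) for a, b, t in zip(label_a, label_b, truth) if a == b]
acc_agree = sum(a == t for a, t in agree) / len(agree)

print(f"labeler A alone: {acc_a:.3f}, labeler B alone: {acc_b:.3f}")
print(f"agreement subset: {len(agree) / N:.0%} of examples kept, "
      f"label accuracy {acc_agree:.3f}")
```

Under these assumed 30% error rates, only about 58% of the examples survive the agreement filter, but the labels on that subset are correct roughly 84% of the time instead of 70%: the filter trades coverage for label quality.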
We show that we can make much better predictions by carefully choosing the “credible” and “not credible” classes based on multiple inputs:
• Credible according to the survey (a score of 4 or 5 on a 1-5 scale) and retweeted more than twice, versus:
• Not credible according to the survey (a score of 1 or 2 on a 1-5 scale) and not retweeted.
Note that we throw away the uncertain middle cases, in which either an individual method is uncertain or the two methods disagree. Including those cases would enlarge the training set but would also introduce more noise, resulting in worse overall accuracy despite higher recall. A minimal sketch of this labeling rule is given below.
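Concretely, the rule can be sketched as follows. This is a paraphrase of the description above, not the paper's actual implementation; the field names (survey_score, retweet_count) are hypothetical.

```python
# Minimal sketch of the ground-truth labeling rule described above.
# Thresholds mirror the text: 4-5 on the survey plus >2 retweets is credible,
# 1-2 on the survey plus no retweets is not credible, and everything else is
# discarded as uncertain.
from typing import Optional

def label_message(survey_score: int, retweet_count: int) -> Optional[str]:
    if survey_score >= 4 and retweet_count > 2:
        return "credible"      # both signals agree on credibility
    if survey_score <= 2 and retweet_count == 0:
        return "not_credible"  # both signals agree on non-credibility
    return None                # uncertain or disagreeing: thrown away

# Example: only messages where the two signals agree receive a label.
examples = [(5, 10), (1, 0), (3, 1), (5, 0)]  # (survey_score, retweet_count)
for score, rts in examples:
    print(score, rts, "->", label_message(score, rts))
```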
If this methodology is used to determine ground truth, the exact type of survey is not important: credibility can be defined over all messages or only over messages considered newsworthy.
The accuracy of our methods (TRT, NTRT) versus the traditional methods (T, NT, RT, RTT) on two datasets, corresponding to messages about Hurricane Sandy (FR) and the subsequent relief effort (S), is shown below.
March 2014