Developing a visual dashboard to track online conversations surrounding COVID-19.
The COVID-19 crisis has required mass communication and public understanding on an unprecedented scale. During this time there has been a proliferation of online discussion and news sharing, along with the emergence of ‘information sources’ concerning COVID-19. Such proliferation has raised concerns about the potential dangers of dis/misinformation. At a time when governments and health authorities must disseminate crucial information to the public, the rapid, large-scale spread of misinformation online has become a major challenge.
Although social media analysis and monitoring are known to be critical to controlling health crises, neither the extent of the issue nor the sources of information have been studied in detail, and no significant tools are currently available for investigating COVID-19 misinformation on social media. Furthermore, despite communication being at the heart of COVID-19 public health efforts, there has been a surprising lack of input from linguistic experts.
The need to tackle misinformation is clear from initiatives such as the SHARE public information campaign and the Government-led Rapid Response Unit for tackling harmful COVID-19 narratives. However, the complexity of the ‘infodemic’ should not be underestimated: misinformation has multiple root causes and often has a basis in fact.
Online dashboards and data visualisations have been a stand-out feature of the COVID-19 crisis so far, but these have only covered the spread of the virus itself (number of cases and rate of infection). They have proved to be highly popular, with the Johns Hopkins University dashboard being visited more than 200 million times in the first six weeks and several similar dashboards following from WHO, GOV.UK, BBC and Google. We propose that a dashboard is needed specifically for tracking the hitherto underexplored online conversations surrounding COVID-19. This will provide valuable insights into information transmission and reception in an easily digestible manner.
No open-access dashboard like the one we propose currently exists, whether COVID-19 related or not. Instead, the tools available for working with Twitter are typically desktop-based (e.g. FireAnt, NodeXL), collect a limited amount of data (e.g. TAGS, Nexalogy) and are often locked behind paywalls for corporate use (e.g. Zoho Social). Our solution, an easy-to-use online dashboard, will allow tracking of the coronavirus conversation across Twitter and the web, as well as being extendable to future events.
Our dashboard will be released online for others to use to gain their own insights. In addition, the tool will be open-source and available via a GitHub repository for future development. To accompany the dashboard, a series of interactive tutorials will be produced. Our aim is for the dashboard to have uses far beyond this project and beyond academia. The dashboard and tutorials will therefore be designed with novice users in mind, with clear explanations of how to use the system and how to interpret the results.
This project aims to build a large-scale dataset of Twitter posts, which will be made available via an open-access online dashboard incorporating intuitive visualisations. The dataset will be novel in capturing not just the content of tweets, but also the content of the web pages shared in them. The linked web pages will be processed using the WebCorp technology to extract textual data.
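As an illustration of the first step described above, identifying which web pages are shared in tweets, the following is a minimal Python sketch. It is not the project's implementation and does not use WebCorp: the sample tweets, the `extract_links` helper and the URL pattern are all hypothetical, chosen only to show how links might be pulled out of tweet text and counted before the linked pages are fetched and processed.

```python
import re
from collections import Counter

# Hypothetical sample tweets; a real dataset would be collected from Twitter.
tweets = [
    {"id": 1, "text": "New guidance on face coverings https://example.org/guidance #COVID19"},
    {"id": 2, "text": "Read this: https://example.org/guidance and https://example.com/claims"},
    {"id": 3, "text": "No link here, just an opinion #lockdown"},
]

# Simplified pattern (an assumption): anything starting with http(s):// up to whitespace.
URL_PATTERN = re.compile(r"https?://\S+")


def extract_links(text):
    """Return the URLs found in a tweet's text, with trailing punctuation removed."""
    return [url.rstrip(".,;:!?)") for url in URL_PATTERN.findall(text)]


# Count how many tweets share each link (a link is counted once per tweet).
link_counts = Counter()
for tweet in tweets:
    link_counts.update(set(extract_links(tweet["text"])))

# Links ordered from most to least frequently shared.
most_shared = link_counts.most_common()
```

In practice the text of each linked page would then be retrieved and cleaned by a separate component; here the sketch stops at producing the ranked list of shared links.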
We will then release an online open-access dashboard that visualises trends in word use, frequently shared web resources and propagation through social networks. Drawing on automated corpus linguistic methods and social network analysis, the dashboard will uncover the multi-layered content of shared information (original links, tweets, replies, retweets), alongside a deeper understanding of the online networks through which (mis)information is shared. The dashboard will include visualisations such as time-series, network graphs and maps. These will indicate, for example, the frequencies of words, phrases and hashtags over time; the reception of the sources, as evidenced by word use in tweets about them; and the scale and reach of the networks within which the sources are being shared, based on retweets and replies.
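The two kinds of measure mentioned above, frequencies over time and network reach, can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the record layout (date, author, text, retweeted author), the sample data and the definition of reach as the number of distinct retweeting accounts are all hypothetical and are not taken from the project.

```python
import re
from collections import Counter, defaultdict

# Hypothetical records: (date, author, text, author of the retweeted post or None).
records = [
    ("2020-04-01", "alice", "Stay home #COVID19 #lockdown", None),
    ("2020-04-01", "bob", "RT Stay home #COVID19 #lockdown", "alice"),
    ("2020-04-02", "carol", "Masks work #COVID19", None),
    ("2020-04-02", "dave", "RT Masks work #COVID19", "carol"),
    ("2020-04-02", "erin", "RT Stay home #COVID19 #lockdown", "alice"),
]

HASHTAG = re.compile(r"#(\w+)")

# Time series: hashtag counts per day, case-folded so #COVID19 and #covid19 match.
daily_hashtags = defaultdict(Counter)
for date, _author, text, _source in records:
    daily_hashtags[date].update(tag.lower() for tag in HASHTAG.findall(text))

# Network: for each original author, the set of accounts that retweeted them.
retweeters = defaultdict(set)
for _date, author, _text, source in records:
    if source is not None:
        retweeters[source].add(author)

# Reach (an assumed, simplified definition): number of distinct retweeting accounts.
reach = {source: len(users) for source, users in retweeters.items()}
```

The per-day counters could feed a time-series chart, and the retweeter sets correspond to the directed edges of a network graph; a full analysis would also need to handle replies and quoted tweets.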
To demonstrate the applicability of our novel approach to a wide range of stakeholders, the methodology and dashboard will be validated through two case studies, each focussing on a potentially dangerous area of miscommunication relating to COVID-19. These case studies will approach the problem from a linguistic perspective, examining the clarity and reception of official messaging and the trustworthiness of information sources.
Mapping how information has spread online, specifically the root sources and networks within which it is shared, is a powerful tool for assessing the authority and trustworthiness of information sources. By doing so at scale, we will be able to understand the perceptions of the wider public and discover common reactions and misunderstandings.