Hansi Hettiarachchi

Doctoral researcher

School of Computing and Digital Technology - CDT

Email: hansi.hettiarachchi@mail.bcu.ac.uk

Hansi completed her BSc (Hons) in Computer Science and Engineering at the University of Moratuwa, Sri Lanka, in 2016 with first-class honours.

After completing her bachelor's degree, she joined the Research and Development Division of CodeGen International (Pvt) Ltd., Sri Lanka, as a Software Engineer. Before joining Birmingham City University, Hansi gained over three years of experience in industrial research and development projects involving machine learning and natural language processing techniques.

Currently, Hansi is a PhD research student at the School of Computing and Digital Technology at Birmingham City University.

Research

Word Embedding-based Event Detection and Tracking System for Social Media Data Streams

Social media services such as Twitter, Facebook and Snapchat are becoming more popular day by day. As a result, these services produce a vast amount of data, which may contain opinions, newsworthy content, general status messages, personal updates and more.

The interest of this data, and its importance to users, depends mainly on the communities or parties to which those users belong. For example, newsworthy content will be the most important information for news producers, whereas the personal updates of an actor will matter most to that actor's followers.

However, the high volume and dynamic nature of social media data make it difficult to analyse all of it and extract the necessary details. Therefore, automated event detection and tracking is a key requirement for social media data extraction.

The content of social media documents differs from that of traditional documents. In most cases, the main purpose of posting on social media is to give an audience a quick update. Thus, the writer tries to shorten the content while highlighting the important words. This style of writing pays little attention to the grammatical and spelling correctness of sentences. There is also a strong tendency to introduce new words to express feelings such as excitement and anger.

Therefore, unlike traditional documents, a large proportion of social media documents contain grammatically incorrect phrases and irregular words. Thus, traditional topic modelling approaches will not perform well on social media content.

This research will focus on a word embedding-based approach for event detection and tracking. The idea expressed in a text needs to be properly understood in order to extract the events it describes. Thus, both syntactic and semantic features need to be considered while learning the word embeddings, because both are equally important for understanding the text.

It is also necessary to maintain relationships between regular words and their extended or modified versions in order to understand irregular text. This research proposes improvements to existing event detection and tracking solutions using properly learned word embeddings that capture the syntactic and semantic features of the text while handling errors and modified words in social media data.
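
As a rough illustration of relating an irregular word to its regular form, the sketch below compares words by the character trigrams they share. It is only an assumed stand-in for the idea, similar in spirit to subword-based embeddings, and not the method developed in this research. The function names and the example vocabulary are hypothetical.

```python
def char_ngrams(word, n=3):
    """Return the set of character n-grams of a word, padded with boundary markers."""
    padded = f"<{word.lower()}>"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def ngram_similarity(a, b, n=3):
    """Jaccard similarity between the character n-gram sets of two words."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    if not grams_a or not grams_b:
        return 0.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


def closest_regular_word(irregular, vocabulary, n=3):
    """Pick the vocabulary word that shares the most character n-grams with the input."""
    return max(vocabulary, key=lambda w: ngram_similarity(irregular, w, n))


# Hypothetical vocabulary of regular words and an elongated social media spelling.
vocabulary = ["good", "goal", "great"]
match = closest_regular_word("goooal", vocabulary)
print(match)
```

Here the elongated spelling "goooal" shares more trigrams with "goal" than with the other candidates, so it is linked to that regular form. Learned embeddings would capture meaning as well as spelling, which this surface-level comparison does not.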
