Word Embedding-based Event Detection and Tracking System for Social Media Data Streams
Social media services such as Twitter, Facebook, and Snapchat are becoming more popular every day. As a result, these services produce a vast amount of data, which may include opinions, newsworthy content, general status updates, and personal updates.
However, the interest or importance of this data to users depends mainly on the communities or parties to which they belong. For example, newsworthy content is the most important information for news producers. In contrast, the personal updates of an actor are important to his followers.
Because of its high volume and dynamic nature, it is difficult to analyse all the data produced by social media and extract the necessary details. Therefore, automated event detection and tracking is a key requirement for social media data extraction.
The content of social media documents differs from that of traditional documents. In most cases, the main purpose of posting on social media is to give an audience a quick update. Thus, the writer tends to shorten the content while highlighting the important words. This style of writing pays little attention to grammatical and spelling correctness. There is also a strong tendency to introduce new words to express feelings such as excitement and anger.
Therefore, unlike traditional documents, a large proportion of social media documents contain grammatically incorrect phrases and irregular words. As a result, traditional topic modelling approaches will not perform well on social media content.
This research focuses on a word embedding-based approach to event detection and tracking. The idea expressed in a text needs to be properly understood in order to extract the events it describes. Thus, both syntactic and semantic features need to be considered when learning the word embeddings, because both are equally important for understanding the text.
It is also necessary to maintain relationships between regular words and their extended or modified versions in order to understand irregular text. This research proposes improvements to existing event detection and tracking solutions using properly learned word embeddings that capture the syntactic and semantic features of the text while handling errors and modified words in the context of social media data.
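To illustrate what relating a modified word to its regular form can look like, the following is a minimal Python sketch. It is not the method used in this research: it substitutes a simple character n-gram overlap (loosely in the spirit of subword-based embeddings) for learned word embeddings, and every name in it (normalize_elongation, char_ngrams, ngram_similarity, closest_regular_word) as well as the toy vocabulary is a hypothetical choice made for illustration.

```python
import re


def normalize_elongation(word: str) -> str:
    """Lowercase the word and collapse runs of 3+ identical characters to one.

    Assumption: elongation such as 'goooal' is a stylistic stretch of 'goal'.
    Legitimate double letters ('happy') are left untouched.
    """
    return re.sub(r"(.)\1{2,}", r"\1", word.lower())


def char_ngrams(word: str, n: int = 3) -> set:
    """Return the set of character n-grams of a word padded with '<' and '>'."""
    padded = f"<{word}>"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def ngram_similarity(a: str, b: str, n: int = 3) -> float:
    """Jaccard overlap of the character n-gram sets of two words."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    union = grams_a | grams_b
    if not union:
        return 0.0
    return len(grams_a & grams_b) / len(union)


def closest_regular_word(token: str, vocabulary: list, n: int = 3) -> str:
    """Pick the vocabulary word whose n-grams overlap most with the token."""
    candidate = normalize_elongation(token)
    return max(vocabulary, key=lambda w: ngram_similarity(candidate, w, n))


if __name__ == "__main__":
    vocab = ["goal", "good", "gold", "happy", "hope"]  # toy vocabulary
    for token in ["goooaaal", "happpy"]:
        print(token, "->", closest_regular_word(token, vocab))
```

In an embedding-based system, the surface overlap used here would be replaced by similarity between learned vectors, so that irregular forms stay close to their regular counterparts in the embedding space.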
Emoji Powered Capsule Network to Detect Type and Target of Offensive Posts in Social Media Authors - Hansi Hettiarachchi, Tharindu Ranasinghe (Publication in RANLP 2019, Bulgaria) https://www.aclweb.org/anthology/R19-1056/
Machine Learning Approach to Recognize Subject Based Sentiment Values of Reviews Authors - N.M. De Mel, H.H. Hettiarachchi, W.P.D. Madusanka, G.L. Malaka, A.S. Perera, U. Kohomban (Publication in 2016 Moratuwa Engineering Research Conference (MERCon), Sri Lanka) https://ieeexplore.ieee.org/document/7480107
BRUMS at HASOC 2019: Deep Learning Models for Multilingual Hate Speech and Offensive Language Identification Authors - T Ranasinghe, M Zampieri, H Hettiarachchi (Publication in 11th annual meeting of the Forum for Information Retrieval Evaluation (December 2019)) http://ceur-ws.org/Vol-2517/T3-3.pdf