Matt Gee

Researcher in English Linguistics

School of English

Matt Gee develops research and teaching tools in the Research and Development Unit for English Studies (RDUES). This includes the creation of a fully-fledged search engine (WebCorp LSE), designed to treat the web as a source of linguistic data. WebCorp LSE downloads and cleans up texts from the web and presents them through a search interface built for corpus-linguistic analysis (including wildcard and part-of-speech search, concordancing, collocation and change over time).

He is currently developing eMargin, a web-based tool that replicates the practice of close reading by allowing the annotation and discussion of digital texts. eMargin will focus on literary studies in the first instance, but the potential applications are much wider and will be explored as part of the project.

He also maintains and develops tools from previous and ongoing RDUES projects, including WebCorp Live, Repulsion and ACRONYM.


Cardiff University: BSc (Hons) - Computer Science


Matt is interested in all aspects of Corpus Linguistics, in particular the use of the Web as a source of natural language data. He is also interested in collaborative e-Learning and research tools.


2013 with Kehoe, A. eMargin: A Collaborative Textual Annotation Tool. Ariadne, Issue 71.

2012 with Kehoe, A. Reader comments as an aboutness indicator in online texts: introducing the Birmingham Blog Corpus in S. Oksefjell Ebeling, J. Ebeling and H. Hasselgård (eds.) Studies in Variation, Contacts and Change in English Volume 12: Aspects of Corpus Linguistics: Compilation, Annotation, Analysis, University of Helsinki e-journal.

2011 with Kehoe, A. Social Tagging: A new perspective on textual 'aboutness' in P. Rayson, S. Hoffmann and G. Leech (eds.) Studies in Variation, Contacts and Change in English Volume 6: Methodological and Historical Dimensions of Corpus Linguistics, University of Helsinki e-journal.

2009 with Kehoe, A. Weaving web data into a diachronic corpus patchwork in A. Renouf and A. Kehoe (eds.) Corpus Linguistics: Refinements and Reassessments, Amsterdam: Rodopi.

2007 with Kehoe, A. New corpora from the web: making web text more 'text-like' in P. Pahta, I. Taavitsainen, T. Nevalainen and J. Tyrkkö (eds.) Towards Multimedia in Corpus Studies, University of Helsinki e-journal.