Dr Andrew Kehoe

Associate Professor, Deputy Head of School, MA Course Director

School of English

Andrew Kehoe is Deputy Head of School, and Director of the Research and Development Unit for English Studies (RDUES). He studied at the University of Liverpool, gaining qualifications in both English and Computer Science. He researches in the field of Corpus Linguistics: the creation and study of a collection of texts (or corpus) in order to extract new knowledge about language in use. Andrew’s particular emphasis is on the use of the web as a source of natural language data and on the development of software tools to facilitate this.

Andrew was lead developer on the WebCorp project, creating a system which has been used by hundreds of thousands of researchers and teachers worldwide, as well as the general public. He has co-edited two volumes on Corpus Linguistics, and has published a series of articles and chapters which have explored in depth the nature of web texts and the issues involved in extracting linguistic examples from them. Andrew has extensive experience in all aspects of research planning and management. For over a decade he worked on a series of externally funded research projects. In addition to his work on the WebCorp project (EPSRC, 2000-3), he was Research Associate on the APRIL neologism project (EPSRC, 1999-2000) and SHARES document similarity project (EPSRC, 2001-4).

After moving to BCU in July 2004, he was Researcher Co-investigator on the WebCorpLSE and Repulsion projects (EPSRC, 2006-8 and 2006-7 respectively) and co-author of an AHRC Knowledge Transfer Fellowship introducing WebCorpLSE to A-Level students (2009-11). He became Director of RDUES in August 2010 and has since led two Jisc-funded projects developing eMargin: a collaborative annotation tool. Intended originally for the close reading of literary texts, eMargin now has over 6000 registered users across disciplines worldwide.

Andrew was approached by the advertising agency Grey London to work as a linguistic consultant on a product launch campaign by Procter & Gamble, manufacturer of a new range of fragrances licensed under the Puma brand. The campaign was targeted at consumers aged 14-25 and aimed to raise awareness of the new product through a social media campaign. The agency’s idea was to allow consumers to write a message to a friend, which would then be translated into a video showing dance moves related to the content of the message. Andrew’s specific task was to determine which words were likely to occur most frequently in social media communication between young people. This research fed into the Dance Dictionary website, accompanied by television advertisements across Europe. Andrew is currently exploring further opportunities for commercial engagement.

Andrew is an elected member of the executive committee of University English and the executive board of the International Computer Archive of Modern and Medieval English (ICAME). He led the School of English return to REF2014, and is a member of the ESRC Peer-review College. Andrew is Course Director of the School's distance-learning MA in English Linguistics.


PhD in Linguistics, Birmingham City University, 2016.

MSc with distinction in Information Systems, University of Liverpool, 1999.

BA (Hons), 1st Class in English Language and Literature, University of Liverpool, 1998.


Andrew is an elected member of the Executive Board of the International Computer Archive of Modern and Medieval English (ICAME) and the Executive Committee of University English.


Andrew teaches on the School's distance-learning MA in English Linguistics, of which he is also Course Director.


Andrew has research interests in all aspects of Corpus Linguistics, including the development of software tools for the identification and visualisation of language change across time. He has a particular interest in the use of the web as a source of natural language data and has expertise in the areas of search engine design, topic detection and indexing, web document formats, and the extraction of authorship date from web documents.

Andrew has recently begun a collaboration with the Academic Planning department at BCU on the analysis of feedback received through the National Student Survey (NSS). The NSS makes a significant contribution to university rankings in national league tables and many institutions are developing increasingly sophisticated methods for analysing its results. However, much of the emphasis has been on the multiple-choice questions, with relatively little attention paid to the free-text answers where students can give detailed comments on positive and negative aspects of their course. To offer an enhanced analysis, Andrew and colleague Matt Gee are developing a user-friendly ‘dashboard’ interface to WebCorpLSE which will provide non-specialists with new linguistic insights into NSS comments.

Grants Awarded

2012-13 JISC Embedding Benefits Grant - Integration of eMargin with Virtual Learning Environments (Project Manager)

2011-12 JISC Learning and Teaching Innovation Grant - eMargin: an online collaborative textual annotation resource (Project Manager)

Recent Invited Talks

2016 'Pushing the Boundaries of Corpus Linguistics: New Approaches and New Audiences'. Keynote lecture at annual Birmingham English Language Postgraduate (BELP) conference, University of Birmingham, April 22.

2015 Reader comments on online news articles: a corpus-based analysis. CRAL Corpus Linguistics Workshop, University of Nottingham, February 20.

2014 "Your blog is (the) shit" - the role of context in the analysis of swearing in blogs (with Ursula Lutzky), English Department Research Seminar, University of Liverpool, December 10.

2014 Reader comments on online news articles: a corpus-based analysis. English Department Research Seminar, University of Liverpool, May 21.

2013 The role of context in the analysis of swearing in blogs (with Ursula Lutzky). Workshop on politeness and impoliteness in digital communication: Corpus-related explorations. ESRC Centre for Corpus Approaches to Social Science, Lancaster University, September 20.

2012 eMargin and Linguistic Analysis. UCREL Corpus Research Seminar, Lancaster University, December 6.

2012 eMargin and Text Annotation, AHRC Hidden Collections Doctoral Training Programme, University of Nottingham, November 23.

2012 eMargin in Literary Study, HEA Workshop, University of Leicester, July 5.

2012 Introduction to eMargin, Digital Conversations Workshop, British Library, March 30.

Past Projects

2009-11 Introducing A-Level English Language students to empirical text study using the WebCorp Linguist's Search Engine (AHRC Knowledge Transfer Fellowship) Research Associate / Co-author

2006-08 WebCorp Linguist's Search Engine (EPSRC / HEFCE-SRIF) Technical Lead

2006-07 Repulsion: The investigation of an organising force in text (EPSRC) Researcher Co-investigator / Software Developer

2001-04 SHARES: System of Hypermatrix Analysis, Retrieval, Evaluation and Summarisation (EPSRC) Research Associate / Software Developer

2000-01 WebCorp: The Web as Corpus (EPSRC) Research Assistant / Software Developer

1999-2000 APRIL: Analysis and prediction of innovation in the lexicon (EPSRC) Research Assistant / Software Developer


2009 with Renouf, A. (eds.) Corpus Linguistics: Refinements and Reassessments, Amsterdam: Rodopi.

2006 with Renouf, A. (eds.) The Changing Face of Corpus Linguistics, Amsterdam: Rodopi.


2009 with Gee, M. Weaving Web data into a diachronic corpus patchwork in A. Renouf and A. Kehoe (eds.) Corpus Linguistics: Refinements and Reassessments, Amsterdam: Rodopi.

2006 Diachronic Linguistic Analysis on the Web with WebCorp in A. Renouf and A. Kehoe (eds.) The Changing Face of Corpus Linguistics, Amsterdam: Rodopi.

2006 with Renouf, A. and J. Banerjee WebCorp: an integrated system for web text search, in M. Hundt, N. Nesselhauf and C. Biewer (eds.), Corpus Linguistics and the Web, Amsterdam: Rodopi.

2004 with Renouf, A. and D. Mezquiriz "The Accidental Corpus: Some Issues in Extracting Linguistic Information from the Web", in K. Aijmer and B. Altenberg (eds.) Advances in Corpus Linguistics, Amsterdam: Rodopi.

Journal Articles

2017 with Lutzky, U. "I apologise for my poor blogging": Searching for Apologies in the Birmingham Blog Corpus. Corpus Pragmatics. pp. 1-20. ISSN 2509-9507

2016 with Lutzky, U. “Oops, I didn't mean to be so flippant” A corpus pragmatic analysis of apologies in blog data, Elsevier, Special issue of the Journal of Pragmatics on Adaptability in New Media, forthcoming.

2016 with Lutzky, U. “Your blog is (the) shit”: a corpus linguistic approach to the identification of swearing in computer mediated communication. International Journal of Corpus Linguistics 21:2, 165-191.

2013 with Renouf, A. Filling the gaps: Using the WebCorp Linguist's Search Engine to supplement existing text resources. International Journal of Corpus Linguistics 18:2, 167-198.

2013 with Gee, M. eMargin: A Collaborative Textual Annotation Tool. Ariadne, Issue 71.

2012 with Gee, M. Reader comments as an aboutness indicator in online texts: introducing the Birmingham Blog Corpus in S. Oksefjell Ebeling, J. Ebeling and H. Hasselgård (eds.) Studies in Variation, Contacts and Change in English Volume 12: Aspects of Corpus Linguistics: Compilation, Annotation, Analysis, University of Helsinki e-journal.

2011 with Gee, M. Social Tagging: A new perspective on textual 'aboutness' in P. Rayson, S. Hoffmann and G. Leech (eds.) Studies in Variation, Contacts and Change in English Volume 6: Methodological and Historical Dimensions of Corpus Linguistics, University of Helsinki e-journal.

2007 with Gee, M. New corpora from the web: making web text more 'text-like' in P. Pahta, I. Taavitsainen, T. Nevalainen and J. Tyrkkö (eds.) Towards Multimedia in Corpus Studies, electronic publication, University of Helsinki.


2010 Review article on 'ConcGram 1.0' software, in ICAME Journal: Computers in English Linguistics, No. 34, April 2010.

Conference Proceedings

2005 with Renouf, A. and J. Banerjee The WebCorp Search Engine: a holistic approach to Web text Search in Proceedings from the Corpus Linguistics Conference Series, Vol. 1, no. 1, University of Birmingham.

2004 with Renouf, A. 'Textual Distraction as a Basis for Evaluating Automatic Summarisers', in M.T. Lino et al. (eds.) Proceedings of the 4th International Conference on Language Resources and Evaluation (LREC 2004), Paris: ELRA, Vol IV pp. 1347-1350.

2003 with Morley, B. and A. Renouf Linguistic Research with the XML / RDF aware WebCorp Tool. World Wide Web 2003 Conference, Budapest.

2002 with Renouf, A. WebCorp: Applying the Web to Linguistics and Linguistics to the Web. World Wide Web 2002 Conference, Honolulu, Hawaii.


2010 The Birmingham Blog Corpus (with Matt Gee and Ursula Lutzky)

2000-ongoing WebCorp software and user guide.

2000 APRIL (Analysis and Prediction of Innovation in the Lexicon) project software, databases and web front-end.

1999 Discourse Tree Manipulation Algorithms: Using Rhetorical Structure Theory to Restructure and Summarise Texts, MSc Dissertation, University of Liverpool (with accompanying C++ software).

Work With Industry

Andrew worked as linguistic consultant to the Grey London communications agency on behalf of the fashion brand Puma and their fragrance partner Procter & Gamble. This work resulted in the creation of the critically acclaimed Puma Dance Dictionary website and accompanying Europe-wide TV advertising campaign to launch the Puma Sync fragrance range.
