WebCorp: harnessing the web as linguistic resource in research, teaching and beyond

The WebCorp suite of tools comprises three versions of the software that build on one another to provide linguistic web analysis for research, teaching and beyond.

Research background

The WebCorp tools have featured in over 1700 publications by researchers across disciplines and have users worldwide.

WebCorp Live

WebCorp Live (released in December 2008 after several years of prototyping as WebCorp) was designed to test the hypothesis that the web could complement static offline text collections by providing evidence of rare, new and changing language use. Previous linguistic research relied on searches using the web interfaces of commercial search engines such as Google, but this required researchers to expend substantial effort visiting each web page manually to observe the linguistic patterns within which their search terms occur. WebCorp Live streamlines this approach by processing the results of commercial search engines, automatically accessing the web pages and producing examples of words and phrases with the level of detail required for linguistic study. With the ability to search in multiple languages, WebCorp Live has augmented language teaching and translation in over 180 countries.
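The kind of output described above, examples of a word shown within its surrounding linguistic pattern, is commonly presented as keyword-in-context (KWIC) lines. The following is a minimal sketch of that idea only, not WebCorp Live's actual implementation: the function name `kwic` and its `width` parameter are hypothetical, and the input is assumed to be plain text already extracted from a web page (querying a search engine and fetching pages are not shown).

```python
import re


def kwic(text, term, width=30):
    """Return keyword-in-context lines for each occurrence of `term`.

    Hypothetical helper for illustration; assumes `text` is plain text
    that has already been extracted from a web page.
    """
    # Collapse runs of whitespace so each context fits on a single line.
    text = " ".join(text.split())
    # Match the term as a whole word, ignoring case.
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so the matched terms line up in a column.
        lines.append(f"{left:>{width}} [{match.group(0)}] {right}")
    return lines


if __name__ == "__main__":
    sample = "The web is a corpus. The Web changes daily."
    for line in kwic(sample, "web", width=10):
        print(line)
```

Aligning the matched term in a fixed column makes recurring patterns to its left and right easy to scan by eye, which is the main reason concordance output is usually laid out this way.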

WebCorp Live has been integrated into the search interface at Proz.com, the world’s largest community of translators, and has been recommended as a terminology-checking tool by numerous online translation guides and language support groups, including the Terminology Coordination Unit of the European Parliament, which coordinates over 1200 translators. Because WebCorp Live lets translators check the acceptability of their wording against real texts on the web rather than in static dictionaries, it facilitates translation into the translator’s second language, a practice traditionally fraught with danger and thus frowned upon.

In teaching, WebCorp Live has been included in courses all around the world covering Linguistics, English for Academic Purposes, Teaching English as a Foreign Language (TEFL) and Translation. WebCorp Live facilitates data-driven learning in the language classroom, allowing students to become active researchers, finding and evaluating examples of words and phrases on the web.

WebCorpLSE

While WebCorp Live uses commercial search engines as gatekeepers to the web, the goal of its sister project, the WebCorp Linguist’s Search Engine (WebCorpLSE), was to build a bespoke large-scale collection of web texts, and thus enable advanced linguistic and statistical analysis of the kind only possible in datasets of known size and composition. We developed linguistically focused web processing, annotation and search tools and used these to build a large-scale representative sample of the web (a ‘miniweb’), capturing the distribution of document formats, subject domains and web-native text types. We also constructed specialist datasets of online news and blogs with their associated comments. WebCorpLSE was supported by EPSRC, HEFCE and AHRC grants.

The WebCorpLSE software was also used to introduce A-level English Language students to empirical text study through an AHRC Knowledge Transfer Fellowship. In recent years, the subject criteria have been tightened to require a more in-depth understanding of linguistic concepts and analytical techniques, with an increased emphasis on independent learning. Our work provided students with access to a novel, state-of-the-art aid for teaching and independent learning, and distilled a wealth of linguistic knowledge, gained from previous research projects in the field of Corpus Linguistics, into a form appropriate for A-level study.

At present, the corpus linguistic approach is rarely, if ever, employed at pre-university level. A-level students do not, for the most part, understand or have access to automated analysis tools beyond the spelling and grammar checker in Microsoft Word. This work enriches the learning experience of A-level students by introducing them to WebCorpLSE, through which they learn to apply corpus linguistic techniques to their language studies and their independent research projects. WebCorpLSE also provides teachers and their students with a plentiful supply of authentic language data, relevant to all aspects of the A-level syllabus.

We worked with a local partner school to optimise our approach for the new audience, developing new functionality, search interfaces and associated learning materials in response to ongoing feedback from teachers and students.

WebCorp Learn

In 2020, we expanded the work on data-driven language learning, adapting the WebCorpLSE technology and creating WebCorp Learn, a version optimised for interactive English language learning by non-native speakers. WebCorp Learn has been integrated into courses in German secondary schools through collaboration with the Teaching Solutions language consultancy.

WebCorp Learn enhanced English language learning in schools, helping teachers realise the language learning goals set out by the government for the use of digital language reference tools. In addition to providing the technology, we created teaching materials (videos and exercises) based on our corpus linguistic research, while Teaching Solutions delivered seminars and workshops to schools and individual teachers.

WebCorp Learn provides teachers with experience of linguistic tools, examples of real language use, and skills in data-driven learning, in turn enabling them to introduce their students to new methods and technologies. Teachers had previously not used such technologies because they lacked knowledge of them, lacked access to them, or found them difficult to use. WebCorp Learn therefore exposed students to linguistic theories and analysis techniques that had almost never been encountered at that stage of language learning and teaching. Our software provided a new way of teaching vocabulary and improved the delivery and discussion of societal and media issues.

At a time when more teaching is happening online than ever before, there are considerable benefits for students and teachers in using WebCorp Learn. The pre-formulated exercises and clear instructions are key benefits, as is the fact that the online tool runs equally well on all systems, including mobile phones.