Exploring instances in language where pairs of words are discouraged from appearing together, or are 'repulsed' from one another.
The metaphor of 'attraction', borrowed from physics, is established in linguistics. It characterises the situation in which a word is not evenly or randomly distributed across texts but is found close to its preferred word partners (or 'collocates') in certain textual positions. In our previous work, the focus was on the circumstances in which words significantly prefer each other’s company, whether in adjacent pairings (a span of 1 word to the left and right) or in discontinuous phrasal or grammatical frameworks (e.g. a span of 4 words to the left and right). In the AVIATOR project, we built a 'collocational profile' for each word and used statistics to identify the most significant collocates.
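As an illustration of what a span-based collocational profile can look like, here is a minimal Python sketch. It is not the AVIATOR implementation: the function names, the whitespace tokenisation and the choice of a z-score as the significance statistic are all assumptions made for the example.

```python
from collections import Counter
import math


def collocate_counts(tokens, node, span=4):
    """Count the words found within `span` positions left and right of `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        lo = max(0, i - span)
        hi = min(len(tokens), i + span + 1)
        for j in range(lo, hi):
            if j != i:
                counts[tokens[j]] += 1
    return counts


def z_scores(tokens, node, span=4):
    """Score each observed collocate of `node` against chance expectation.

    Assumption: a simple z-score, where the expected count is the collocate's
    relative frequency times the number of window slots around the node.
    Text-edge effects are ignored.
    """
    freq = Counter(tokens)
    n = len(tokens)
    observed = collocate_counts(tokens, node, span)
    window_slots = 2 * span * freq[node]
    scores = {}
    for word, obs in observed.items():
        p = freq[word] / n
        variance = window_slots * p * (1 - p)
        if variance == 0:
            continue  # degenerate corpus, no meaningful score
        scores[word] = (obs - window_slots * p) / math.sqrt(variance)
    return scores


tokens = "merry christmas and happy christmas and happy birthday".split()
profile = z_scores(tokens, "christmas", span=1)
for word, score in sorted(profile.items(), key=lambda kv: -kv[1]):
    print(f"{word}\t{score:.2f}")
```

With `span=1` the sketch looks only at adjacent pairings; raising it to 4 widens the window to the discontinuous frameworks described above. A toy text of eight tokens is far too small for the scores to mean anything, so real use would need a large corpus.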
In this project, we tested the hypothesis that there is another 'force', which we called 'repulsion', that operates on the construction of text in the opposite way. By repulsion, we mean the system of conventional language use which discourages certain pairs of words from occurring together; for instance, it is normal in English to say 'Happy Christmas', 'Merry Christmas' and 'Happy Birthday', but not 'Merry Birthday'. The goal of the study was to look at the reasons behind these and more complex examples to establish whether and how consistently the force operates, and whether it holds two words at a measurable distance from each other.
Absence of attraction is not the same as repulsion. We wished to establish a measure to differentiate between the non-co-occurrence of two words simply because they have no particular association, and actual repulsion.
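The distinction can be made concrete with a small sketch. It is an illustrative assumption, not the measure developed in the project: it treats co-occurrence counts as approximately Poisson-distributed under chance and flags a pair as a candidate for repulsion only when the observed count is significantly below the expected count. The function names, labels and threshold are hypothetical.

```python
import math


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam), summed in log space to limit underflow."""
    if k < 0:
        return 0.0
    if lam == 0:
        return 1.0
    total = sum(
        math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))
        for i in range(k + 1)
    )
    return min(1.0, total)


def classify_pair(observed, freq_a, freq_b, corpus_size, span=4, alpha=0.01):
    """Label a word pair from its observed and chance-expected co-occurrence.

    Assumption: expected co-occurrences = window slots around word A
    (freq_a * 2 * span) times the relative frequency of word B.
    """
    expected = freq_a * 2 * span * freq_b / corpus_size
    p_low = poisson_cdf(observed, expected)              # P(count <= observed)
    p_high = 1.0 - poisson_cdf(observed - 1, expected)   # P(count >= observed)
    if p_low < alpha:
        return "candidate repulsion"
    if p_high < alpha:
        return "attraction"
    return "no evidence either way"


# Two rare words that never meet: about 0.016 co-occurrences expected by chance,
# so a count of zero tells us nothing.
print(classify_pair(0, freq_a=50, freq_b=40, corpus_size=1_000_000))

# Two frequent words that never meet: 80 co-occurrences expected by chance,
# so a count of zero is striking.
print(classify_pair(0, freq_a=20_000, freq_b=5_000, corpus_size=10_000_000))
```

The point of the sketch is that a zero count is informative only when chance alone would predict many co-occurrences; for rare words, non-co-occurrence is simply what one would expect.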
This project has revealed facts about an unexplored but fundamental and potentially exploitable aspect of language in use. Synonyms differ in meaning according to the particular functions they each fulfil, their frequency of occurrence in text, their range of senses, and the types of context in which they typically occur. We have confirmed that synonyms actively repel some of each other's collocates wherever they differ in these aspects.
We discovered that collocational differences in the behaviour of two synonyms, which have until now been thought to be merely arbitrary and conventional, are in fact systematic and explicable. This discovery is important for language teaching, research and Natural Language Processing, because it will certainly lead to the provision of hitherto unavailable information about the lexicon which is finer-grained, objective and accessible.
The IT applications of this new measure complement and supplement the use of collocation measures. While collocation can indicate which word is the normal (correct) choice for a given context, 'repulsion' will pick up on unusual (incorrect) choices of words, serving as a tool to identify errors and evaluate suggested word choices in drafts, both for writers in general and for international users of English. This function has also been included in the latest version of WebCorp Learn, where the Compare tool shows the repulsion between collocates of two words.
See 'The search for repulsion: a new corpus analytical approach' by A. Renouf & J. Banerjee for more details.