Written by Lynne Bowker, University of Ottawa

I was very fortunate to be honoured recently as the 2018 winner of the ALISE/ProQuest Methodology Paper Competition, awarded by the Association for Library and Information Science Education (ALISE). The paper is entitled “Corpus Linguistics: It’s not just for linguists!” and my goal in writing it was to demonstrate that corpus linguistics is a methodology that can be useful for researchers in fields beyond linguistics, including library and information science (LIS).

Although my background and education lie firmly in applied linguistics, I have become more active in LIS in recent years, and I currently hold a cross-appointment between the University of Ottawa’s School of Translation and Interpretation and its School of Information Studies. As an interdisciplinary researcher, I was inspired to write the paper after reading an article by Veronica Gauchi Risso (2016) in which she explored research methods used in LIS during the 40-year period between 1970 and 2010 and concluded that LIS researchers were largely creatures of habit who strongly favoured survey-based research. I was particularly struck by her plea that “LIS needs new methodological developments, which should combine qualitative and quantitative approaches” (74). Aha, I thought to myself, corpus linguistics could be just the ticket!

Following a description of some of the main techniques used in corpus linguistics, including frequency counts, keywords, collocation generation, KWIC concordancing, and semantic prosody investigation, I suggested a number of areas in which corpus-based methods could be usefully adopted in LIS. For example:

  • Frequency data drawn from a corpus can help LIS professionals to better understand the literary warrant of a given text collection, which is important when establishing indexing languages. For instance, terms that appear very infrequently and with scant literary warrant can be filtered out.
  • The notion of keyness in corpus linguistics is directly pertinent to the notion of aboutness in LIS, and corpus linguistics techniques for identifying keyness can be usefully integrated into automatic indexing systems or recommender systems.
  • In LIS, understanding the relations between concepts is important for developing classification schemes and ontologies. For instance, in controlled vocabularies, broader terms and narrower terms are used to indicate hierarchical relations. Meanwhile, in the context of information retrieval applications, semantic relations help the user to browse concept systems for appropriate search terms and enable query expansion for knowledge discovery. Corpus-based techniques allow semantic relations to be extracted from corpora using lexical knowledge patterns.
  • Several LIS researchers have employed content analysis (CA) as a methodology, often carrying out the analyses manually, relying on a close reading of the material under study and deriving prevalent themes from it through interpretative methods. Corpus linguistics techniques can offer a powerful complement to content analysis; for instance, researchers can first use frequency measures to determine which words and collocations are significant in the corpus, and then proceed to use KWIC concordances to study these occurrences in more detail. Moreover, a corpus-based approach can also be iterative in that these qualitative findings can in turn become a source for further quantitative investigation. For instance, since concordance analysis looks at a known number of concordance lines, the findings of the qualitative analysis can be grouped (e.g. themes relating to a specific word) and then quantified in absolute and relative terms to identify possible patterns (e.g. the tendency of particular words to be associated with particular themes). While corpus methods alone may not be sufficient for research in LIS, they can be used to help triangulate the findings of other methods, such as CA.
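
To make the frequency-then-concordance workflow described above a little more concrete, here is a minimal Python sketch. It is an illustration only, not the method used in the paper: the sample text, the function names `tokenize` and `kwic`, and the three-word context window are all assumptions invented for this example, and a real study would use a proper corpus and a dedicated concordancing tool.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def kwic(tokens, node, width=3):
    """Return (left context, node, right context) for each occurrence of `node`.

    `width` is the number of tokens kept on each side (a hypothetical default).
    """
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines

# Tiny invented "corpus", used purely for illustration.
sample = (
    "The library offers open access to journals. "
    "Open access policies shape how the library buys journals. "
    "Patrons value access to the library at night."
)

tokens = tokenize(sample)

# Step 1 (quantitative): frequency counts point to candidate words.
freq = Counter(tokens)

# Step 2 (qualitative): KWIC lines let the researcher read each hit in context.
concordance = kwic(tokens, "access")

# Step 3 (quantitative again): group the lines and count them in absolute and
# relative terms, here by whether "access" is immediately preceded by "open".
open_hits = sum(1 for left, _, _ in concordance if left.split()[-1:] == ["open"])
open_share = open_hits / len(concordance)

for left, node, right in concordance:
    print(f"{left:>25} [{node}] {right}")
```

In this toy example the grouping in step 3 is automated with a simple test on the left context; in practice that step would more likely be a manual coding of each concordance line by theme, with only the final counting done by machine.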

It has been over twenty-five years since corpus-based methods took a firm hold in the field of linguistics, and it seems safe to say that corpus linguistics is not just for linguists anymore. I hope this blog entry will stimulate discussion about the potential of corpus techniques for enhancing research in other fields as well. Please share your experiences and thoughts about how corpus linguistics techniques have benefited, or could benefit, other fields of research!