The term CORPUS (pl. CORPORA) is derived from Latin and literally means "body". Within the field of medicine, it is used to refer either to the body as a whole or to the main part of an organ.

However, the definition of corpus that is most relevant to us is provided by the Cambridge Dictionary, which defines a corpus as:

a collection of written or spoken material stored on computer and used to find out how language is used


Corpus Linguistics refers specifically to the study of language as it appears within a corpus. A more comprehensive definition of corpus linguistics is provided by McEnery and Hardie (2011:1), who write that:

It is certainly quite distinct from most other topics you might study in linguistics, as it is not directly about the study of any particular aspect of language. Rather, it is an area which focuses upon a set of procedures, or methods, for studying language (although, as we will see, at least one major school of corpus linguists does not agree with the characterisation of corpus linguistics as a methodology). The procedures themselves are still developing, and remain an unclearly delineated set – though some of them, such as concordancing, are well established and are viewed as central to the approach.
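
For readers who have not met concordancing before, the idea can be sketched in a few lines of Python. This is only a minimal illustration of a keyword-in-context (KWIC) search, not how any particular corpus tool works: the function name `concordance`, the `width` parameter and the sample sentence are all our own assumptions, made up for this example.

```python
import re

def concordance(text, keyword, width=30):
    """Return (left context, node, right context) for each hit of keyword.

    Hypothetical helper for illustration only: real concordancers work on
    tokenised, often annotated corpora rather than on a raw string.
    """
    # Match the keyword as a whole word, ignoring case.
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    hits = []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        hits.append((left, match.group(), right))
    return hits

# Made-up sample text, standing in for a real corpus.
sample = "A corpus is a collection of texts. Each corpus is stored on computer."

hits = concordance(sample, "corpus", width=10)
for left, node, right in hits:
    # Right-align the left context so the search word lines up in a column.
    print(f"{left:>10} [{node}] {right}")
```

Lining up every occurrence of the search word in a single column is what lets an analyst scan the words on either side of it and spot recurring patterns.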


If you are completely new to Corpus Linguistics, deciding which book to read first to get a good grounding in what a corpus study entails can be a daunting task.

There is a huge selection of books on the market written by the most prominent names in Corpus Linguistics, and they have all contributed greatly to the development of aspiring corpus linguists and early-career academics. Although there is no single correct answer as to which book is best for a complete beginner, we at AAC have put together a list of reading materials that we feel would be most effective when starting your journey to corpus enlightenment:

Meyer, C. F. (2002). English corpus linguistics: An introduction. Cambridge University Press.

Meyer’s book provides a comprehensive breakdown of all the steps a corpus linguist would go through before, during and after the process of creating a corpus.

Baker, P., Hardie, A., & McEnery, T. (2006). A glossary of corpus linguistics. Edinburgh University Press.

Baker et al. provide an extremely comprehensive glossary of all the key terms related to Corpus Linguistics, covering not only terminology specific to corpus methodology but also the names of corpora and corpus software.

McEnery, T., & Hardie, A. (2011). Corpus linguistics: Method, theory and practice. Cambridge University Press.

Although this book is not exactly suited to complete beginners, it was the first book I personally read when I initially entered the field of Corpus Linguistics. Additionally, I am sure all corpus linguists will agree that this book is a must-have for anybody interested in Corpus Linguistics.

Hunston, S. (2002). Corpora in applied linguistics. Cambridge University Press.

No list of corpus-related reading material would be complete without mentioning Susan Hunston, who has made an immense contribution to the field of Corpus Linguistics. However, her inclusion on this list is not merely a token gesture. Although her book goes into some more complex areas in the later chapters, the first chapter is a must-read for all those who are new to Corpus Linguistics.