AAC Admin: One of the core missions of the All About Corpora website is to provide a platform for independent researchers and research students who are less visible to the academic community than some of the more prominent corpus linguists. The ‘Rising Star’ blog articles, alongside the researcher profiles section, are one of the main ways we aim to achieve this.
Written by Ahdi Hassan, CEO of Pakistani Languages Corpora
Ahdi Hassan is the CEO of the Pakistani Languages Corpora (PLC) and is an independent scholar based in Pakistan.
The PLC is a system designed for linguists, lexicologists, lexicographers, researchers, translators, terminologists, teachers and students working with Pakistani languages to easily discover what is typical and frequent in the language. The PLC offers a new way of doing research by giving educators, students and researchers easy access to researchable data in Pakistani regional languages.
The PLC software tool is fast and efficient and offers researchers multilayer access for analyzing word frequencies, collocates, keywords, grammatical/lexical labeling, statistical figures, semantics and speech input, as well as many other features. Researchers can easily carry out sample studies through statistical data analysis, as opposed to relying on questionnaires.
It is hoped that the PLC will soon become an open source corpus tool and help in the further development of Pakistani languages, with its features being modified and simplified for the end-user. The development of corpora of spoken language data is a complex task that requires careful planning during transcription. The PLC is providing new ways of research and a new perspective on corpus tools that will lead to further growth in the fields of computational linguistics and corpus linguistics. A real-world example of such a tool, as both software and a web application, is being developed for current and future generations.
The objectives of the PLC have been defined as follows:
- To create an authentic source for storing data of spoken and written languages in electronic form for communicative and research purposes
- To develop orthographies for unwritten languages
- To preserve all languages, especially those which are close to extinction
- To revitalize endangered languages
- To suggest new innovative language teaching methods
- To create appropriate materials for any language teaching/learning
- To build appropriate frameworks for teaching/learning grammar and vocabulary
- To create an approach for computational linguistics in Pakistan
- To develop the corpus of spoken language data
- To develop new ways of searching and using data to aid research
- To develop enterprise-level software and web applications for natural language and speech processing, for language researchers, teachers and learners, that will lead to continued growth of corpora and of the field as a whole
KEY FEATURES of the PLC
Concordance is a useful tool for investigating corpora. It is limited by the ability of the human observer to process information.
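To illustrate what a concordance shows, here is a minimal keyword-in-context (KWIC) sketch in Python. It is not the PLC's implementation: the function name `kwic`, the `width` parameter and the simple regex tokenizer are assumptions made for this example, and real corpora in Pakistani languages would need a tokenizer suited to their scripts.

```python
import re

def kwic(text, keyword, width=3):
    """Return (left context, keyword, right context) for every hit.

    Hypothetical helper for illustration only; `width` is the number
    of tokens kept on each side of the keyword.
    """
    # Naive tokenization: lowercase, then take runs of word characters.
    tokens = re.findall(r"\w+", text.lower())
    target = keyword.lower()
    hits = []
    for i, tok in enumerate(tokens):
        if tok == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, tok, right))
    return hits

sample = "The corpus is large. A corpus helps researchers study language."
for left, kw, right in kwic(sample, "corpus"):
    # Right-align the left context so the keyword lines up in a column.
    print(f"{left:>25} | {kw} | {right}")
```

Aligning the keyword in a single column is what lets a reader scan many hits quickly, although, as noted above, the number of lines a person can usefully read remains the practical limit.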
Advance Concordance provides options such as semantic relatedness, semantic opposition, lemmatization, and bilingual transitions and transformations, so that you can analyze text at an advanced level.
Statistical results are evaluated from the keyword in context (KWIC) in monolingual and bilingual mode. The tool also performs normalization and generates visuals, e.g. graphs.
Collocation extraction finds words that co-occur in a sentence more frequently than by chance. It calculates mutual information (MI) and displays visual results. It also supports bilingual mode.
If you need a corpus other than the prebuilt corpora we provide, you can also create and manage your own corpus for your specific research. The PLC has all the tools for corpus indexing and management.
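As a rough illustration of how an MI score for sentence-level co-occurrence can be computed, the sketch below uses the common pointwise formulation, MI = log2(N * n_xy / (n_x * n_y)), where N is the number of sentences, n_x and n_y are the numbers of sentences containing each word, and n_xy is the number containing both. How the PLC itself calculates MI is not described in this article, so the function name `sentence_mi` and the toy data are assumptions.

```python
import math
from collections import Counter
from itertools import combinations

def sentence_mi(sentences):
    """Score each word pair by pointwise MI over sentence co-occurrence.

    Hypothetical helper for illustration; `sentences` is a list of
    token lists. Keys of the result are alphabetically ordered pairs.
    """
    n = len(sentences)
    word_counts = Counter()   # sentences containing each word
    pair_counts = Counter()   # sentences containing both words of a pair
    for sent in sentences:
        types = sorted(set(sent))  # count each word once per sentence
        word_counts.update(types)
        pair_counts.update(combinations(types, 2))
    return {
        (x, y): math.log2(count * n / (word_counts[x] * word_counts[y]))
        for (x, y), count in pair_counts.items()
    }

sentences = [
    ["strong", "tea"],
    ["strong", "tea"],
    ["green", "tea"],
    ["green", "car"],
]
scores = sentence_mi(sentences)
for pair, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(pair, round(score, 3))
```

A positive score means the pair co-occurs more often than chance would predict, and a negative score means less often. MI is known to favour rare words, so in practice it is usually combined with a minimum frequency threshold.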
Get the list of all words used in literature. Custom limits will help you in finding more specific words.
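A word list with limits can be sketched as a simple frequency count. The article does not say which limits the PLC offers, so the `min_freq` and `min_length` parameters below, like the function name `word_list`, are assumptions chosen only to show the idea.

```python
import re
from collections import Counter

def word_list(text, min_freq=1, min_length=1):
    """Return (word, count) pairs, most frequent first, within the limits.

    Hypothetical helper for illustration; the limits are assumed, not
    taken from the PLC.
    """
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter(tokens)
    return [
        (word, count)
        for word, count in counts.most_common()
        if count >= min_freq and len(word) >= min_length
    ]

text = "the cat and the dog and the bird"
print(word_list(text, min_freq=2))    # keep only words seen at least twice
print(word_list(text, min_length=4))  # keep only words of four or more letters
```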
* Part of Speech tagging is available in all functions where context search is involved, e.g. Concordance, Adv. Concordance…
Advance Concordance works the same way as Concordance but with much more advanced functions and tools. It can target two languages at a time, giving the researcher the option to conduct bilingual research, for example analyzing English and Urdu, or English and Spanish, at the same time and in either direction. It has some advanced tools for corpus research, such as lemmatisation, Semantic Similarity and Semantic Opposition analysis, which allow the researcher to find the KEY WORD IN CONTEXT at a much deeper level involving semantics and lemma forms.
Further information about the PLC can be found on their website: http://www.plcorpora.com/