This paper introduces corpus linguistics and its connections with lexicology and translation. The connection with translation matters most to me, since I am keen to explore a topic closely related to my future profession. Frankly speaking, this was not an easy journey, but I am hopeful that it will prove successful.

A corpus is an electronically stored collection of samples of naturally occurring language. Most modern corpora are at least 1 million words in size and consist either of complete texts or of large extracts from long texts. Usually the texts are selected to represent a type of communication or a variety of language; for example, a corpus may be compiled to represent the English used in history textbooks, or Canadian French, or Internet discussions of genetic modification. Corpora are investigated through the use of dedicated software.

Corpus linguistics can be regarded as a sophisticated method of finding answers to the kinds of questions linguists have always asked. A large corpus can serve as a test bed for hypotheses and can add a quantitative dimension to many linguistic studies. It is also true, however, that corpus software presents the researcher with language in a form that is not normally encountered, and that this can highlight patterning that often goes unnoticed. Corpus linguistics has therefore also led to a reassessment of what language is like.

During this journey we will try to find out:

What is Corpus Linguistics?
Corpus Linguistics Terms and Their Meanings
History of Corpus Linguistics
Resources and Methodologies for Corpus Linguistics, Corpora
Corpus Linguistics and Linguistic Theory, Corpus-Based Descriptions

So fasten your seat belts, we are flying!

What is Corpus Linguistics?
Corpus linguistics is the study of language, and a method of linguistic analysis, that uses a collection of natural or “real world” texts known as a corpus. It is used to investigate a wide range of linguistic questions and offers a unique insight into the dynamics of language, which has made it one of the most widely used linguistic methodologies. Since corpus linguistics involves large corpora that consist of millions or sometimes even billions of words, it relies heavily on computers to determine what rules govern the language and what patterns (grammatical or lexical, for instance) occur. Thus it is not surprising that corpus linguistics emerged in its modern form only after the computer revolution of the 1980s. The Brown Corpus, the first modern, electronically readable corpus, was however created by Henry Kucera and W. Nelson Francis as early as the 1960s.

Corpus Linguistics Terms and Their Meanings

Corpus (plural corpora). A collection of systematically or randomly collected texts of natural language that is electronically stored and processed. A corpus can consist of texts in a single language or in multiple languages. It contains a large number of texts, which allows researchers to analyse linguistic rules, but no corpus represents the entire language, no matter how large it is.
Multilingual corpus. As its name suggests, a multilingual corpus consists of texts in multiple languages.
Parsed corpus (treebank). A collection of texts in naturally occurring language in which each sentence is parsed, that is, syntactically analysed and annotated. The syntactic analysis is typically given in a tree-like structure, which is why a parsed corpus is also known as a treebank.
Parallel corpora. A collection of texts that are translations of each other.
Annotation. An extension of the text by the addition of various kinds of linguistic information....
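
To make the terms above more concrete, here is a minimal sketch in Python of a tiny annotated corpus and a tiny parallel corpus, with two of the simplest operations that corpus software performs: counting frequencies and listing a keyword in context. Everything in it is an assumption made for illustration only: the sentences, the part-of-speech labels (DET, NOUN, VERB), the French translations and the function names are invented and do not come from any real corpus or tool.

```python
from collections import Counter

# Hypothetical annotated corpus: each sentence is a list of (word, tag) pairs.
# The tag labels are illustrative and do not follow any particular tagset.
annotated_corpus = [
    [("The", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("The", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("A", "DET"), ("cat", "NOUN"), ("barks", "VERB")],
]

# Hypothetical parallel corpus: each entry pairs a sentence with its translation.
parallel_corpus = [
    ("The cat sleeps", "Le chat dort"),
    ("The dog barks", "Le chien aboie"),
]


def word_frequencies(corpus):
    """Count how often each lower-cased word form occurs in the corpus."""
    return Counter(word.lower() for sentence in corpus for word, _ in sentence)


def tag_frequencies(corpus):
    """Count how often each annotation tag occurs in the corpus."""
    return Counter(tag for sentence in corpus for _, tag in sentence)


def concordance(corpus, keyword, window=1):
    """List every occurrence of keyword with `window` words on either side."""
    lines = []
    for sentence in corpus:
        words = [word for word, _ in sentence]
        for i, word in enumerate(words):
            if word.lower() == keyword.lower():
                left = " ".join(words[max(0, i - window):i])
                right = " ".join(words[i + 1:i + 1 + window])
                lines.append((left, word, right))
    return lines


if __name__ == "__main__":
    print(word_frequencies(annotated_corpus).most_common(3))
    print(tag_frequencies(annotated_corpus))
    for left, word, right in concordance(annotated_corpus, "cat"):
        print(f"{left:>10} [{word}] {right}")
    for source, target in parallel_corpus:
        print(f"{source}  <->  {target}")
```

A real corpus holds millions of words rather than three sentences, but the principle is the same: the annotation is stored alongside the text, and the software reorganises the text (here as counts and keyword-in-context lines) so that recurring patterns become visible.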
