Tuesday, May 24, 2011

COCA (The Corpus of Contemporary American English)

COCA is an amazing resource for checking how often words and phrases are actually used in contemporary American English.

"The corpus contains more than 425 million words of text and is equally divided among spoken, fiction, popular magazines, newspapers, and academic texts. It includes 20 million words each year from 1990-2011 and the corpus is also updated once or twice a year (the most recent texts are from March 2011). Because of its design, it is perhaps the only corpus of English that is suitable for looking at current, ongoing changes in the language (see the 2011 article in Literary and Linguistic Computing). The interface allows you to search for exact words or phrases, wildcards, lemmas, part of speech, or any combinations of these.  You can search for surrounding words (collocates) within a ten-word window (e.g. all nouns somewhere near faint, all adjectives near woman, or all verbs near feelings), which often gives you good insight into the meaning and use of a word. 

The corpus also allows you to easily limit searches by frequency and compare the frequency of words, phrases, and grammatical constructions, in at least two main ways:
  • By genre: comparisons between spoken, fiction, popular magazines, newspapers, and academic, or even between sub-genres (or domains), such as movie scripts, sports magazines, newspaper editorial, or scientific journals
  • Over time: compare different years from 1990 to the present time
You can also easily carry out semantically-based queries of the corpus. For example, you can contrast and compare the collocates of two related words (little/small, democrats/republicans, men/women), to determine the difference in meaning or use between these words.  You can find the frequency and distribution of synonyms for nearly 60,000 words and also compare their  frequency in different genres, and also use these word lists as part of other queries. Finally, you can easily create your own lists of semantically-related words, and then use them directly as part of the query."
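
To make the idea of collocates a little more concrete, here is a small Python sketch of what a collocate comparison does in principle. It is not COCA's interface or its actual method; the `collocates` function, the `window` parameter, and the sample sentence are all made up for illustration, and the counting runs on a tiny local list of words rather than on the corpus.

```python
from collections import Counter


def collocates(tokens, node, window=4):
    """Count the words that appear within `window` tokens on either side of `node`.

    Hypothetical helper for illustration only; COCA performs this kind of
    search through its own web interface.
    """
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # skip the node word itself
                counts[tokens[j]] += 1
    return counts


# Made-up sample text, standing in for a real corpus.
text = ("the small dog chased the little cat while "
        "the small boy watched the little girl")
tokens = text.lower().split()

small = collocates(tokens, "small", window=2)
little = collocates(tokens, "little", window=2)

# Words that occur near one of the two words but not the other.
only_small = set(small) - set(little)
only_little = set(little) - set(small)
print(sorted(only_small), sorted(only_little))
```

With a real corpus the interesting part is the same as in this toy: the words that turn up near one member of a pair (little/small) but not the other hint at how the two differ in use.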

The site, hosted by Brigham Young University, offers a five-minute tutorial with additional information about using this resource. CHECK IT OUT!

1 comment:

  1. Really nice annotation, Sarah. I like your suggestion for comparisons. I also posted some new YouTube videos today on how to use COCA -- I learned a lot today!

    P.S.: The red font in your blog is hard to read against the background.