
CCAE Word Frequency List (repost)

http://www.wordfrequency.info/

 

 


Word frequency data

Corpus of Contemporary American English

 


This site contains what we believe is the most accurate frequency data of English, and it comes in a number of different formats (see samples: 100,000 and 60,000 word lists, and a comparison of the two lists).

For the 5,000-60,000 word lists, you can download a simple word list or a list with frequency by genre, or get the data as an eBook or a printed frequency dictionary. For the 100,000 word list, you can see detailed frequency information for many genres in several different corpora. In addition to word frequency data, you can also download up to 155 million n-grams and 4.3 million collocates.

Any frequency list is only as good as the corpus (collection of texts) that it is based on. The 5,000-60,000 word lists are based on the only large, genre-balanced, up-to-date corpus of American English: the 450 million word Corpus of Contemporary American English (COCA). The 100,000 word list supplements this COCA data with detailed frequency data from the 400 million word Corpus of Historical American English, the British National Corpus, and the Corpus of American Soap Operas (for very informal language).

Short samples

rank    lemma / word     PoS   frequency   dispersion
7309    attic            n     2711        0.91
17311   tearful          j     542         0.93
27303   tailgate         v     198         0.85
37310   hydraulically    r     78          0.83
47309   unsparing        j     35          0.83
57309   embryogenesis    n     22          0.66
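
The simple list format above can be read with a few lines of code. The following is a minimal sketch, assuming each row is whitespace-separated in the order rank, lemma, PoS, frequency, dispersion, as in the sample; the names `Entry` and `parse_entries` are made up for illustration, and the downloadable files may use a different layout.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    """One row of the simple list (hypothetical record type)."""
    rank: int
    lemma: str
    pos: str
    frequency: int
    dispersion: float


# Sample rows copied from the table above.
SAMPLE = """\
7309 attic n 2711 0.91
17311 tearful j 542 0.93
27303 tailgate v 198 0.85
37310 hydraulically r 78 0.83
47309 unsparing j 35 0.83
57309 embryogenesis n 22 0.66
"""


def parse_entries(text: str) -> list[Entry]:
    """Parse whitespace-separated rows, skipping lines without 5 fields."""
    entries = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 5:
            continue
        rank, lemma, pos, freq, disp = parts
        entries.append(Entry(int(rank), lemma, pos, int(freq), float(disp)))
    return entries


entries = parse_entries(SAMPLE)
# Keep only lemmas whose dispersion value is at least 0.9.
well_dispersed = [e.lemma for e in entries if e.dispersion >= 0.9]
print(well_dispersed)  # ['attic', 'tearful']
```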
 
rank    lemma / word     PoS   disp   totFreq   spok   fic   mag   news   acad   M1   M2   N1   N2   A1   A2
25083   piglet           n     0.88   239       20     97    54    46     22     10   2    3    3    0    2
25088   woodsman         n     0.70   300       10     176   77    12     25     1    2    1    3    2    0
25090   candied          j     0.87   242       17     49    102   73     1      0    1    2    1    0    0
25093   metacognitive    j     0.69   306       0      0     0     0      306    0    0    0    0    0    0
25107   industry-wide    j     0.89   236       16     2     64    109    45     19   10   2    1    10   6
25108   health-food      j     0.85   246       10     19    154   55     8      6    4    7    1    0    2
25110   posterior        n     0.88   240       6      30    36    27     139    0    5    4    0    0    99
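
The genre columns make it possible to see where a word is concentrated. The sketch below, under the assumption that columns 6-10 of each row are the spok, fic, mag, news and acad counts shown in the header, finds the genre with the highest count for each sample word. The helper `genre_profile` is hypothetical, and the trailing M1-A2 columns are ignored because the sample does not explain them.

```python
GENRES = ("spok", "fic", "mag", "news", "acad")

# Sample rows copied from the table above.
SAMPLE = """\
25083 piglet n 0.88 239 20 97 54 46 22 10 2 3 3 0 2
25088 woodsman n 0.70 300 10 176 77 12 25 1 2 1 3 2 0
25090 candied j 0.87 242 17 49 102 73 1 0 1 2 1 0 0
25093 metacognitive j 0.69 306 0 0 0 0 306 0 0 0 0 0 0
25107 industry-wide j 0.89 236 16 2 64 109 45 19 10 2 1 10 6
25108 health-food j 0.85 246 10 19 154 55 8 6 4 7 1 0 2
25110 posterior n 0.88 240 6 30 36 27 139 0 5 4 0 0 99
"""


def genre_profile(line: str) -> tuple[str, int, dict[str, int]]:
    """Return (lemma, totFreq, per-genre counts) for one row."""
    parts = line.split()
    lemma = parts[1]
    total = int(parts[4])
    # Fields 5..9 (0-based) are assumed to be the five genre counts.
    counts = dict(zip(GENRES, map(int, parts[5:10])))
    return lemma, total, counts


top_genre = {}
for line in SAMPLE.splitlines():
    lemma, total, counts = genre_profile(line)
    top = max(counts, key=counts.get)  # genre with the largest count
    top_genre[lemma] = top
    print(f"{lemma}: mostly {top} ({counts[top]} of {total})")
```

On this sample, "metacognitive" comes out as purely academic, while "piglet" and "woodsman" are dominated by fiction.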

 
