[Wordmaster] Computing (2/2)
Everything has to be backed by data, and the accuracy of that data decides the outcome.
TIPS
In this dictation, transcribe only the guest's words; the host's words do not need to be transcribed. Listen carefully.
Host's line ① AA: "So now you describe this basically as an 88,000-word-long sentence, starting with the word 'the,' the most frequently used word in the English language. What's at the other end?"
HINTS
conquistador
PS: Ignore the filler phrase "you know".
http://t1.g.hjfile.cn/listen/201306/201306060950454782748.mp3
The frequency is data that is not generated by me. The frequency data was all coming from this source data that I used, which is the British National Corpus and that is a collection of written and spoken English words that were collected over a few years, I think back in the mid-1990s, by this group in England. It is a little bit dated; I have found one word that people are often surprised does not appear at all in the archive is blog. So clearly the phenomenon of Web logging came up after this data was collected.
The other end is surprising, and this is a big point of contention for a lot of people that actually find what the last word is. But the last word, surprisingly or not, is conquistador. And if you look through the list and you spend some time with it, you will find that there are many words much, much further in front of conquistador that you've never even heard of. So clearly there seems to be some errata in their data.