

Basic Source Material for Postgraduate-Exam English Reading Comprehension: The Economist, Science and Technology


【正文】 sounds right and wrong to people, what they actually write and say, is the linguist39。s raw material.But that raw material is surprisingly elusive. Getting people to speak naturally in a controlled study is hard. Eavesdropping is difficult, timeconsuming and invasive of privacy. For these reasons, linguists often rely on a “corpus” of language, a body of recorded speech and writing, nowadays usually puterised. But traditional corpora have their disadvantages too. The British National Corpus contains 100m words, of which 10m are speech and 90m writing. But it represents only British English, and 100m words is not so many when linguists search for rare usages. Other corpora, such as the North American News Text Corpus, are bigger, but contain only formal writing and speech.Linguists, however, are slowly ing to discover the joys of a free and searchable corpus of maybe 10 trillion words that is available to anyone with an internet connection: the world wide web. The trend, predictably enough, is prevalent on the internet itself. For example, a group of linguists write informally on a weblog called Language Log. There, they use Google to discuss the frequency of nonstandard usages such as “far from” as an adverb (“He far from succeeded”), as opposed to more standard usages such as “He didn39。t succeed—far from it”. A search of the blogitself shows that 354 Language Log pages use the word “Google”. The blog39。s authors clearly rely heavily on it.For several reasons, though, researchers are wary about using the web in more formal research. One, as Mark Liberman, a Language Log contributor, warns colleagues, is that “there are some mean texts out there”. The web is filled with words intended to attract internet searches to gambling and pornography sites, and these can muck up linguists39。 results. 
Originally, such sites would contain these words as lists, so the makers of Google, the biggest search engine, fitted their product with a list filter that would exclude hits without a correct syntactical context. In response, as Dr Liberman notes, many offending websites have hired computational linguists to churn out syntactically correct but meaningless verbiage including common search terms. "When some sandbank over a superslots hibernates, a directness toward a progressive jackpot earns frequent flier miles" is a typical example. Such pages are not filtered by Google, and thus create noise in research data.

There are other problems as well. Search engines, unlike the tools linguists use to analyse standard corpora, do not allow searching for a particular linguistic structure, such as "[Noun phrase] far from [verb phrase]". This requires indirect searching via samples like "He far from succeeded". But Philip Resnik, of the University of Maryland, has created a "Linguist's Search Engine" (LSE) to overcome this. When trying to answer, for example, whether a certain kind of verb is generally used with a direct object, the LSE grabs a chunk of web pages (say a thousand, with perhaps a million words) that each include an example of the verb. The LSE then parses the sample, allowing the linguist to find examples of a given structure, such as the verb without an object. In short, the LSE allows a user to create and analyse a custom-made corpus within minutes.

The web still has its drawbacks. Most of it is in English, limiting its use for other languages (although Dr Resnik is working on a Chinese version of the LSE). And it is mostly written, not spoken, making it tougher to gauge people's spontaneous use. But since much web content is written by non-professional writers, it more clearly represents informal and spoken English than a corpus such as the North American News Text Corpus does.
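The structural search the LSE performs can be illustrated in miniature. The hand-tagged sentences and simplified tags below are invented for the example; the real LSE runs a full syntactic parser over downloaded web pages.

```python
# Toy illustration of the LSE idea: query pre-parsed (here, hand-tagged)
# sentences for a structure rather than a literal string. Tags and data
# are invented for the example.
tagged_sentences = [
    [("She", "PRON"), ("ate", "VERB"), ("the", "DET"), ("apple", "NOUN")],
    [("She", "PRON"), ("ate", "VERB"), ("quickly", "ADV")],
    [("They", "PRON"), ("ate", "VERB")],
]

def verb_without_object(sentence, verb):
    """True if `verb` occurs and no noun follows it (no direct object)."""
    for i, (word, tag) in enumerate(sentence):
        if word == verb and tag == "VERB":
            following_tags = [t for _, t in sentence[i + 1:]]
            return "NOUN" not in following_tags
    return False

hits = [s for s in tagged_sentences if verb_without_object(s, "ate")]
print(len(hits))  # 2
```

No plain-text search engine could answer this query, since "ate" appears in all three sentences; only the parsed structure distinguishes them.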
Despite the problems, linguists are gradually warming to the web as a corpus for formal research. An early paper on the subject, written in 2003 by Frank Keller and Mirella Lapata, of Edinburgh and Sheffield Universities, showed that web searches for rare two-word phrases correlated well with the frequency found in traditional corpora, as well as with human judgments of whether those phrases were natural. What problems the web throws up are seemingly outweighed by the advantages of its huge size. Such evidence, along with tools such as Dr Resnik's, should convince more and more linguists to turn to the corpus on their desktop. Young scholars seem particularly keen.

The easy availability of the web also serves another purpose: to democratise the way linguists work. Allowing anyone to conduct his own impromptu linguistic research, some linguists hope, will do more to popularise their notion of studying the intricacy and charm of language as it really exists, not as killjoy prescriptivists think it should be.
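The kind of correlation check Keller and Lapata ran can be sketched as follows. The phrase counts are invented for illustration; the log transform reflects the standard practice of damping the web's much larger scale before comparing it with a conventional corpus.

```python
import math

# Illustrative (invented) frequency counts for five two-word phrases:
# raw web hit counts alongside counts from a traditional corpus.
web_counts = [120000, 45000, 9800, 2100, 300]
corpus_counts = [950, 400, 70, 20, 2]

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Correlate log-transformed counts; a value near 1 means web frequencies
# track corpus frequencies closely.
r = pearson([math.log(x) for x in web_counts],
            [math.log(y) for y in corpus_counts])
print(round(r, 2))
```

A high coefficient on real data is what justified treating raw web counts as a usable proxy for curated corpus frequencies.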