

Text Mining (文本挖掘) Lecture Slides


【Overview】 Introduction. Summarization. Question Answering. ("…also bought beer" - a small survey on mining.) Text sources: Email, Insurance claims, News articles, Web pages, Patent portfolios, IRC, Scientific articles, Customer complaints, Contracts, Technical documents. Tasks: Question Answering; Categorization (documents); Clustering (collections); Web spider/search; Text classification; Text clustering. Is this text mining? What else is needed? Example terms: CMOs, Commercial bank, Commercial paper, Commercial Union, Consul Restaurant, Convertible bond, Credit facility, Credit line, Debt security, Debtor country, Detroit Edison, Digital Equipment, Dollars of debt, End-March, Enserch, Equity warrant, Eurodollar. People's first thought: a huge pile. Two definitions:

  

【Main text】

Is More Always Better?
- Dumais, Banko, Brill, Lin, Ng (Microsoft, MIT, Berkeley)
- Q: "Where is the Louvre located?"
- We want "Paris" or "France" or "75058 Paris Cedex 01" or a map
- We don't just want URLs

AskMSR: Shallow approach
- "In what year did Abraham Lincoln die?"
- Ignore hard documents and find easy ones

AskMSR: Details
- The system runs in five steps (1-5), described below.

Step 1: Rewrite queries
- Intuition: the user's question is often syntactically quite close to sentences that contain the answer
  - "Where is the Louvre Museum located?" → "The Louvre Museum is located in Paris."
  - "Who created the character of Scrooge?" → "Charles Dickens created the character of Scrooge."

Query rewriting
- Classify the question into seven categories:
  - Who is/was/are/were…?
  - When is/did/will/are/were…?
  - Where is/are/were…?
- a. Category-specific transformation rules, e.g. "For 'Where' questions, move 'is' to all possible locations":
  "Where is the Louvre Museum located" →
  "is the Louvre Museum located" /
  "the is Louvre Museum located" /
  "the Louvre is Museum located" /
  "the Louvre Museum is located" /
  "the Louvre Museum located is"
  Nonsense, but who cares? It's only a few more queries to Google.
- b. Expected answer "datatype" (e.g. Date, Person, Location, …): "When was the French Revolution?" → DATE
- Hand-crafted classification/rewrite/datatype rules (could they be learned automatically?)

Query rewriting weights
- One wrinkle: some query rewrites are more reliable than others.
- For "Where is the Louvre Museum located?":
  - +"the Louvre Museum is located" (weight 5): if we get a match, it's probably right
  - +Louvre +Museum +located (weight 1): lots of non-answers could come back too

Step 2: Query the search engine
- Send all rewrites to a Web search engine
- Retrieve the top N answers (100?)
- For speed, rely just on the search engine's "snippets", not the full text of the actual documents

Step 3: Mining N-grams
- Unigram, bigram, trigram, … an N-gram is a list of N adjacent terms in a sequence
- E.g., for "Web Question Answering: Is More Always Better":
  - Unigrams: Web, Question, Answering, Is, More, Always, Better
  - Bigrams: Web Question, Question Answering, Answering Is, Is More, More Always, Always Better
  - Trigrams: Web Question Answering, Question Answering Is, Answering Is More, Is More Always, More Always Better

Mining N-grams
- Simple: enumerate all N-grams (say N = 1, 2, 3) in all retrieved snippets
- Use a hash table and other fancy footwork to make this efficient
- Weight of an N-gram: its occurrence count, with each occurrence weighted by the "reliability" (weight) of the rewrite that fetched the document
- (A code sketch of Steps 3-5 follows after Step 5 below.)
- Example: "Who created the character of Scrooge?"
  - Dickens 117
  - Christmas Carol 78
  - Charles Dickens 75
  - Disney 72
  - Carl Banks 54
  - A Christmas 41
  - Christmas Carol 45
  - Uncle 31

Step 4: Filtering N-grams
- Each question type is associated with one or more "datatype filters" = regular expressions (When… → Date; Where… → Location; Who… → Person; What… → …)
- Boost the score of N-grams that match the regexp
- Lower the score of N-grams that don't match the regexp
- Details omitted from the paper…

Step 5: Tiling the answers
- Example: the N-grams "Dickens" (score 20), "Charles Dickens" (score 15), and "Mr Charles" (score 10)
- Tile the highest-scoring N-gram with overlapping N-grams, merge them, and discard the old N-grams: "Mr Charles Dickens" (score 45)
- Repeat until no more overlap
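To make Steps 3-5 concrete, here is a minimal runnable Python sketch of weighted N-gram mining, datatype filtering, and greedy tiling. It is an illustration under stated assumptions, not the paper's implementation: the tokenizer, the boost/penalty constants, and the toy data are made up here; only the overall procedure follows the slides.

```python
# Sketch of AskMSR Steps 3-5: weighted n-gram mining, datatype filtering,
# and answer tiling. Constants and toy data are illustrative assumptions.
import re
from collections import Counter

def ngrams(tokens, max_n=3):
    """Yield every 1..max_n-gram (as a tuple) from a token list."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])

def mine_ngrams(weighted_snippets, max_n=3):
    """Step 3: count n-grams over all snippets, weighting each occurrence
    by the reliability of the query rewrite that fetched the snippet."""
    scores = Counter()
    for snippet, rewrite_weight in weighted_snippets:
        tokens = re.findall(r"[A-Za-z0-9]+", snippet)
        for gram in ngrams(tokens, max_n):
            scores[gram] += rewrite_weight
    return scores

def apply_datatype_filter(scores, answer_regex, boost=2.0, penalty=0.5):
    """Step 4: boost n-grams matching the expected answer datatype regex,
    demote the rest."""
    filtered = Counter()
    for gram, score in scores.items():
        text = " ".join(gram)
        filtered[gram] = score * (boost if re.search(answer_regex, text) else penalty)
    return filtered

def _merge(a, b):
    """Return the tiled sequence if one n-gram contains or overlaps the other."""
    for seq, sub in ((a, b), (b, a)):          # containment
        for i in range(len(seq) - len(sub) + 1):
            if seq[i:i + len(sub)] == sub:
                return seq
    for x, y in ((a, b), (b, a)):              # prefix/suffix overlap
        for k in range(min(len(x), len(y)), 0, -1):
            if x[-k:] == y[:k]:
                return x + y[k:]
    return None

def tile_answers(scores):
    """Step 5: repeatedly merge the best-scoring n-gram with any overlapping
    n-gram (summing their scores) until nothing overlaps."""
    pool = sorted(([list(g), s] for g, s in scores.items()), key=lambda p: -p[1])
    merged = True
    while merged and len(pool) > 1:
        merged = False
        best_seq, best_score = pool[0]
        rest = []
        for seq, score in pool[1:]:
            combo = _merge(tuple(best_seq), tuple(seq))
            if combo is None:
                rest.append([seq, score])
            else:
                best_seq, best_score, merged = list(combo), best_score + score, True
        pool = sorted([[best_seq, best_score]] + rest, key=lambda p: -p[1])
    return [(" ".join(seq), score) for seq, score in pool]

# Toy run mirroring the Scrooge example: three overlapping n-grams tile into
# "Mr Charles Dickens" with the summed score of 45.
toy = Counter({("Dickens",): 20, ("Charles", "Dickens"): 15, ("Mr", "Charles"): 10})
print(tile_answers(toy))  # [('Mr Charles Dickens', 45)]
```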
Results
- Standard TREC contest testbed: ~1M documents; 900 questions
- The technique doesn't do too well on that testbed (though it would have placed in the top 9 of ~30 participants!): MRR = … (i.e., the right answer is ranked about #4-#5 on average)
- Why? Because it relies on the enormity of the Web!
- Using the Web as a whole, not just TREC's 1M documents: MRR = … (i.e., on average, the right answer is ranked about #2-#3)

Issues
- In many scenarios (e.g., monitoring an individual's …) we only have a small set of documents
- Works best/only for "Trivial Pursuit"-style fact-based questions
- Limited/brittle repertoire of:
  - question categories
  - answer data types/filters
  - query rewriting rules

ISI: Surface patterns approach
- Use of characteristic phrases
- "When was <person> born?" Typical answers:
  - "Mozart was born in 1756."
  - "Gandhi (1869-1948)…"
- This suggests phrases (regular expressions) like:
  - "NAME was born in BIRTHDATE"
  - "NAME ( BIRTHDATE"
- Using regular expressions can help locate the correct answer

Use pattern learning
- Example:
  - "The great composer Mozart (1756-1791) achieved fame at a young age"
  - "Mozart (1756-1791) was a genius"
  - "The whole world would always be indebted to the great music of Mozart (1756-1791)"
- The longest matching substring for all 3 sentences is "Mozart (1756-1791)"
- A suffix tree would extract "Mozart (1756-1791)" as an output, with a score of 3 (see the sketch after the experiment lists below)

Pattern learning (cont.)
- Repeat with different examples of the same question type: "Gandhi 1869", "Newton 1642", etc.
- Some patterns learned for BIRTHDATE:
  - a. born in ANSWER, NAME
  - b. NAME was born on ANSWER,
  - c. NAME ( ANSWER
  - d. NAME ( ANSWER )

Experiments
- 6 different question types from the Webclopedia QA Typology (Hovy et al., 2002a): BIRTHDATE, LOCATION, INVENTOR, DISCOVERER, DEFINITION, WHY-FAMOUS

Experiments: pattern precision
- BIRTHDATE patterns:
  - NAME ( ANSWER )
  - NAME was born on ANSWER,
  - NAME was born in ANSWER
  - NAME was born ANSWER
  - ANSWER NAME was born
  - NAME ( ANSWER
  - NAME ( ANSWER
- INVENTOR patterns:
  - ANSWER invents NAME
  - the NAME was invented by ANSWER
  - ANSWER invented the NAME in

Experiments (cont.)
- DISCOVERER patterns:
  - when ANSWER discovered NAME
  - ANSWER's discovery of NAME
  - NAME was discovered by ANSWER in
- DEFINITION …
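As a rough illustration of the surface-pattern idea, the sketch below first recovers the shared anchor "Mozart (1756-1791)" from the three example sentences with a brute-force longest-common-substring search (the real system uses a suffix tree for efficiency), and then instantiates a learned template such as "NAME was born in ANSWER" as a regular expression. The helper names and the four-digit ANSWER pattern are assumptions made here for illustration, not part of the ISI system.

```python
# Sketch of the ISI surface-pattern idea: (1) learn a candidate pattern as the
# longest substring shared by example sentences, (2) apply a learned template
# as a regex. Helper names and the \d{4} ANSWER pattern are illustrative.
import re

def longest_common_substring(strings):
    """Brute-force longest substring present in every string
    (a suffix tree computes the same thing efficiently)."""
    base = min(strings, key=len)
    for length in range(len(base), 0, -1):
        for start in range(len(base) - length + 1):
            candidate = base[start:start + length]
            if all(candidate in s for s in strings):
                return candidate
    return ""

sentences = [
    "The great composer Mozart (1756-1791) achieved fame at a young age",
    "Mozart (1756-1791) was a genius",
    "The whole world would always be indebted to the great music of Mozart (1756-1791)",
]
# Shared anchor text; it occurs once per sentence, i.e. a score of 3.
print(longest_common_substring(sentences))          # Mozart (1756-1791)

def compile_pattern(template, name, answer_regex=r"(\d{4})"):
    """Turn a learned template like 'NAME was born in ANSWER' into a regex
    that captures the answer for a specific question term."""
    pattern = re.escape(template)
    pattern = pattern.replace("NAME", re.escape(name)).replace("ANSWER", answer_regex)
    return re.compile(pattern)

birthdate = compile_pattern("NAME was born in ANSWER", "Mozart")
match = birthdate.search("Wolfgang Amadeus Mozart was born in 1756 in Salzburg.")
print(match.group(1) if match else None)            # 1756
```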