

Text Mining Lecture Slides (textmining)


…ons (grunts, shouts, etc.)?

Top Performing Systems
- Currently the best performing systems at TREC can answer approximately 60–80% of the questions, a pretty amazing performance!
- Approaches and successes have varied a fair deal:
  - Knowledge-rich approaches, using a vast array of NLP techniques, stole the show in the early TREC evaluations, notably Harabagiu, Moldovan et al. (SMU/UTD/LCC).
  - The AskMSR system stressed how much could be achieved by very simple methods with enough text (it now has various copycats).
  - A middle ground is to use a large collection of surface matching patterns (ISI).

AskMSR
- "Web Question Answering: Is More Always Better?" Dumais, Banko, Brill, Lin, Ng (Microsoft, MIT, Berkeley).
- Q: "Where is the Louvre located?"
- We want "Paris" or "France" or "75058 Paris Cedex 01" or a map; we don't just want URLs.

AskMSR: Shallow approach
- "In what year did Abraham Lincoln die?"
- Ignore hard documents and find easy ones.

AskMSR: Details
- The system is a pipeline of five steps (1–5), walked through below; code sketches for the individual steps follow the walkthrough.

Step 1: Rewrite queries
- Intuition: the user's question is often syntactically quite close to sentences that contain the answer.
  - "Where is the Louvre Museum located?" / "The Louvre Museum is located in Paris."
  - "Who created the character of Scrooge?" / "Charles Dickens created the character of Scrooge."

Query rewriting
- Classify the question into seven categories:
  - Who is/was/are/were…?
  - When is/did/will/are/were…?
  - Where is/are/were…?
  - …
- a. Category-specific transformation rules, e.g. "For Where questions, move 'is' to all possible locations":
  "Where is the Louvre Museum located" becomes
  "is the Louvre Museum located",
  "the is Louvre Museum located",
  "the Louvre is Museum located",
  "the Louvre Museum is located",
  "the Louvre Museum located is".
- b. Expected answer "datatype" (e.g. Date, Person, Location, …): "When was the French Revolution?" maps to DATE.
- The classification/rewrite/datatype rules are hand-crafted. (Could they be automatically learned?)
- Some rewrites are nonsense, but who cares? It's only a few more queries to Google.

Query rewriting: weights
- One wrinkle: some query rewrites are more reliable than others. For "Where is the Louvre Museum located?":
  - +"the Louvre Museum is located" (weight 5): if we get a match, it's probably right.
  - +Louvre +Museum +located (weight 1): lots of non-answers could come back too.

Step 2: Query the search engine
- Send all rewrites to a Web search engine.
- Retrieve the top N answers (100?).
- For speed, rely just on the search engine's "snippets", not the full text of the actual documents.

Step 3: Mining N-grams
- Unigram, bigram, trigram, … An N-gram is a list of N adjacent terms in a sequence.
- E.g., for "Web Question Answering: Is More Always Better":
  - Unigrams: Web, Question, Answering, Is, More, Always, Better
  - Bigrams: Web Question, Question Answering, Answering Is, Is More, More Always, Always Better
  - Trigrams: Web Question Answering, Question Answering Is, Answering Is More, Is More Always, More Always Better

Mining N-grams
- Simple: enumerate all N-grams (N = 1, 2, 3, say) in all retrieved snippets.
- Use a hash table and other fancy footwork to make this efficient.
- Weight of an N-gram: its occurrence count, with each occurrence weighted by the "reliability" (weight) of the rewrite that fetched the document.
- Example, "Who created the character of Scrooge?": Dickens 117, Christmas Carol 78, Charles Dickens 75, Disney 72, Carl Banks 54, A Christmas 41, Christmas Carol 45, Uncle 31.

Step 4: Filtering N-grams
- Each question type is associated with one or more "datatype filters" = regular expressions: When… maps to Date, Where… to Location, Who… to Person, What…, etc.
- Boost the score of N-grams that do match the regexp; lower the score of N-grams that don't.
- Details omitted from the paper….

Step 5: Tiling the Answers
- Take the highest-scoring N-gram and tile it with overlapping candidates, merging their scores and discarding the old N-grams; repeat until no more overlap remains.
- Example: Dickens (score 20), Charles Dickens (15), and Mr Charles (10) tile into "Mr Charles Dickens" with score 45.
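A minimal Python sketch of the Step 1 rewriting rule for Where-questions, assuming the "move 'is' to all possible positions" transformation and the weight-5 / weight-1 scheme described above; the function name and exact weights are illustrative, not the paper's actual rules:

```python
import re

def rewrite_where_question(question):
    """Generate (query, weight) rewrites for a 'Where is X ...?' question by
    moving 'is' to every possible position.  Exact-phrase rewrites get a high
    weight; the plain AND-of-keywords fallback gets a low weight.
    (Illustrative rule and weights, not AskMSR's actual values.)"""
    m = re.match(r'[Ww]here is (.+?)\??$', question.strip())
    if not m:
        return []
    words = m.group(1).split()   # e.g. ['the', 'Louvre', 'Museum', 'located']
    rewrites = []
    for i in range(len(words) + 1):
        candidate = words[:i] + ['is'] + words[i:]
        rewrites.append(('"' + ' '.join(candidate) + '"', 5))  # exact phrase, weight 5
    rewrites.append((' '.join('+' + w for w in words), 1))     # bag of words, weight 1
    return rewrites

for query, weight in rewrite_where_question("Where is the Louvre Museum located?"):
    print(weight, query)
```

Running it on "Where is the Louvre Museum located?" prints the five phrase rewrites listed above plus the low-weight +Louvre +Museum +located fallback.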
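Steps 3 and 4 can be sketched together. The snippet below assumes each retrieved snippet arrives paired with the weight of the rewrite that fetched it, and the two regular expressions are toy stand-ins for the hand-built datatype filters, not the filters actually used by AskMSR:

```python
from collections import Counter
import re

def mine_ngrams(snippets_with_weights, max_n=3):
    """Count 1-, 2- and 3-grams over all snippets; each occurrence is
    weighted by the reliability weight of the rewrite that fetched it."""
    scores = Counter()
    for snippet, weight in snippets_with_weights:
        tokens = snippet.split()
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                scores[' '.join(tokens[i:i + n])] += weight
    return scores

# Toy datatype filters: question type -> regular expression over candidate answers.
DATATYPE_FILTERS = {
    'When': re.compile(r'\b(1[0-9]{3}|20[0-9]{2})\b'),    # something that looks like a year
    'Who':  re.compile(r'^[A-Z][a-z]+( [A-Z][a-z]+)*$'),  # capitalised, name-like strings
}

def filter_ngrams(scores, qtype):
    """Boost n-grams matching the filter for this question type, penalise the rest."""
    pattern = DATATYPE_FILTERS.get(qtype)
    if pattern is None:
        return scores
    adjusted = Counter()
    for ngram, score in scores.items():
        adjusted[ngram] = score * 2 if pattern.search(ngram) else score * 0.5
    return adjusted

snippets = [("Charles Dickens created the character of Scrooge", 5),
            ("Scrooge appears in A Christmas Carol by Dickens", 1)]
print(filter_ngrams(mine_ngrams(snippets), 'Who').most_common(5))
```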
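And a rough sketch of the Step 5 tiling procedure, using greedy word-level overlap merging to reproduce the Dickens example; the real system's scoring and stopping details are simplified:

```python
def overlap_merge(a, b):
    """If the end of a overlaps the start of b (word level), return the tiled string, else None."""
    wa, wb = a.split(), b.split()
    for k in range(min(len(wa), len(wb)), 0, -1):
        if wa[-k:] == wb[:k]:
            return ' '.join(wa + wb[k:])
    return None

def tile_answers(candidates):
    """candidates: dict of n-gram -> score.  Repeatedly take the highest-scoring
    n-gram, tile it with an overlapping candidate, sum their scores, and discard
    the old n-grams, until no more overlaps are found."""
    cands = dict(candidates)
    merged = True
    while merged:
        merged = False
        best = max(cands, key=cands.get)
        for other in list(cands):
            if other == best:
                continue
            tiled = overlap_merge(best, other) or overlap_merge(other, best)
            if tiled:
                score = cands.pop(best) + cands.pop(other)
                cands[tiled] = score          # keep the tile, drop the pieces
                merged = True
                break
    return max(cands.items(), key=lambda kv: kv[1])

print(tile_answers({'Dickens': 20, 'Charles Dickens': 15, 'Mr Charles': 10}))
```

On the slide's three candidates this returns ('Mr Charles Dickens', 45).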
Results
- Standard TREC contest testbed: ~1M documents.

Topic Detection and Tracking: example topics
- …policy changes due to the crash (new runway lights were installed at airports).
- Euro Introduced. On topic: stories about the preparation for the common currency (negotiations about exchange rates and financial standards to be shared among the member nations).

Text mining is the process of compiling, organizing, and analyzing large document collections to support the delivery of targeted types of information to analysts and decision makers, and to discover relationships between related facts that span wide domains of inquiry.

True Text Data Mining: Don Swanson's Medical Work
- Given: medical titles and abstracts, a problem (an incurable rare disease), and some medical expertise,
- find causal links among titles: symptoms, drugs, results.
- E.g.: magnesium deficiency related to migraine.
- This was found by extracting features from the medical literature on migraines and nutrition (a toy sketch of this idea appears at the end of this section).

Swanson Example (1991)
- Problem: migraine headaches (M).
- Stress is associated with migraines.
- High levels of magnesium inhibit SCD (spreading cortical depression).

Outline of Today
- Introduction
- Lexicon construction
- Topic Detection and Tracking
- Summarization
- Question Answering

Data Mining: Market Basket Analysis
- 80% of the people who buy milk also buy bread.
- On Fridays, 70% of the men who bought diapers also bought beer.
- What is the relationship between diapers and beer? Walmart could trace the reason after doing a small survey! (A toy confidence computation appears at the end of this section.)

The business opportunity in text mining?
- [Chart: data volume vs. market capitalization for unstructured vs. structured data, labeled "Corporate Knowledge 'Ore'"]
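Swanson's magnesium–migraine discovery came from connecting two otherwise unrelated literatures through shared intermediate terms. The sketch below is a toy version of that "bridging term" idea, not Swanson's actual procedure, and the example titles are invented:

```python
def bridging_terms(literature_a, literature_c, stopwords=frozenset()):
    """Find candidate 'B' terms that appear both in titles about problem A
    (e.g. migraine) and in titles from a second literature C (e.g. nutrition).
    Shared terms such as 'magnesium' are candidate hidden links to show an expert."""
    terms_a = {w.lower() for title in literature_a for w in title.split()} - stopwords
    terms_c = {w.lower() for title in literature_c for w in title.split()} - stopwords
    return terms_a & terms_c

migraine_titles = ["Stress and migraine attacks", "Magnesium deficiency in migraine patients"]
nutrition_titles = ["Dietary magnesium and stress response", "Calcium and magnesium intake"]
print(bridging_terms(migraine_titles, nutrition_titles, stopwords={'and', 'in', 'the'}))
# -> {'magnesium', 'stress'} in this invented example
```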
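Statements such as "80% of the people who buy milk also buy bread" are rule confidences from market basket analysis. A toy computation over invented baskets (the numbers here are illustrative, not Walmart's data):

```python
def confidence(transactions, lhs, rhs):
    """confidence(lhs -> rhs) = P(rhs | lhs) = count(lhs and rhs together) / count(lhs)."""
    lhs_count = sum(1 for t in transactions if lhs <= t)
    both_count = sum(1 for t in transactions if (lhs | rhs) <= t)
    return both_count / lhs_count if lhs_count else 0.0

baskets = [{'milk', 'bread'}, {'milk', 'bread', 'beer'}, {'milk', 'eggs'},
           {'diapers', 'beer'}, {'diapers', 'beer', 'bread'}]
print(confidence(baskets, {'milk'}, {'bread'}))     # 2/3 of milk baskets also contain bread
print(confidence(baskets, {'diapers'}, {'beer'}))   # all diaper baskets also contain beer
```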