- 900 questions
- Technique doesn't do too well (though it would have placed in the top 9 of ~30 participants!)
  - MRR = ... (i.e., the right answer is ranked about #4–#5 on average)
- Why? Because it relies on the enormity of the Web!
  - Using the Web as a whole, not just TREC's 1M documents... MRR = ... (i.e., on average, the right answer is ranked about #2–#3)

Issues
- In many scenarios (e.g., monitoring an individual's ...) we only have a small set of documents
- Works best/only for "Trivial Pursuit"-style fact-based questions
- Limited/brittle repertoire of
  - question categories
  - answer data types/filters
  - query rewriting rules

ISI: Surface Patterns Approach
- Use of characteristic phrases
- "When was <person> born?"
  - Typical answers:
    - "Mozart was born in 1756."
    - "Gandhi (1869–1948)..."
  - Suggests phrases (regular expressions) like:
    - "NAME was born in BIRTHDATE"
    - "NAME (BIRTHDATE"
- Use of regular expressions can help locate the correct answer

Use Pattern Learning
- Example:
  - "The great composer Mozart (1756–1791) achieved fame at a young age"
  - "Mozart (1756–1791) was a genius"
  - "The whole world would always be indebted to the great music of Mozart (1756–1791)"
- The longest matching substring for all 3 sentences is "Mozart (1756–1791)"
- A suffix tree would extract "Mozart (1756–1791)" as an output, with a score of 3

Pattern Learning (cont.)
- Repeat with different examples of the same question type
  - "Gandhi 1869", "Newton 1642", etc.
- Some patterns learned for BIRTHDATE:
  a. born in ANSWER, NAME
  b. NAME was born on ANSWER,
  c. NAME (ANSWER
  d. NAME (ANSWER)

Experiments
- 6 different question types
  - from the Webclopedia QA Typology (Hovy et al., 2002a)
- BIRTHDATE
- LOCATION
- INVENTOR
- DISCOVERER
- DEFINITION
- WHY-FAMOUS

Experiments: Pattern Precision
- BIRTHDATE:
  - NAME (ANSWER)
  - NAME was born on ANSWER,
  - NAME was born in ANSWER
  - NAME was born ANSWER
  - ANSWER NAME was born
  - NAME (ANSWER
  - NAME (ANSWER
- INVENTOR:
  - ANSWER invents NAME
  - the NAME was invented by ANSWER
  - ANSWER invented the NAME in

Experiments (cont.)
- DISCOVERER:
  - when ANSWER discovered NAME
  - ANSWER's ...

... reactions within the EU and around the world.

TDT: The Corpus
- Topic Detection and Tracking
- "Bake-off" sponsored by US government agencies
- TDT evaluation corpora consist of text and transcribed news from the 1990s
- A set of target events (e.g., 119 in TDT-2) is used for evaluation
- The corpus is tagged for these events (including the first story)
- TDT-2 consists of 60,000 news stories, Jan–June 1998; about 3,000 are "on topic" for one of 119 topics
- Stories are arranged in chronological order

TDT: Tasks in News Detection
- There is no supervised topic training (like Topic Detection)
- (Figure: a timeline of incoming stories, distinguishing first stories from non-first stories for Topic 1 and Topic 2)
- The First-Story Detection task: to detect the first story that discusses a topic, for all topics

First Story Detection
- New event detection is an unsupervised learning task
- Detection may consist of
  - discovering previously unidentified events in an accumulated collection (retrospective detection)
  - flagging the onset of new events from live news feeds in an online fashion
- Lack of advance knowledge of new events, but access to unlabeled historical data as a contrast set
- The input to online detection is the stream of TDT stories in chronological order, simulating real-time incoming documents
- The output of online detection is a YES/NO decision per document

Approach 1: KNN
- Online processing of each incoming story
- Compute similarity to all previous stories
  - cosine similarity
  - language model
  - prominent terms
  - extracted entities
- If the similarity is below a threshold: new story
- If the similarity is above the threshold for a previous story s: assign to the topic of s
- The threshold can be trained on a training set
- The threshold is not topic-specific!
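A minimal sketch of this approach, assuming bag-of-words term-frequency vectors and cosine similarity; the function names, the 0.2 threshold, and the toy story stream are illustrative assumptions, not part of the original slides:

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def online_fsd(stories, threshold=0.2):
    """Process stories in chronological order; emit YES if a story starts a
    new topic, NO if it is assigned to the topic of its most similar
    previous story. A single global threshold is used (not topic-specific)."""
    seen = []          # (term-frequency vector, topic id) of previous stories
    decisions = []
    next_topic = 0
    for text in stories:
        vec = Counter(text.lower().split())
        # compare against all previous stories (nearest neighbour)
        best_sim, best_topic = 0.0, None
        for prev_vec, prev_topic in seen:
            sim = cosine(vec, prev_vec)
            if sim > best_sim:
                best_sim, best_topic = sim, prev_topic
        if best_sim < threshold:       # nothing similar enough: first story
            topic = next_topic
            next_topic += 1
            decisions.append(("YES", topic))
        else:                          # attach to the topic of the nearest story
            topic = best_topic
            decisions.append(("NO", topic))
        seen.append((vec, topic))
    return decisions

if __name__ == "__main__":
    stream = [
        "earthquake strikes city overnight",
        "rescue teams respond to the earthquake",
        "airline announces merger with rival carrier",
    ]
    print(online_fsd(stream, threshold=0.2))
```

In practice the threshold would be tuned on a labeled training set, and the all-pairs comparison would be restricted to a recent time window, as in the time-weighted variant described below.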
Approach 2: Single-Pass Clustering
- Assign each incoming document to one of a set of topic clusters
- A topic cluster is represented by its centroid (the vector average of its members)
- For an incoming story, compute the similarity with each centroid

Patterns in Event Distributions
- News stories discussing the same event tend to be temporally proximate
- A time gap between bursts of topically similar stories is often an indication of different events
  - different earthquakes
  - airplane accidents
- A significant vocabulary shift and rapid changes in term frequency are typical of stories reporting a new event, including previously unseen proper nouns
- Events are typically reported in a relatively brief time window of 1–4 weeks

Similar Events over Time

Approach 3: KNN + Time
- Only consider documents in a (short) time window
- Compute similarity in a time-weighted fashion
  - m: number of documents in the window; d_i: the i-th document in the window
- Time weighting significantly increases performance

FSD Results

Discussion
- Hard problem
- Becomes harder the more topics need to be tracked
- Second Story Detection is much easier than First Story Detection
- Example:
  - retrospective detection of the first 9/11 story: easy
  - online detection: hard

References
- Online New Event Detection Using Single-Pass Clustering. Papka & Allan (University of Massachusetts, 1997)
- A Study on Retrospective and On-Line Event Detection. Yang, Pierce & Carbonell (Carnegie Mellon University, 1998)
- UMass at TDT 2000. Allan, Lavrenko, Frey & Khandelwal (UMass, 2000)
- Statistical Models for Tracking and Detection. (Dragon Systems, 1999)

Summarization

What is a Summary?
- Informative summary
  - Purpose: replace the original document
  - Example: an executive summary
- Indicative summary
  - Purpose: support a decision: do I want to read the original document?