

Text Mining Courseware - 文庫吧 Materials

2024-08-27 17:20
  

...ment yes/no?
- Example: Headline, scientific abstract

Why Automatic Summarization?
- Algorithm for reading in many domains is:
  - read summary
  - decide whether relevant or not
  - if relevant: read whole document
- Summary is gatekeeper for large number of documents.
- Information overload
- Often the summary is all that is read.
- Human-generated summaries are expensive.

Summary Length (Reuters)
Goldstein et al. 1999

Summarization Algorithms
- Keyword summaries
  - Display most significant keywords
  - Easy to do
  - Hard to read, poor representation of content
- Sentence extraction
  - Extract key sentences
  - Medium hard
  - Summaries often don't read well
  - Good representation of content
- Natural language understanding / generation
  - Build knowledge representation of text
  - Generate sentences summarizing content
  - Hard to do well
- Something between the last two methods?

Sentence Extraction
- Represent each sentence as a feature vector
- Compute score based on features
- Select n highest-ranking sentences
- Present in order in which they occur in text.
- Postprocessing to make summary more readable/concise
  - Eliminate redundant sentences
  - Anaphors/pronouns
  - Delete subordinate clauses, parentheticals

Sentence Extraction: Example
- SIGIR '95 paper on summarization by Kupiec, Pedersen, Chen
- Trainable sentence extraction
- Proposed algorithm is applied to its own description (the paper)

Sentence Extraction: Example

Feature Representation
- Fixed-phrase feature
  - Certain phrases indicate summary, e.g. "in summary", "in conclusion" etc.
- Paragraph feature
  - Paragraph initial/final more likely to be important.
- Thematic word feature
  - Repetition is an indicator of importance
  - Do any of the most frequent content words occur?
- Uppercase word feature
  - Uppercase often indicates named entities. (Taylor)
  - Is uppercase thematic word introduced?
- Sentence length cutoff
  - Summary sentence should be > 5 words.
  - Summary sentences have a minimum length.

Training
- Hand-label sentences in training set (good/bad summary sentences)
- Train classifier to distinguish good/bad summary sentences
- Model used: Naï...

... official introduction of the Euro; reports on the investigation following the crash;
- Migraine patients have high platelet aggregability;
- Spreading cortical depression (SCD) is implicated in some migraines;
- Stress can lead to a loss of magnesium;
- Discovery of knowledge previously unknown to the user in text;

Outline of Today
- Introduction
- Lexicon construction
- Topic Detection and Tracking
- Summarization
- Question Answering

Data Mining: Market Basket Analysis
- 80% of the people who buy milk also buy bread
- On Fridays, 70% of the men who bought diapers also bought beer.
- What is the relationship between diapers and beer?
- Walmart could trace the reason after doing a small survey!

The business opportunity in text mining?
[Chart: data volume vs. market cap, unstructured vs. structured; axis from 0 to 100]

Corporate Knowledge "Ore"
- Email
- Insurance claims
- News articles
- Web pages
- Patent portfolios
- IRC
- Scientific articles
- Customer complaint letters
- Contracts
- Transcripts of phone calls with customers
- Technical documents
Stuff not very accessible via standard data mining

Text Knowledge Extraction Tasks
- Small Stuff. Useful nuggets of information that a user wants:
  - Question Answering
  - Information Extraction (DB filling)
  - Thesaurus Generation
- Big Stuff. Overviews:
  - Summary Extraction (documents or collections)
  - Categorization (documents)
  - Clustering (collections)
- Text Data Mining: Interesting unknown correlations that one can discover

Text Mining
- The foundation of most commercial "text mining" products is all the stuff we have already covered:
  - Information Retrieval engine
  - Web spider/search
  - Text classification
  - Text clustering
  - Named entity recognition
  - Information extraction (only sometimes)
- Is this text mining? What else is needed?

One tool: Question Answering
- Goal: Use Encyclopedia/other source to answer "Trivial Pursuit-style" factoid questions
- Example:
  - "What famed English site is found on Salisbury Plain?"
From ...

Another tool: Summarizing
- High-level summary or survey of all main points?
- How to summarize a collection?
- Example:
  - sentence extraction from a single document

IBM Text Miner terminology: Example of Vocabulary found
- Certificate of deposit
- CMOs
- Commercial bank
- Commercial paper
- Commercial Union Assurance
- Commodity Futures Trading Commission
- Consul Restaurant
- Convertible bond
- Credit facility
- Credit line
- Debt security
- Debtor country
- Detroit Edison
- Digital Equipment
- Dollars of debt
- End-March
- Enserch
- Equity warrant
- Eurodollar

What is Text Data Mining?
- People's first thought:
  - Make it easier to find things on the Web.
  - But this is information retrieval!
- The metaphor of extracting ore from rock:
  - Does make sense for extracting documents of interest from a huge pile.
  - But does not reflect notions of DM in practice. Rather:
    - finding patterns across large collections
    - discovering heretofore unknown information

Definitions of Text Mining
- Text mining mainly is about somehow extracting the information and knowledge from text;
- 2 definitions:
  - Any operation related to gathering and analyzing text from external sources for business intelligence purposes;
  - Text mining is the process of compiling, organizing, and analyzing large document collections to support the delivery of targeted types of information to analysts and decision makers
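
The sentence-extraction steps from the summarization slides (feature vector per sentence, score, pick the n highest-ranking sentences, present them in document order) might be sketched roughly as follows. This is only an illustration: the cue phrases, stopword list, `top_k` value, and the unweighted feature sum are assumptions made here, and the plain sum stands in for the trained classifier that the slides mention.

```python
import re
from collections import Counter

# Hypothetical cue phrases and stopwords, chosen only for this sketch.
CUE_PHRASES = ("in summary", "in conclusion")
STOPWORDS = {"the", "a", "an", "of", "to", "in", "is", "and", "are", "it", "for", "on"}
MIN_WORDS = 5  # sentence length cutoff: keep sentences longer than this


def words(text):
    """Lowercase word tokens of a sentence."""
    return re.findall(r"[a-z']+", text.lower())


def extract_summary(paragraphs, n=2, top_k=3):
    """Return the n best sentences in their original order.

    paragraphs: list of paragraphs, each a list of sentence strings.
    top_k: how many frequent content words count as thematic (assumed value).
    """
    content = Counter(
        w for para in paragraphs for sent in para for w in words(sent)
        if w not in STOPWORDS
    )
    thematic = {w for w, _ in content.most_common(top_k)}

    scored = []
    position = 0
    for para in paragraphs:
        for i, sent in enumerate(para):
            toks = words(sent)
            features = {
                "fixed_phrase": any(c in sent.lower() for c in CUE_PHRASES),
                "paragraph": i == 0 or i == len(para) - 1,
                "thematic": any(t in thematic for t in toks),
                # Skip the first word, which is capitalized in any sentence.
                "uppercase": any(w[0].isupper() for w in sent.split()[1:]),
                "long_enough": len(toks) > MIN_WORDS,
            }
            # Assumption: unweighted sum of binary features, not a trained model.
            score = sum(features.values()) if features["long_enough"] else 0
            scored.append((score, position, sent))
            position += 1

    # Highest score first; ties go to the earlier sentence.
    best = sorted(scored, key=lambda t: (-t[0], t[1]))[:n]
    # Present the selected sentences in the order they occur in the text.
    return [sent for _, _, sent in sorted(best, key=lambda t: t[1])]
```

Calling `extract_summary(paragraphs, n=2)` on a document split into paragraphs and sentences returns two sentences in reading order. A trainable version would replace the sum with a classifier fitted on hand-labeled good/bad summary sentences.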
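
The market basket statements from the data mining slide ("80% of the people who buy milk also buy bread") are, in standard association-rule terminology (a term the slides themselves do not use), claims about the confidence of a rule such as milk → bread. A minimal sketch of that calculation, using made-up baskets rather than any real retail data:

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate P(consequent | antecedent) from the transactions."""
    base = support(transactions, antecedent)
    if base == 0:
        raise ValueError("antecedent never occurs in the transactions")
    both = support(transactions, set(antecedent) | set(consequent))
    return both / base


# Toy baskets invented for illustration; each basket is a set of items.
baskets = [
    {"milk", "bread"},
    {"milk", "bread", "beer"},
    {"milk", "bread"},
    {"milk", "bread", "diapers"},
    {"milk"},
    {"diapers", "beer"},
]

if __name__ == "__main__":
    print(confidence(baskets, {"milk"}, {"bread"}))
    print(confidence(baskets, {"diapers"}, {"beer"}))
```

In these toy baskets, four of the five milk baskets also contain bread, so the first rule has a confidence of about 0.8. Text data mining in the sense of the slides looks for comparable unknown correlations, but over document collections instead of purchase records.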