
Text Mining Course Slides: Full-Text Preview

2025-09-14 17:20


- ...l data as a contrast set
- The input to online detection is the stream of TDT stories in chronological order, simulating real-time incoming documents
- The output of online detection is a YES/NO decision per document

Approach 1: KNN
- Online processing of each incoming story
- Compute similarity to all previous stories
  - Cosine similarity
  - Language model
  - Prominent terms
  - Extracted entities
- If similarity is below threshold: new story
- If similarity is above threshold for previous story s: assign to topic of s
- Threshold can be trained on training set
- Threshold is not topic specific!

Approach 2: Single Pass Clustering
- Assign each incoming document to one of a set of topic clusters
- A topic cluster is represented by its centroid (vector average of members)
- For incoming story, compute similarity with centroid

Patterns in Event Distributions
- News stories discussing the same event tend to be temporally proximate
- A time gap between bursts of topically similar stories is often an indication of different events
  - Different earthquakes
  - Airplane accidents
- A significant vocabulary shift and rapid changes in term frequency are typical of stories reporting a new event, including previously unseen proper nouns
- Events are typically reported in a relatively brief time window of 1-4 weeks

Similar Events over Time

Approach 3: KNN + Time
- Only consider documents in a (short) time window
- Compute similarity in a time-weighted fashion:
  - m: number of documents in window, d_i: ith document in window
- Time weighting significantly increases performance.

FSD Results

Discussion
- Hard problem
  - Becomes harder the more topics need to be tracked.
- Second Story Detection much easier than First Story Detection
- Example:
  - retrospective detection of first 9/11 story easy,
  - online detection hard

References
- Online New Event Detection using Single-Pass Clustering, Papka, Allan (University of Massachusetts, 1997)
- A Study on Retrospective and On-Line Event Detection, Yang, Pierce, Carbonell (Carnegie Mellon University, 1998)
- UMass at TDT 2000, Allan, Lavrenko, Frey, Khandelwal (UMass, 2000)
- Statistical Models for Tracking and Detection, (Dragon Systems, 1999)

Summarization

What is a Summary?
- Informative summary
  - Purpose: replace original document
  - Example: executive summary
- Indicative summary
  - Purpose: support decision: do I want to read original document yes/no?
  - Example: headline, scientific abstract

Why Automatic Summarization?
- Algorithm for reading in many domains is:
  - read summary
  - decide whether relevant or not
  - if relevant: read whole document
- Summary is gatekeeper for large number of documents.
- Information overload
  - Often the summary is all that is read.
- Human-generated summaries are expensive.

Summary Length (Reuters)
Goldstein et al. 1999

Summarization Algorithms
- Keyword summaries
  - Display most significant keywords
  - Easy to do
  - Hard to read, poor representation of content
- Sentence extraction
  - Extract key sentences
  - Medium hard
  - Summaries often don't read well
  - Good representation of content
- Natural language understanding / generation
  - Build knowledge representation of text
  - Generate sentences summarizing content
  - Hard to do well
- Something between the last two methods?

Sentence Extraction
- Represent each sentence as a feature vector
- Compute score based on features
- Select n highest-ranking sentences
- Present in order in which they occur in text.
- Post-processing to make summary more readable/concise
  - Eliminate redundant sentences
  - Anaphors/pronouns
  - Delete subordinate clauses, parentheticals

Sentence Extraction: Example
- SIGIR '95 paper on summarization by Kupiec, Pedersen, Chen
- Trainable sentence extraction
- Proposed algorithm is applied to its own description (the paper)

Sentence Extraction: Example

Feature Representation
- Fixed-phrase feature
  - Certain phrases indicate summary, e.g. "in summary", "in conclusion" etc.
- Paragraph feature
  - Paragraph initial/final more likely to be important.
- Thematic word feature
  - Repetition is an indicator of importance
  - Do any of the most frequent content words occur?
- Uppercase word feature
  - Uppercase often indicates named entities. (Taylor)
  - Is uppercase thematic word introduced?
- Sentence length cutoff
  - Summary sentence should be longer than 5 words.
  - Summary sentences have a minimum length.

Training
- Hand-label sentences in training set (good/bad summary sentences)
- Train classifier to distinguish good/bad summary sentences
- Model used: Naïve Bayes

- ... reports on the investigation following the crash

- 2 definitions:
  - Any operation related to gathering and analyzing text from external sources for business intelligence purposes
  - Discovery of knowledge previously unknown to the user in text

- Spreading cortical depression (SCD) is implicated in some migraines
- Calcium channel blockers prevent some migraines
- Magnesium is a natural calcium channel blocker
- Magnesium can suppress platelet aggregability
- All extracted from medical journal titles

Gathering Evidence
Diagram labels: migraine, magnesium, stress, CCB, PA, SCD; All Nutrition Research; All Migraine Research

Swanson's TDM
- Two of his hypotheses have received some experimental verification.
- His technique
  - Only partially automated
  - Required medical expertise
- Few people are working on this kind of information aggregation problem.

Or maybe it was already known?

Lexicon Construction

What is a Lexicon?
- A database of the vocabulary of a particular domain (or a language)
- More than a list of words/phrases
- Usually some linguistic information
  - Morphology
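
Looking back at the first story detection slides, Approach 1 (KNN) and Approach 3 (KNN + Time) can be sketched in a few lines. This is a minimal illustration, not the method of any cited system. It assumes whitespace tokenization and raw term-frequency vectors. The threshold value is arbitrary. The weight `i / m` is an assumed linear decay chosen to fit the symbols the slide defines (m documents in the window, d_i the ith document); the slide's own formula did not survive. The function names are hypothetical.

```python
import math
from collections import Counter


def tf_vector(text):
    """Bag-of-words term-frequency vector (lowercased whitespace tokens)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def detect_first_stories(stories, threshold=0.3, window=None):
    """Return one True/False (YES/NO) decision per story, in arrival order.

    window=None compares against all previous stories (Approach 1).
    window=m compares only against the m most recent stories and weights
    the i-th story in the window by i/m (assumed linear decay), so older
    stories count less (Approach 3).
    """
    seen = []
    decisions = []
    for text in stories:
        vec = tf_vector(text)
        recent = seen if window is None else seen[-window:]
        m = len(recent)
        best = 0.0
        for i, old in enumerate(recent, start=1):
            weight = 1.0 if window is None else i / m
            best = max(best, weight * cosine(vec, old))
        # Below threshold for every earlier story: flag as a new story.
        decisions.append(best < threshold)
        seen.append(vec)
    return decisions
```

With `window=None` a story that repeats an old one is never flagged as new; with a window, the same repeat can be flagged again once the earlier story has aged, which mirrors the observation that a time gap between similar stories often signals a different event.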
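
Approach 2 (single pass clustering) from the same part of the deck can be sketched similarly. This is a sketch under stated assumptions: the slides only say that each incoming story is compared with cluster centroids, so opening a new cluster when the best similarity falls below a threshold is an assumption here, as are the threshold value, the whitespace tokenization, and the names `TopicCluster` and `single_pass_cluster`.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class TopicCluster:
    """A topic cluster represented by the vector average of its members."""

    def __init__(self, vec):
        self.total = Counter(vec)  # running sum of member vectors
        self.size = 1

    def centroid(self):
        return {t: v / self.size for t, v in self.total.items()}

    def add(self, vec):
        self.total.update(vec)
        self.size += 1


def single_pass_cluster(stories, threshold=0.3):
    """Assign each story, in arrival order, to a cluster index."""
    clusters = []
    assignments = []
    for text in stories:
        vec = Counter(text.lower().split())
        best_idx, best_sim = None, 0.0
        for idx, cluster in enumerate(clusters):
            sim = cosine(vec, cluster.centroid())
            if sim > best_sim:
                best_idx, best_sim = idx, sim
        if best_idx is None or best_sim < threshold:
            # Assumption: no sufficiently similar centroid starts a new topic.
            clusters.append(TopicCluster(vec))
            assignments.append(len(clusters) - 1)
        else:
            clusters[best_idx].add(vec)
            assignments.append(best_idx)
    return assignments
```

Each story is processed once and compared with one centroid per topic rather than with every earlier story, which is the main practical difference from the KNN approach.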
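
The sentence extraction pipeline from the summarization slides (feature vector, score, select the n highest-ranking sentences, present in text order) can also be illustrated. This sketch is not the trainable method of Kupiec, Pedersen and Chen: it replaces the trained classifier with hand-picked weights, which are purely hypothetical, and it approximates the paragraph feature by first/last position in the whole text. The stopword list, cue phrases, and the name `summarize` are likewise assumptions.

```python
import re
from collections import Counter

CUE_PHRASES = ("in summary", "in conclusion")
STOPWORDS = {"a", "an", "and", "in", "is", "it", "of", "the", "to", "was"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def summarize(text, n=2, min_words=5, top_k=5):
    """Return the n best-scoring sentences in their original order."""
    sentences = split_sentences(text)
    words = [w for w in re.findall(r"[a-z']+", text.lower())
             if w not in STOPWORDS]
    thematic = {w for w, _ in Counter(words).most_common(top_k)}

    scored = []
    for pos, sent in enumerate(sentences):
        tokens = re.findall(r"[A-Za-z']+", sent)
        if len(tokens) <= min_words:
            continue  # sentence length cutoff
        lower = {t.lower() for t in tokens}
        score = 0.0
        if any(p in sent.lower() for p in CUE_PHRASES):
            score += 2.0  # fixed-phrase feature
        if pos == 0 or pos == len(sentences) - 1:
            score += 1.0  # position feature (stand-in for paragraph feature)
        score += 0.5 * len(lower & thematic)  # thematic word feature
        if any(t[0].isupper() for t in tokens[1:]):
            score += 0.5  # uppercase word feature, skipping the first word
        scored.append((score, pos, sent))

    # Pick the n highest scores, then restore document order.
    top = sorted(scored, key=lambda x: (-x[0], x[1]))[:n]
    return [sent for _, _, sent in sorted(top, key=lambda x: x[1])]
```

In a trainable setup, the hand-set weights would be replaced by a classifier fitted on hand-labeled good/bad summary sentences, as the Training slide describes.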