- 900 questions
- Technique doesn't do too well (though it would have placed in the top 9 of ~30 participants!)
  - MRR = ... (i.e., the right answer is ranked about #4–#5 on average)
- Why? Because it relies on the enormity of the Web!
  - Using the Web as a whole, not just TREC's 1M documents... MRR = ... (i.e., on average, the right answer is ranked about #2–#3)

Issues
- In many scenarios (e.g., monitoring an individual's ...) we only have a small set of documents
- Works best/only for "Trivial Pursuit"-style fact-based questions
- Limited/brittle repertoire of
  - question categories
  - answer data types/filters
  - query rewriting rules

ISI: Surface Patterns Approach
- Use of characteristic phrases
- "When was <person> born?"
  - Typical answers:
    - "Mozart was born in 1756."
    - "Gandhi (1869–1948)..."
  - Suggests phrases (regular expressions) like:
    - "NAME was born in BIRTHDATE"
    - "NAME (BIRTHDATE"
- Use of regular expressions can help locate the correct answer

Use Pattern Learning
- Example:
  - "The great composer Mozart (1756–1791) achieved fame at a young age"
  - "Mozart (1756–1791) was a genius"
  - "The whole world would always be indebted to the great music of Mozart (1756–1791)"
- The longest matching substring for all 3 sentences is "Mozart (1756–1791)"
- A suffix tree would extract "Mozart (1756–1791)" as an output, with a score of 3

Pattern Learning (cont.)
- Repeat with different examples of the same question type
  - "Gandhi 1869", "Newton 1642", etc.
- Some patterns learned for BIRTHDATE:
  a. born in ANSWER, NAME
  b. NAME was born on ANSWER,
  c. NAME (ANSWER
  d. NAME (ANSWER)

Experiments
- 6 different question types
  - from the Webclopedia QA Typology (Hovy et al., 2002a)
- BIRTHDATE
- LOCATION
- INVENTOR
- DISCOVERER
- DEFINITION
- WHY-FAMOUS

Experiments: Pattern Precision
- BIRTHDATE:
  - NAME (ANSWER)
  - NAME was born on ANSWER,
  - NAME was born in ANSWER
  - NAME was born ANSWER
  - ANSWER NAME was born
  - NAME (ANSWER
  - NAME (ANSWER
- INVENTOR:
  - ANSWER invents NAME
  - the NAME was invented by ANSWER
  - ANSWER invented the NAME in

Experiments (cont.)
- DISCOVERER:
  - when ANSWER discovered NAME
  - ANSWER's ...

... reactions within the EU and around the world.

TDT: The Corpus
- Topic Detection and Tracking
- "Bake-off" sponsored by US government agencies
- TDT evaluation corpora consist of text and transcribed news from the 1990s
- A set of target events (e.g., 119 in TDT-2) is used for evaluation
- The corpus is tagged for these events (including the first story)
- TDT-2 consists of 60,000 news stories, Jan–June 1998; about 3,000 are "on topic" for one of 119 topics
- Stories are arranged in chronological order

TDT: Tasks in News Detection
- There is no supervised topic training (like Topic Detection)
- (Figure: a timeline of incoming stories, distinguishing first stories from non-first stories for Topic 1 and Topic 2)
- The First-Story Detection task: to detect the first story that discusses a topic, for all topics

First Story Detection
- New event detection is an unsupervised learning task
- Detection may consist of
  - discovering previously unidentified events in an accumulated collection (retrospective detection)
  - flagging the onset of new events from live news feeds in an online fashion
- Lack of advance knowledge of new events, but access to unlabeled historical data as a contrast set
- The input to online detection is the stream of TDT stories in chronological order, simulating real-time incoming documents
- The output of online detection is a YES/NO decision per document

Approach 1: KNN
- Online processing of each incoming story
- Compute similarity to all previous stories
  - cosine similarity
  - language model
  - prominent terms
  - extracted entities
- If the similarity is below a threshold: new story
- If the similarity is above the threshold for a previous story s: assign to the topic of s
- The threshold can be trained on a training set
- The threshold is not topic-specific!
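A minimal sketch of this approach, assuming bag-of-words term-frequency vectors and cosine similarity; the function names, the 0.2 threshold, and the toy story stream are illustrative assumptions, not part of the original slides:

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def online_fsd(stories, threshold=0.2):
    """Process stories in chronological order; emit YES if a story starts a
    new topic, NO if it is assigned to the topic of its most similar
    previous story. A single global threshold is used (not topic-specific)."""
    seen = []          # (term-frequency vector, topic id) of previous stories
    decisions = []
    next_topic = 0
    for text in stories:
        vec = Counter(text.lower().split())
        # compare against all previous stories (nearest neighbour)
        best_sim, best_topic = 0.0, None
        for prev_vec, prev_topic in seen:
            sim = cosine(vec, prev_vec)
            if sim > best_sim:
                best_sim, best_topic = sim, prev_topic
        if best_sim < threshold:       # nothing similar enough: first story
            topic = next_topic
            next_topic += 1
            decisions.append(("YES", topic))
        else:                          # attach to the topic of the nearest story
            topic = best_topic
            decisions.append(("NO", topic))
        seen.append((vec, topic))
    return decisions

if __name__ == "__main__":
    stream = [
        "earthquake strikes city overnight",
        "rescue teams respond to the earthquake",
        "airline announces merger with rival carrier",
    ]
    print(online_fsd(stream, threshold=0.2))
```

In practice the threshold would be tuned on a labeled training set, and the all-pairs comparison would be restricted to a recent time window, as in the time-weighted variant described below.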
Approach 2: Single-Pass Clustering
- Assign each incoming document to one of a set of topic clusters
- A topic cluster is represented by its centroid (the vector average of its members)
- For an incoming story, compute the similarity with each centroid

Patterns in Event Distributions
- News stories discussing the same event tend to be temporally proximate
- A time gap between bursts of topically similar stories is often an indication of different events
  - different earthquakes
  - airplane accidents
- A significant vocabulary shift and rapid changes in term frequency are typical of stories reporting a new event, including previously unseen proper nouns
- Events are typically reported in a relatively brief time window of 1–4 weeks

Similar Events over Time

Approach 3: KNN + Time
- Only consider documents in a (short) time window
- Compute similarity in a time-weighted fashion
  - m: number of documents in the window; d_i: the i-th document in the window
- Time weighting significantly increases performance

FSD Results

Discussion
- Hard problem
- Becomes harder the more topics need to be tracked
- Second Story Detection is much easier than First Story Detection
- Example:
  - retrospective detection of the first 9/11 story: easy
  - online detection: hard

References
- Online New Event Detection Using Single-Pass Clustering. Papka & Allan (University of Massachusetts, 1997)
- A Study on Retrospective and On-Line Event Detection. Yang, Pierce & Carbonell (Carnegie Mellon University, 1998)
- UMass at TDT 2000. Allan, Lavrenko, Frey & Khandelwal (UMass, 2000)
- Statistical Models for Tracking and Detection. (Dragon Systems, 1999)

Summarization

What is a Summary?
- Informative summary
  - Purpose: replace the original document
  - Example: an executive summary
- Indicative summary
  - Purpose: support a decision: do I want to read the original document?