- Event: something that happens at a specific time and place, and the unavoidable consequences. Examples: specific elections, accidents, crimes, natural disasters.
- Activity: a connected set of actions that have a common focus or purpose. Examples: campaigns, investigations, disaster relief efforts.
- Topic: a seminal event or activity, along with all directly related events and activities.
- Story: a topically cohesive segment of news that includes two or more DECLARATIVE independent clauses about a single event.

Examples
- Thai Airbus crash
  - On topic: stories reporting details of the crash, injuries and deaths.
- Official introduction of the Euro.

Text mining is the process of compiling, organizing, and analyzing large document collections to support the delivery of targeted types of information to analysts and decision makers, and to discover relationships between related facts that span wide domains of inquiry.

True Text Data Mining: Don Swanson's Medical Work
- Given:
  - medical titles and abstracts
  - a problem (an incurable rare disease)
  - some medical expertise
- Find causal links among titles: symptoms, drugs, results.
- E.g.: magnesium deficiency related to migraine.
- This was found by extracting features from the medical literature on migraines and nutrition (a toy sketch of this linking idea appears after the evaluation notes below).

Swanson Example (1991)
- Problem: migraine headaches (M)
- Stress is associated with migraines.
- Spreading cortical depression (SCD) is implicated in some migraines.

A Trainable Document Summarizer (Kupiec et al., 1995)
- Sentence features are combined with Naïve Bayes (sketched below).
- Can rank sentences according to score and show the top n to the user.

Evaluation
- Compare extracted sentences with sentences in abstracts.

Evaluation of features
- Baseline (choose first n sentences): 24%
- Overall performance (42-44%) not very good.
- However, there is more than one good summary.

People want to ask questions…
Examples from the AltaVista query log:
- who invented surf music?
- how to make stink bombs
- where are the snowdens of yesteryear?
- which english translation of the bible is used in official catholic liturgies?
- how to do clayart
- how to copy psx
- how tall is the sears tower?
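The Swanson-style linking described above can be made concrete with a small sketch. The Python snippet below is a toy illustration only: the titles, the stopword list, and the function name cooccurring_terms are invented here, and Swanson's actual work mined MEDLINE titles and abstracts with far more care. It only shows the core idea of finding bridge terms B that connect a problem term A (migraine) to a candidate term C (magnesium) across two separate literatures.

    # Toy sketch of Swanson-style literature linking (all data and names invented
    # for illustration; Swanson's real work used MEDLINE titles and abstracts).
    # Idea: problem term A ("migraine") and candidate term C ("magnesium") rarely
    # co-occur directly, but both co-occur with bridge terms B in two separate
    # literatures, suggesting a hidden A-C connection worth investigating.

    STOPWORDS = {"and", "the", "of", "in", "a", "to"}

    migraine_titles = [
        "stress and the onset of migraine attacks",
        "spreading cortical depression implicated in migraine",
        "platelet aggregability in migraine patients",
    ]
    nutrition_titles = [
        "magnesium deficiency increases the stress response",
        "magnesium inhibits spreading cortical depression",
        "dietary magnesium and platelet aggregability",
    ]

    def cooccurring_terms(titles, anchor):
        """Content words that appear in titles mentioning the anchor term."""
        terms = set()
        for title in titles:
            words = set(title.lower().split())
            if anchor in words:
                terms |= words - STOPWORDS - {anchor}
        return terms

    # Bridge terms B linking A = migraine and C = magnesium across the literatures.
    bridges = cooccurring_terms(migraine_titles, "migraine") & \
              cooccurring_terms(nutrition_titles, "magnesium")
    print(sorted(bridges))  # ['aggregability', 'cortical', 'depression', ...]

The bridge terms (stress, SCD, platelet aggregability) are exactly the kind of intermediate facts Swanson used to argue for the migraine-magnesium connection.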
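For the trainable summarizer idea above (sentence features combined with Naïve Bayes, sentences ranked by score), here is a minimal Python sketch. The feature set, class handling, and training interface are invented for illustration; Kupiec et al. used features such as sentence position, cue phrases, sentence length, and thematic/uppercase words, trained against sentences aligned with abstracts.

    import math
    from collections import defaultdict

    def sentence_features(sentence, position):
        """Binary features for one sentence (invented stand-ins for the paper's features)."""
        return {
            "early_position": position < 2,
            "cue_phrase": "in conclusion" in sentence.lower(),
            "long_sentence": len(sentence.split()) > 6,
        }

    class NaiveBayesExtractor:
        def fit(self, feature_dicts, labels):
            """labels[i] is True if sentence i appears in the human abstract.
            Assumes the training data contains both classes."""
            self.classes = (False, True)
            self.class_totals = {c: labels.count(c) for c in self.classes}
            self.prior = {c: self.class_totals[c] / len(labels) for c in self.classes}
            self.counts = defaultdict(int)   # (class, feature, value) -> count
            for f, y in zip(feature_dicts, labels):
                for name, value in f.items():
                    self.counts[(y, name, value)] += 1
            return self

        def score(self, f):
            """Log-odds that a sentence belongs in the summary, under Naïve Bayes."""
            log_posterior = {}
            for c in self.classes:
                lp = math.log(self.prior[c])
                for name, value in f.items():
                    # Laplace smoothing over the two possible values of a binary feature
                    lp += math.log((self.counts[(c, name, value)] + 1) /
                                   (self.class_totals[c] + 2))
                log_posterior[c] = lp
            return log_posterior[True] - log_posterior[False]

    def extract_top_n(sentences, model, n=2):
        """Rank sentences by Naïve Bayes score and return the top n as the summary."""
        scored = sorted(((model.score(sentence_features(s, i)), i)
                         for i, s in enumerate(sentences)), reverse=True)
        return [sentences[i] for _, i in scored[:n]]

Evaluation in this setting then amounts to checking how many of the top-n extracted sentences also appear in (or align with) the document's abstract, which is where figures like the 24% baseline and the 42-44% overall performance come from.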
Examples from the Excite query log (12/1999):
- how can i find someone in texas
- where can i find information on puritan religion?
- what are the 7 wonders of the world
- how can i eliminate stress
- What vacuum cleaner does Consumers Guide recommend
Around 12-15% of query logs are questions like these.

The Google answer 1
- Include question words etc. in your stoplist.
- Do standard IR.
- Sometimes this (sort of) works:
  - Question: What famed English site is found on Salisbury Plain?

The Google answer 2
- Take the question and try to find it as a string on the web.
- Return the next sentence on that web page as the answer (a toy sketch of this appears at the end of this section).
- Works brilliantly if this exact question appears as a FAQ question, etc.
- Works lousily most of the time.
- But a slightly more sophisticated version of this approach has been revived in recent years with considerable success.

A Brief (Academic) History
- Question answering systems can be found in many areas of NLP research, including:
  - Natural language database systems
    - A lot of early NLP work on these (e.g., LUNAR)
  - Spoken dialog systems
    - Currently very active and commercially relevant
- The focus on open-domain QA is fairly new:
  - MURAX (Kupiec 1993): encyclopedia answers
  - Hirschman: reading comprehension tests
  - TREC QA competition: 1999–

AskJeeves
- AskJeeves is probably the most hyped example of "question answering".
- It largely does pattern matching to match your question to their own knowledge base of questions.
- If that works, you get the human-curated answers to that known question.
- If that fails, it falls back to regular web search.
- A potentially interesting middle ground, but a fairly weak shadow of real QA.

Online QA Examples
- LCC:
- AnswerBus is an open-domain question answering system:
- Ionaut:
- EasyAsk, AnswerLogic, AnswerFriend, Start, Quasm, Mulder, Webclopedia, etc.

Question Answering at TREC
- The question answering competition at TREC consists of answering a set of 500 fact-based questions, e.g., "When was Mozart born?".
- For the first three years systems were allowed to return 5 ranked answer snippets (50/250 bytes) to each question.
- IR