ve approach would be to sequentially scan each file for the given word or phrase. This approach has a number of flaws, the most obvious of which is that it doesn’t scale to larger file sets or cases where files are very large. This is where indexing comes in: to search large amounts of text quickly, you must first index that text and convert it into a format that will let you search it rapidly, eliminating the slow sequential scanning process. This conversion process is called indexing, and its output is called an index.

You can think of an index as a data structure that allows fast random access to words stored inside it. The concept behind it is analogous to an index at the end of a book, which lets you quickly locate pages that discuss certain topics.

In the case of Lucene, an index is a specially designed data structure, typically stored on the file system as a set of index files. We cover the structure of index files in detail in appendix B, but for now just think of a Lucene index as a tool that allows quick word lookup.

What is searching?

Searching is the process of looking up words in an index to find documents where they appear. The quality of a search is typically described using precision and recall metrics. Recall measures how well the search system finds relevant documents, whereas precision measures how well the system filters out the irrelevant documents.

However, you must consider a number of other factors when thinking about searching. We already mentioned speed and the ability to quickly search large quantities of text. Support for single and multiterm queries, phrase queries, wildcards, result ranking, and sorting are also important, as is a friendly syntax for entering those queries. Lucene’s powerful software library offers a number of search features, bells, and whistles, so many that we had to spread our search coverage over three chapters (chapters 3, 5, and 6).

Lucene was initially available for download from its home at the SourceForge web site.
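To make the earlier description of an index as a structure for quick word lookup more concrete, here is a minimal sketch in Java of a word-to-documents table (often called an inverted index). It is an illustration only and does not reflect Lucene’s actual index format or API. The class name TinyInvertedIndex, its add and search methods, and the simple lowercase-and-split tokenization are all assumptions made for this example.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical illustration class; not part of Lucene.
class TinyInvertedIndex {
    // Maps each word to the sorted set of ids of documents that contain it.
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();

    // "Indexing": split the text into lowercase words and record the document id under each word.
    public void add(int docId, String text) {
        for (String token : text.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
            }
        }
    }

    // "Searching": one map lookup instead of a sequential scan of every document.
    public SortedSet<Integer> search(String word) {
        return postings.getOrDefault(word.toLowerCase(), Collections.emptySortedSet());
    }

    public static void main(String[] args) {
        TinyInvertedIndex index = new TinyInvertedIndex();
        index.add(1, "Lucene is a search library");
        index.add(2, "An index makes search fast");
        System.out.println(index.search("search")); // prints [1, 2]
    }
}
```

The cost of tokenizing the text is paid once, at indexing time. After that, looking up a word is a single map access rather than a pass over every document, which is the property the book-index analogy describes.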
Lucene joined the Apache Software Foundation’s Jakarta family of high-quality open source Java products in September 2020. With each release since then, the project has enjoyed increased visibility, attracting more users and developers. As of July 2020, Lucene version has been released, with a bug fix release in early October. Table shows Lucene’s release history.

Table Lucene’s release history

Version   Release date    Milestones
1         March 2020      First open source release (SourceForge)
          October 2020
1b        June 2020       Last SourceForge release
          June 2020       First Apache Jakarta release
          December 20