【正文】
Research Note listed data mining and artificial intelligence at the top of the five key technology areas that will clearly have a major impact across a wide range of industries within the next 3 to 5 years.2 Gartner also listed parallel architectures and data mining as two of the top 10 new technologies in which panies will invest during the next 5 years. According to a recent Gartner HPC Research Note, With the rapid advance in data capture, transmission and storage, largesystems users will increasingly need to implement new and innovative ways to mine the aftermarket value of their vast stores of detail data, employing MPP [massively parallel processing] systems to create new sources of business advantage ( probability).3 The most monly used techniques in data mining are: ? Artificial neural works: Nonlinear predictive models that learn through training and resemble biological neural works in structure. ? Decision trees: Treeshaped structures that represent sets of decisions. These decisions generate rules for the classification of a dataset. Specific decision tree methods include Classification and Regression Trees (CART) and Chi Square Automatic Interaction Detection (CHAID) . ? Geic algorithms: Optimization techniques that use processes such as geic bination, mutation, and natural selection in a design based on the concepts of evolution. ? Nearest neighbor method: A technique that classifies each record in a dataset based on a bination of the classes of the k record(s) most similar to it in a historical dataset (where k 179。t know the answer. For example, say that you are the director of marketing for a telemunications pany and you39。 1). Sometimes called a knearest neighbor technique. nonlinear model An analytical model that does not assume linear relationships in the coefficients of the variables being studied. OLAP Online analytical processing. Refers to arrayoriented database applications that allow users to view, navigate through, manipulate, and analyze multidimensional databases. outlier A data item whose value falls outside the bounds enclosing most of the other corresponding values in the sample. May indicate anomalous data. Should be examined carefully。t. For instance, if you were looking for a sunken Spanish galleon on the high seas the first thing you might do is to research the times when Spanish treasure had been found by others in the past. You might note that these ships often tend to be found off the coast of Bermuda and that there are certain characteristics to the ocean currents, and certain routes that have likely been taken by the ship’s captains in that era. You note these similarities and build a model that includes the characteristics that are mon to the locations of these sunken treasures. With these models in hand you sail off looking for treasure where your model indicates it most likely might be given a similar situation in the past. Hopefully, if you39。t know or what is going to happen next? The technique that is used to perform these feats in data mining is called modeling. Modeling is simply the act of building a model in one situation where you know the answer and then applying it to another situation that you don39。 for example, a multidimensional sales database might include the dimensions Product, Time, and City. exploratory data analysis The use of graphical and descriptive statistical techniques to learn about the structure of a dataset. geic algorithms Optimization techniques that use processes such as geic bination, mutation, and natural selection in a design based on the concepts of natural evolution. linear model An analytical model that assumes linear relationships in the coefficients of the variables being studied. linear regression A statistical technique used to find the bestfitting linear relationship between a target (dependent) variable and its predictors (independent variables). logistic regression A linear regression that predicts the proportions of a categorical target variable, such as type of customer, in a population. multidimensional database A database designed for online analytical processing. Structured as a multidimensional hypercube with one axis per dimension. multiprocessor puter A puter that includes multiple processors connected by a work. See parallel processing. nearest neighbor A technique that classifies each record in a dataset based on a bination of the classes of the k record(s) most similar to it in a historical dataset (where k 179。d like to acquire some new long distance phone customers. You could just randomly go out and mail coupons to the general population just as you could randomly sail the seas looking for sunken treasure. In neither case would you achieve the results you desired and of course you have the opportunity to do much better than random you could use your business experience stored in your database to build a model. As the marketing director you have access to a lot of information about all of your customers: their age, sex, credit history and long distance calling usage. The good news is that you also have a lot of information about your prospective customers: their age, sex, credit history etc. Your problem is that you don39。An Introduction to Data Mining Discovering hidden value in your data warehouse Overview Data mining, the extraction of hidden predictive information from large databases, is a powerful new technology with great potential to help panies focus on the most important information in t