dy been solved in the past. Help desks, which assist in clarifying the questions a customer has about purchased products, are one practical application of this type of procedure. While some companies use help desks to support their telephone hotlines, others give their customers direct access through remote data transfer. Data mining can be very valuable in this context because it consolidates the information gathered in thousands of individual historical cases into key findings. The advantage of this procedure is a shorter search for precedents that can be used to answer the current customer's question.

Methods

There are many different methods for analyzing and classifying data. Some common methods include cluster analysis, Bayesian inference, and inductive learning. Cluster analysis can be based on numerical measures as well as carried out in the form of conceptual clustering.

The architectures of data mining systems differ widely by nature. The following configuration, however, is very common:
- The analysis method, which identifies and analyzes patterns, forms the core of the system.
- The input can include components such as raw data, information from a data dictionary, knowledge of the usage scenario, or user entries that narrow the search space.
- The output encompasses the discovered measures, rules, or information, which are presented to the user in an appropriate form, incorporated into the system as new knowledge, or integrated into an expert system.

Cluster analysis

Whether in its traditional form or as conceptual clustering, cluster analysis attempts to divide or combine a given number of objects into groups based on the proximity that exists among these objects. The clusters are formed so that there are large similarities among the objects of a class as well as large dissimilarities among the objects of different classes.

Traditional cluster analysis

Regardless of the scaling level of the object variables, there are multiple ways to measure the proximity, i.e. the similarity or difference, of objects.
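Two of the most basic such measures, the Euclidean and the Manhattan distance, can be sketched in a few lines of Python; the function names and sample vectors below are illustrative, not taken from the text:

```python
import math

def euclidean(u, v):
    # Square root of the total squared difference of the variables.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def manhattan(u, v):
    # Sum of the absolute differences of the individual variables.
    return sum(abs(a - b) for a, b in zip(u, v))

d_e = euclidean([1.0, 2.0], [4.0, 6.0])   # sqrt(9 + 16) = 5.0
d_m = manhattan([1.0, 2.0], [4.0, 6.0])   # 3 + 4 = 7.0
```

Both operate on vectors of metric variables; which one is appropriate depends on whether large single-attribute deviations should be emphasized (Euclidean) or all deviations weighted equally (Manhattan).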
Basic examples include the Euclidean distance (i.e. the square root of the total squared difference) and the Manhattan distance (i.e. the sum of the absolute differences of the individual variables). In general, we can examine metric, nominal, and mixed data sets by varying the proximity measure. When objects have different types of attributes, Kaufman and Rousseeuw, for example, recommend calculating a difference of 0 for an individual nominal attribute when the values are the same, and a difference of 1 when they are different. In the case of metric variables, we first determine the difference between the object values, standardize it by dividing by the maximum difference, and thereby obtain a value between 0 and 1. We then calculate the total difference between two object vectors as the sum of the individual differences (Kaufman and Rousseeuw 1990). We can use this type of measure (possibly extended by weights for the individual attributes) to cluster data sets in gross-margin analysis. These contain nominal attributes (e.g. product, customer, region) as well as numerical measures (revenues or gross margin).

There is a general differentiation between partitional and hierarchical classification methods. Simply put, partitional methods try to iteratively reduce the heterogeneity of a given initial allotment of objects into clusters. Hierarchical methods, which are significant in practice, take a completely different approach. Initially, each object is located in its own cluster. The objects are then combined successively so that only the smallest possible degree of homogeneity is lost in each step. We can easily present the resulting hierarchy of nested clusters in a so-called dendrogram.

Conceptual clustering

As described above, traditional forms of cluster analysis can identify groups of similar objects but cannot describe these classes beyond a simple list of the individual objects.
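The mixed-attribute measure attributed above to Kaufman and Rousseeuw can be sketched as follows. This is a minimal illustration under my own naming; the function, the record layout, and the gross-margin example values are assumptions, not from the original text:

```python
def mixed_dissimilarity(a, b, metric_ranges):
    """Total dissimilarity between two records as the sum of the
    per-attribute differences (Kaufman and Rousseeuw 1990 style).

    a, b          -- dicts mapping attribute name -> value
    metric_ranges -- dict mapping metric attribute name -> (min, max);
                     attributes absent from it are treated as nominal
    """
    total = 0.0
    for attr in a:
        if attr in metric_ranges:
            lo, hi = metric_ranges[attr]
            span = hi - lo
            # Metric: absolute difference divided by the maximum
            # possible difference, yielding a value in [0, 1].
            total += abs(a[attr] - b[attr]) / span if span else 0.0
        else:
            # Nominal: 0 when the values are the same, 1 otherwise.
            total += 0.0 if a[attr] == b[attr] else 1.0
    return total

# Hypothetical gross-margin records with nominal and metric attributes.
x = {"product": "A", "region": "North", "revenue": 120.0}
y = {"product": "A", "region": "South", "revenue": 180.0}
ranges = {"revenue": (0.0, 200.0)}
d = mixed_dissimilarity(x, y, ranges)  # 0 + 1 + 60/200 = 1.3
```

Such a pairwise measure is exactly what partitional and hierarchical clustering methods consume; per-attribute weights could be added by multiplying each term before summing.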
The objective of many usage scenarios, however, is to characterize the structures buried in the volumes of data. Instead of representing object classes by simply listing their objects, conceptual clustering intentionally describes them using terms that classify the individual objects through rules. A group of these rules forms a so-called concept. A basic example of a concept is a program that automatically and logically links individual attribute values. Advanced systems can even establish concepts and concept hierarchies with classification rules.

In partitional methods of conceptual clustering, the different concepts compete with each other. Ultimately, we have to choose the clustering concept that best meets the performance criteria of the specific method. Performance criteria include the simplicity of a concept (based on the number of attributes involved) or its discriminatory power (e.g. the number of variables whose values do not overlap across the different object classes). Similar to traditional cluster analysis, there are also hierarchical techniques that form classification trees in a top-down approach. As described above, the classification that is best in terms of the performance criteria takes place on each level of the tree. The process ends when no further improvement is possible from one tree level to the next.

4 Critical factors

The following section outlines some problems associated with data mining. In our opinion, these critical factors for success will form the foundation for future research and development.

Efficiency of algorithms

Regarding the efficiency of data mining algorithms, computation times are a key factor. If the calculation times of the algorithms grow faster than linearly, e.g. with the square of the number of data records to be searched, we co