References

[1]. Barbara, D., Garcia-Molina, H. and Porter, D. “The Management of Probabilistic Data,” IEEE Transactions on Knowledge and Data Engineering, 4(5), 1992.
[2]. Bezdek, J. C. Pattern Recognition with Fuzzy Objective Function Algorithms. Plenum Press, New York (1981).
[3]. Cheng, R., Kalashnikov, D., and Prabhakar, S. “Evaluating Probabilistic Queries over Imprecise Data,” Proceedings of the ACM SIGMOD International Conference on Management of Data, June 2021.
[4]. Cheng, R., Kalashnikov, D., and Prabhakar, S. “Querying Imprecise Data in Moving Object Environments,” IEEE Transactions on Knowledge and Data Engineering, 16(9) (2021) 1112–1127.
[5]. Cheng, R., Xia, X., Prabhakar, S., Shah, R. and Vitter, J. “Efficient Indexing Methods for Probabilistic Threshold Queries over Uncertain Data,” Proceedings of VLDB, 2021.
[6]. de Souza, R. M. C. R. and de Carvalho, F. de A. T. “Clustering of Interval Data Based on City–Block Distances,” Pattern Recognition Letters, 25 (2021) 353–365.
[7]. Dunn, J. C. “A Fuzzy Relative of the ISODATA Process and Its Use in Detecting Compact Well-Separated Clusters,” Journal of Cybernetics, 3 (1973) 32–57.
[8]. Hamdan, H. and Govaert, G. “Mixture Model Clustering of Uncertain Data,” IEEE International Conference on Fuzzy Systems (2021) 879–884.
[9]. Ichino, M. and Yaguchi, H. “Generalized Minkowski Metrics for Mixed Feature Type Data Analysis,” IEEE Transactions on Systems, Man and Cybernetics, 24(4) (1994) 698–708.
[10]. Jain, A. and Dubes, R. Algorithms for Clustering Data. Prentice Hall, New Jersey (1988).
[11]. Nilesh N. D. and Suciu, D. “Efficient Query Evaluation on Probabilistic Databases,” VLDB (2021) 864–875.
[12]. Pfoser, D. and Jensen, C. “Capturing the Uncertainty of Moving-objects Representations,” Proceedings of the SSDBM Conference, 123–132, 1999.
[13]. Ruspini, E. H. “A New Approach to Clustering,” Information and Control, 15(1) (1969) 22–32.
[14]. Sato, M., Sato, Y., and Jain, L. Fuzzy Clustering Models and Applications. Physica-Verlag, Heidelberg (1997).
[15]. Wolfson, O., Sistla, P., Chamberlain, S. and Yesha, Y. “Updating and Querying Databases that Track Mobile Units,” Distributed and Parallel Databases, 7(3), 1999.
[16]. Yeung, K. and Ruzzo, W. “An Empirical Study on Principal Component Analysis for Clustering Gene Expression Data,” Bioinformatics, 17(9) (2021) 763–774.

Uncertain Data Mining: A New Research Direction

Michael Chau¹, Reynold Cheng², and Ben Kao³

1: School of Business, The University of Hong Kong, Pokfulam, Hong Kong
2: Department of Computing, Hong Kong Polytechnic University, Kowloon, Hong Kong
3: Department of Computer Science, The University of Hong Kong, Pokfulam, Hong Kong

Abstract

Data uncertainty is often found in real-world applications due to reasons such as imprecise measurement, outdated sources, or sampling errors. Recently, much research has been published in the area of managing data uncertainty in databases. We propose that when data mining is performed on uncertain data, data uncertainty has to be considered in order to obtain high-quality data mining results. We call this the Uncertain Data Mining problem. In this paper, we present a framework for possible research directions in this area. We also present the UK-means clustering algorithm as an example to illustrate how the traditional K-means algorithm can be modified to handle data uncertainty in data mining.

1. Introduction

Data is often associated with uncertainty because of measurement inaccuracy, sampling discrepancy, outdated data sources, or other errors. This is especially true for applications that require interaction with the physical world, such as location-based services [15] and sensor monitoring [3]. For example, in the scenario of moving objects (such as vehicles or people), it is impossible for the database to track the exact locations of all objects at all time instants.
Therefore, the location of each object is associated with uncertainty between updates [4]. These various sources of uncertainty have to be considered in order to produce accurate query and mining results.

In recent years, there has been much research on the management of uncertain data in databases, such as the representation of uncertainty in databases and the querying of uncertain data. However, little research has addressed the issue of mining uncertain data. We note that with uncertainty, data values are no longer atomic. To apply traditional data mining techniques, uncertain data has to be summarized into atomic values. Taking moving-object applications as an example again, the location of an object can be summarized either by its last recorded location or by its expected location (if the probability distribution of the object’s location is taken into account). Unfortunately, discrepancies between the summarized recorded values and the actual values could seriously affect the quality of the mining results.

Figure 1 illustrates this problem when a clustering algorithm is applied to moving objects with location uncertainty. Figure 1(a) shows the actual locations of a set of objects, and Figure 1(b) shows the recorded locations of these objects, which are already outdated. The clusters obtained from these outdated values (Figure 1(b)) could be significantly different from those that would be obtained if the actual locations were available (Figure 1(a)). If we rely solely on the recorded values, many objects could be put into the wrong clusters. Even worse, every member of a cluster affects the cluster centroid, so misplaced objects shift the centroids and cause further errors.

Figure 1. (a) The real-world data are partitioned into three clusters (a, b, c). (b) The recorded locations of some objects (shaded) are not the