• Get data into a form you can use (MR)
• Picking Canopy Centers (MR)
• Assign Data Points to Canopies (MR)
• Pick KMeans Cluster Centers
• KMeans algorithm (MR)
• Iterate!

Data Massage
• This isn't interesting, but it has to be done.

Selecting Canopy Centers

Assigning Points to Canopies

KMeans Map

Iterating KMeans

Elbow Criterion
• Choose a number of clusters such that adding another cluster doesn't add interesting information.
• Rule of thumb for determining how many clusters should be chosen.
• Initial assignment of cluster seeds has a bearing on final model performance.
• Often required to run clustering several times to get maximal performance.

Conclusions
• Clustering is slick
• And it can be done super efficiently
• And in lots of different ways
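
Sketch: the pipeline in code
• A minimal single-machine sketch of the steps outlined above (canopy centers, canopy assignment, KMeans iterations, elbow measurement). It is not the MapReduce implementation from the slides.
• Assumptions, not from the slides: the threshold names t1 (loose) and t2 (tight), the helper names, the toy points, and seeding KMeans directly with the canopy centers. Requires Python 3.8+ for math.dist.

```python
import math

def pick_canopy_centers(points, t2):
    """Greedy pass: a point becomes a center unless it is within t2 of an existing center."""
    centers = []
    for p in points:
        if all(math.dist(p, c) > t2 for c in centers):
            centers.append(p)
    return centers

def assign_to_canopies(points, centers, t1):
    """For each point, list the indices of all canopy centers within the loose threshold t1."""
    return [[i for i, c in enumerate(centers) if math.dist(p, c) <= t1] for p in points]

def kmeans_step(points, centers):
    """One iteration: the 'map' side assigns points to the nearest center, the 'reduce' side averages."""
    groups = [[] for _ in centers]
    for p in points:
        nearest = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
        groups[nearest].append(p)
    new_centers = []
    for old, members in zip(centers, groups):
        if members:
            new_centers.append(tuple(sum(xs) / len(members) for xs in zip(*members)))
        else:
            new_centers.append(old)  # an empty cluster keeps its previous center
    return new_centers

def kmeans(points, centers, max_iters=20):
    """Iterate kmeans_step until the centers stop moving or max_iters is reached."""
    for _ in range(max_iters):
        updated = kmeans_step(points, centers)
        if updated == centers:
            break
        centers = updated
    return centers

def wcss(points, centers):
    """Within-cluster sum of squared distances, a common quantity to plot for the elbow criterion."""
    return sum(min(math.dist(p, c) for c in centers) ** 2 for p in points)

# Toy data (hypothetical): two well-separated groups of three points each.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
seeds = pick_canopy_centers(points, t2=3.0)
canopies = assign_to_canopies(points, seeds, t1=5.0)
centers = kmeans(points, seeds)
wcss_two = wcss(points, centers)                     # two clusters
wcss_one = wcss(points, kmeans(points, points[:1]))  # one cluster, for comparison
```

• Comparing wcss_one and wcss_two is the elbow idea in miniature: keep adding clusters while the drop is large, stop when it flattens.
• Because the result depends on the seeds, a fuller version would rerun kmeans with different seeds and keep the best run.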