[Main text]
$$TF_{ij} = \frac{f_{ij}}{\max_i f_{ij}}$$

$$IDF_i = \log_2 \frac{N}{n_i}$$

Similarity Applications

Many Web mining problems can be expressed as finding "similar" sets:
- Plagiarism
- Mirror pages
- Articles from the same source
- Duplicate removal

Collaborative Filtering as a Similar-Sets Problem

Recommend to users items that were liked by other users who have exhibited similar tastes.

Measurement
- Edit distance
  - Short text, words
  - For personal text
- Jaccard distance
  - Long text, ignoring the word similarity
  - For government text

Microsoft Academic Search PK /Author/ ./A
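The TF.IDF weighting and the two distance measures named above (edit distance and Jaccard distance) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the function names `tf_idf`, `jaccard_distance`, and `edit_distance` are hypothetical, documents are assumed to be pre-tokenized lists of words, the maximum in TF is assumed to run over the terms of a single document, and edit distance is assumed to be the Levenshtein variant (insert, delete, substitute), which the slides do not specify.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: score} dict per document.

    TF  = term count / largest term count in the same document (assumed).
    IDF = log2(N / n), with N documents in total and n containing the term.
    """
    n_docs = len(docs)
    counts = [Counter(doc) for doc in docs]
    # Each Counter yields a term once, so this counts documents per term.
    doc_freq = Counter(term for c in counts for term in c)
    scores = []
    for c in counts:
        max_f = max(c.values()) if c else 1
        scores.append({
            term: (f / max_f) * math.log2(n_docs / doc_freq[term])
            for term, f in c.items()
        })
    return scores


def jaccard_distance(a, b):
    """1 - |A intersect B| / |A union B|; two empty sets have distance 0."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def edit_distance(s, t):
    """Levenshtein distance (assumed variant), computed row by row."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            curr.append(min(
                prev[j] + 1,               # delete cs
                curr[j - 1] + 1,           # insert ct
                prev[j - 1] + (cs != ct),  # substitute, free on a match
            ))
        prev = curr
    return prev[-1]


docs = [["a", "a", "b"], ["a", "c"]]
scores = tf_idf(docs)
```

In this toy corpus the term "a" occurs in every document, so its IDF, and therefore its score, is zero. Jaccard distance treats each text as a set of words and ignores order, which is one way to read its pairing with long texts above, while edit distance works character by character and so suits short strings.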