spam/not spam.
None of the above: this is not a machine learning problem.

Types of learning

Imagine an agent or machine that receives a sequence of sensory inputs: x_1, x_2, x_3, x_4, ...

Supervised learning: the machine is also given the desired outputs y_1, y_2, ..., and its goal is to learn to produce the correct output given a new input.

Unsupervised learning: the outputs y_1, y_2, ... are not given; the agent still wants to build a model of x that can be used for reasoning, decision making, predicting things, communicating, etc.

Semi-supervised learning

Representing “objects” in machine learning
• An instance, x, represents a specific object.
• x is often represented by a d-dimensional feature vector x = (x_1, ..., x_d) ∈ R^d.
• Each dimension is called a feature or attribute.
• Feature values are continuous or discrete.
• x is a point in the d-dimensional feature space.
• This is an abstraction of the object: all other aspects are ignored (e.g., two people having the same weight and height may be considered identical).

Feature vector representation
• Text document
  Vocabulary of size d (~100,000)
  “Bag of words”: a count of each vocabulary entry
  Stopwords are often ignored: the, of, at, in, ...
  A special “out-of-vocabulary” (OOV) entry catches all unknown words

Feature vector representation
• Image
  Pixels,