Chapter 6: Large Scale Problems

In this chapter we discuss a number of methods for solving problems with large data sets in the LS-SVM setting for regression and classification. We explain the Nyström method as proposed in the context of Gaussian processes, and incomplete Cholesky factorization for low rank approximation. Then a new technique of fixed size LS-SVM is presented. In this fixed size LS-SVM method one solves the primal problem instead of the dual, after estimating the map to the feature space $\varphi$ based upon the eigenfunctions obtained from kernel PCA, which is explained in more detail in the next Chapter. This method gives explicit links between function estimation and density estimation, exploits the primal-dual formulations, and addresses the problem of how to actively select suitable support vectors instead of taking random points as in the Nyström method. Next we explain methods that aim at constructing a suitable basis in the feature space. Furthermore, approaches for combining submodels are discussed, such as committee networks and nonlinear and multilayer extensions of this approach.

Low rank approximation methods

Nyström method

Suppose one takes a linear kernel. We already mentioned that one can then in fact equally well solve the primal problem as the dual problem. Solving the primal problem is more advantageous for larger data sets, while solving the dual problem is more suitable for large dimensional input spaces, because the unknowns are $w \in \mathbb{R}^n$ and $\alpha \in \mathbb{R}^N$, respectively, where $n$ denotes the dimension of the input space and $N$ the number of given training data points. For example, in the linear function estimation case one has

$$\min_{w,b} \; \frac{1}{2} w^T w + \gamma \, \frac{1}{2} \sum_{k=1}^{N} \left( y_k - w^T x_k - b \right)^2$$

by elimination of the error variables $e_k$, which one can immediately solve. In this case the mapping $\varphi$ becomes $\varphi(x_k) = x_k$ and there is no need to solve the dual problem in the support values $\alpha$, certainly not for large data sets.

For the nonlinear case, on the other hand, the situation is much more complicated. For many choices of the kernel, $\varphi(\cdot)$ may become infinite dimensional and hence also the $w$ vector. However, one may still try in this case to find meaningful estimates for $\varphi(x_k)$. A procedure to find such estimates is implicitly given by the Nyström method, which is well known in the area of integral equations [14, 63] and has been successfully applied in the context of Gaussian processes by Williams & Seeger in [294]. The method is related to finding a low rank approximation to the given kernel matrix by randomly choosing $M$ rows/columns of the kernel matrix. Let us denote the big kernel matrix by $\Omega(N,N)$ and the small kernel matrix based on the random subsample by $\Omega(M,M)$, with $M < N$ (in practice often $M \ll N$). Consider the eigenvalue decomposition of the small kernel matrix $\Omega(M,M)$

$$\Omega(M,M)\, U = U \Lambda$$

where $\Lambda$ contains the eigenvalues and $U$ the corresponding eigenvectors. This is related to the eigenfunctions $\phi_i$ and eigenvalues $\lambda_i$ of the integral equation

$$\int K(x, x')\, \phi_i(x)\, p(x)\, dx = \lambda_i \, \phi_i(x')$$

as follows

$$\hat{\lambda}_i = \frac{1}{M} \lambda_i, \qquad \hat{\phi}_i(x_k) = \sqrt{M}\, u_{ki}$$

where $\hat{\lambda}_i$ and $\hat{\phi}_i$ are estimates to $\lambda_i$ and $\phi_i$, respectively, for the integral equation, and $u_{ki}$ denotes the $ki$-th entry of the matrix $U$. This can be understood from sampling the integral by the $M$ points $x_1, x_2, \ldots, x_M$. For the big kernel matrix one has the eigenvalue decomposition

$$\Omega(N,N)\, \tilde{U} = \tilde{U} \tilde{\Lambda}.$$

Furthermore, as explained in [294], one has

$$\tilde{\lambda}_i \simeq \frac{N}{M} \lambda_i, \qquad \tilde{u}_i \simeq \sqrt{\frac{M}{N}}\, \frac{1}{\lambda_i}\, \Omega(N,M)\, u_i$$

where $\Omega(N,M)$ is the $N \times M$ block matrix taken from $\Omega(N,N)$. These insights are then used for solving, in an approximate sense, the linear system

$$\left( \Omega(N,N) + \frac{1}{\gamma} I \right) \alpha = y$$

without bias term in the model, as considered in Gaussian process regression problems. By applying the Sherman-Morrison-Woodbury formula [98] one obtains [294]

$$\hat{\alpha} = \gamma \left( y - \tilde{U} \left( \frac{1}{\gamma} I + \tilde{\Lambda} \tilde{U}^T \tilde{U} \right)^{-1} \tilde{\Lambda}\, \tilde{U}^T y \right)$$

where $\tilde{\lambda}_i$, $\tilde{u}_i$ are calculated from the approximations above, based upon $\lambda_i$, $u_i$ from the small matrix. In LS-SVM classification and regression one usually considers a bias term, which leads to centering of the kernel matrix. For application of the Nyström method the eigenvalue decomposition of the centered kernel matrix is then taken. Finally, further characterizations of the error for approximations to a kernel matrix have been investigated in [206, 226].

The Nyström method approach has been applied to the Bayesian LS-SVM framework at the second level of inference, while solving the level 1 problems without the Nyström approximation by the conjugate gradient method [273]. This is illustrated in a Table on three data sets, cra (leptograpsus crab), rsy (Ripley synthetic data) and hea (heart disease), according to [273]. In [273] it has also been illustrated that for larger data sets such as the UCI adult data set, a successful …
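To make the above concrete, the following is a minimal NumPy sketch of the Nyström approximation and the Sherman-Morrison-Woodbury solve described in this section. The RBF kernel, the parameter values ($\gamma = 10$, $\sigma = 1$, $M = 100$), the toy sinc data and the function names (rbf_kernel, nystrom_solve) are assumptions made only for illustration; they are not taken from the original text.

```python
import numpy as np

def rbf_kernel(X1, X2, sigma=1.0):
    """RBF kernel matrix with entries exp(-||x - x'||^2 / sigma^2)."""
    d2 = np.sum(X1**2, 1)[:, None] + np.sum(X2**2, 1)[None, :] - 2.0 * X1 @ X2.T
    return np.exp(-d2 / sigma**2)

def nystrom_solve(X, y, M, gamma=10.0, sigma=1.0, seed=0):
    """Approximate solution of (Omega(N,N) + I/gamma) alpha = y from an M-point subsample."""
    rng = np.random.default_rng(seed)
    N = X.shape[0]
    idx = rng.choice(N, size=M, replace=False)       # random support vectors
    Omega_MM = rbf_kernel(X[idx], X[idx], sigma)     # small kernel matrix Omega(M,M)
    Omega_NM = rbf_kernel(X, X[idx], sigma)          # block matrix Omega(N,M)
    lam, U = np.linalg.eigh(Omega_MM)                # Omega(M,M) U = U Lambda
    lam = np.clip(lam, 1e-12, None)                  # guard against round-off
    lam_t = (N / M) * lam                            # tilde(lambda)_i ~ (N/M) lambda_i
    U_t = np.sqrt(M / N) * (Omega_NM @ (U / lam))    # tilde(u)_i ~ sqrt(M/N) Omega(N,M) u_i / lambda_i
    # Sherman-Morrison-Woodbury step:
    # alpha = gamma * (y - U_t ((1/gamma) I + Lam_t U_t^T U_t)^{-1} Lam_t U_t^T y)
    Lam_t = np.diag(lam_t)
    inner = np.eye(M) / gamma + Lam_t @ (U_t.T @ U_t)
    alpha = gamma * (y - U_t @ np.linalg.solve(inner, Lam_t @ (U_t.T @ y)))
    return alpha, idx

# Toy usage: 1-D noisy sinc regression with N = 2000 points and M = 100 support vectors.
rng = np.random.default_rng(1)
X = rng.uniform(-3.0, 3.0, size=(2000, 1))
y = np.sinc(X[:, 0]) + 0.1 * rng.standard_normal(2000)
alpha, idx = nystrom_solve(X, y, M=100)
y_fit = rbf_kernel(X, X) @ alpha                     # fitted values (full kernel, for checking only)
print("training MSE:", np.mean((y_fit - y) ** 2))
```

The point of the sketch is that only an $M \times M$ eigendecomposition and an $M \times M$ linear solve are needed, so for fixed $M$ the cost grows only linearly in $N$ rather than cubically.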
[Figure caption: For linear support vector machines the dual problem is suitable for solving problems with large dimensional input spaces, while the primal problem is convenient towards large data sets. However, for nonlinear SVMs one has no expression for $\varphi(x)$; as a result one can only solve the dual problem in terms of the related kernel function. In the method of Fixed Size LS-SVM the Nyström method is used to estimate eigenfunctions. After obtaining estimates for $\varphi(x)$ and linking primal-dual formulations, the computation of $w, b$ is done in the primal space.]

The support values corresponding to the $M$ support vectors are then (…) if we represent the model as (…). This approach gives explicit links between the primal and the dual representation. However, the approximation is based on a random selection of the support vectors and does not provide an algorithm for making a good selection of the support vectors.

Active selection of support vectors

In order to make a more suitable selection of the support vectors, instead of a random selection, one can relate the Nyström method to kernel principal component analysis, density estimation and entropy criteria, as discussed by Girolami in [94]. These links will be explained in more detail in the Chapter on unsupervised learning and support vector machine formulations of kernel PCA. In [94] an analysis is done of the quadratic Renyi entropy

$$H_R = -\log \int p(x)^2\, dx$$

in relation to kernel PCA and density estimation, with

$$\int \hat{p}(x)^2\, dx = \frac{1}{N^2}\, 1_v^T\, \Omega\, 1_v$$

where $1_v = [1; 1; \ldots; 1]$. One can then show that (…).
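As an illustration of how the entropy criterion above could drive an active selection of support vectors, the following sketch repeatedly proposes swapping a randomly chosen point of a size-$M$ working set with a randomly chosen training point and keeps the swap only if the quadratic Renyi entropy of the working set, computed through $\frac{1}{M^2} 1_v^T \Omega(M,M) 1_v$, increases. The swap heuristic, the iteration budget and the function names (renyi_entropy, select_support_vectors) are assumptions for illustration and are not prescribed by the text above.

```python
import numpy as np

def rbf_kernel(X1, X2, sigma=1.0):
    """RBF kernel matrix with entries exp(-||x - x'||^2 / sigma^2)."""
    d2 = np.sum(X1**2, 1)[:, None] + np.sum(X2**2, 1)[None, :] - 2.0 * X1 @ X2.T
    return np.exp(-d2 / sigma**2)

def renyi_entropy(Omega_MM):
    """Quadratic Renyi entropy estimate H_R = -log((1/M^2) 1_v^T Omega(M,M) 1_v)."""
    M = Omega_MM.shape[0]
    return -np.log(Omega_MM.sum() / M**2)

def select_support_vectors(X, M, sigma=1.0, iters=5000, seed=0):
    """Greedy swap search keeping a size-M working set with maximal Renyi entropy."""
    rng = np.random.default_rng(seed)
    N = X.shape[0]
    idx = rng.choice(N, size=M, replace=False)            # start from a random subsample
    H = renyi_entropy(rbf_kernel(X[idx], X[idx], sigma))
    for _ in range(iters):
        i, j = rng.integers(M), rng.integers(N)           # slot in working set / candidate point
        if j in idx:
            continue
        trial = idx.copy()
        trial[i] = j                                      # propose the swap
        H_trial = renyi_entropy(rbf_kernel(X[trial], X[trial], sigma))
        if H_trial > H:                                   # accept only if the entropy increases
            idx, H = trial, H_trial
    return idx

# Usage on toy data: pick 50 representative points out of 1000 drawn from two clusters.
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(-2, 0.5, (500, 2)), rng.normal(2, 0.5, (500, 2))])
support = select_support_vectors(X, M=50)
print("selected indices:", support[:10], "...")
```

Accepting only entropy-increasing swaps tends to spread the working set over the regions where the training data have mass, which is consistent with the link made above between the Nyström subsample, density estimation and the entropy criterion.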