[Main text]
of interest points and regions is the search for spots and areas in an image which exhibit a predefined property that makes them special in relation to their local neighborhood. This property should make the region distinguishable from its neighborhood and repeatably detectable. Furthermore, the detection of these features should be, as far as possible, illumination and viewpoint invariant. The first important interest point detector, the so-called Harris corner detector, was proposed in 1988 by Harris and Stephens. It exhibits excellent repeatability and was subsequently used for object recognition by Schmid and Mohr. An extension of the Harris detector to include scale information was later reported by Mikolajczyk and Schmid as the Harris–Laplace detector and was used by Schaffalitzky and Zisserman for multi-view matching of unordered image sets. Another approach, which detects blob-like image structures, is to search for points where the determinant of the Hessian matrix assumes a local extremum; this is called the Hessian detector. Further developments to include affine covariance resulted in the Harris–Affine and Hessian–Affine detectors proposed by Mikolajczyk and by Mikolajczyk and Schmid. The currently most popular approach, the two-part scale invariant feature transform (SIFT), was proposed by Lowe; its first part is an interest point detector, the difference-of-Gaussians (DoG) detector. The DoG detector takes the differences of Gaussian-blurred images as an approximation of the scale-normalized Laplacian and uses local extrema of the responses in scale space as indicators for keypoints. A complementary feature detector, the maximally stable extremal regions (MSER) detector, was proposed by Matas et al. In short, the MSER detector searches for regions which are brighter or darker than their surroundings, i.e., regions surrounded by darker or, vice versa, brighter pixels. First, pixels are sorted in ascending or descending order of their intensity value, depending on the region type to be detected.
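The sorting step above, together with the union-find merging it feeds, can be illustrated with a minimal sketch. It assumes a small grayscale image stored as a 2-D NumPy array and 4-connectivity. The function name `component_sizes_by_level` is hypothetical, and the sketch only tracks how the largest connected component grows with the intensity threshold; it does not build the full tree or apply any stability criteria, so it is not the MSER detector of Matas et al.

```python
import numpy as np

def component_sizes_by_level(img):
    """Sort pixels by intensity and merge them with union-find.

    Returns a dict mapping each intensity level to the size of the largest
    4-connected component made of pixels with intensity <= that level.
    (Illustrative helper; name and return format are assumptions.)
    """
    h, w = img.shape
    flat = img.ravel()
    order = np.argsort(flat, kind="stable")        # ascending: dark regions first
    parent = np.full(h * w, -1, dtype=np.int64)    # -1 marks "pixel not added yet"
    size = np.zeros(h * w, dtype=np.int64)

    def find(i):
        # Follow parent links to the root, halving the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    largest = 0
    history = {}
    for idx in order:
        idx = int(idx)
        parent[idx] = idx                          # add the pixel as its own component
        size[idx] = 1
        y, x = divmod(idx, w)
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w:
                n = ny * w + nx
                if parent[n] != -1:                # neighbor already added: merge
                    ra, rb = find(idx), find(n)
                    if ra != rb:
                        if size[ra] < size[rb]:    # union by size
                            ra, rb = rb, ra
                        parent[rb] = ra
                        size[ra] += size[rb]
        largest = max(largest, int(size[find(idx)]))
        history[int(flat[idx])] = largest          # overwritten until the level is complete
    return history
```

For bright regions, the same routine could be run on the inverted image. A real detector would additionally keep per-component area histories in order to judge how stable each region is across thresholds.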
The pixel array is sequentially fed into a union-find algorithm and a tree-shaped data structure is maintained, whose nodes contain information about pixel neighborhoods as well as about intensity value relationships. Finally, nodes which satisfy a set of predefined criteria are sought by a tree-traversing algorithm. Two affine covariant region detectors were proposed by Tuytelaars and Van Gool: intensity-based regions (IBR) and edge-based regions (EBR). IBRs are based on extrema in intensity. Given a local intensity extremum, the brightness function along rays emanating from the extremum is studied. This function itself exhibits an extremum at locations where the image intensity suddenly changes. Linking all points of the emanating rays corresponding to this extremum forms an IBR. EBRs are determined from corner points and nearby edges. Given a single corner point and two further control points that move along the edges in opposite directions, a one-dimensional class of parallelograms is introduced using the corner itself and the vectors pointing from the corner to the control points. By studying a function of texture and using additional constraints, a single parallelogram is selected to be an EBR. Another algorithm, termed the Salient Region detector, was proposed by Kadir et al. and is based on the probability density function (PDF) of intensity values computed over an elliptical region. For each pixel, the entropy extrema for an ellipse centered at this pixel are recorded over the ellipse parameters: orientation h, scale s, and the ratio of major to minor axis k. From a sorted list of all region candidates, the n most salient ones are chosen. For an extensive evaluation of a large number of affine region detectors, refer to the work of.
Generally speaking, a descriptor is an abstract characterization of an image patch. Usually, the image patch is chosen to be the local environment of an interest region.
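To make the notion of a descriptor concrete, the following minimal sketch maps the patch around an interest point to a normalized histogram of gradient orientations. It assumes a grayscale image stored as a 2-D NumPy array and an interest point far enough from the image border. The function name `orientation_histogram` and the parameters `radius` and `bins` are hypothetical. It illustrates a generic gradient-based patch descriptor, not the SIFT descriptor or any other published method.

```python
import numpy as np

def orientation_histogram(img, cx, cy, radius=8, bins=8):
    """Describe the square patch around (cx, cy) by one orientation histogram.

    Assumes the patch lies fully inside the image (no border handling).
    """
    patch = img[cy - radius:cy + radius + 1,
                cx - radius:cx + radius + 1].astype(float)
    gy, gx = np.gradient(patch)                  # derivatives along rows (y) and columns (x)
    magnitude = np.hypot(gx, gy)                 # gradient strength per pixel
    angle = np.arctan2(gy, gx) % (2 * np.pi)     # gradient orientation in [0, 2*pi)
    # Each pixel votes for its orientation bin, weighted by its gradient magnitude.
    hist, _ = np.histogram(angle, bins=bins, range=(0, 2 * np.pi),
                           weights=magnitude)
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist     # unit length reduces contrast dependence
```

Two patches can then be compared, for example, by the Euclidean distance between their histograms. Because this sketch uses a single histogram for the whole patch, it discards all spatial layout within the patch.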
Based on various algorithms, methods, or transformations, the resulting characterization can be made rotation invariant or, at least partially, insensitive to affine transformations. Most approaches are based on gradient calculations or image brightness values. As the second part of the SIFT approach, Lowe proposed the use of descriptors based on stacked gradient histograms. The individual histograms are calculated in a subdivided patch and describe the gradient orientation in order to cover spati