【正文】
附錄 A 英文原文 Scene recognition for mine rescue robot localization based on vision CUI Yian(崔益安 ), CAI Zixing(蔡自興 ), WANG Lu(王 璐 ) Abstract: A new scene recognition system was presented based on fuzzy logic and hidden Markov model(HMM) that can be applied in mine rescue robot localization during emergencies. The system uses monocular camera to acquire omnidirectional images of the mine environment where the robot locates. By adopting centersurround difference method, the salient local image regions are extracted from the images as natural landmarks. These landmarks are anized by using HMM to represent the scene where the robot is, and fuzzy logic strategy is used to match the scene and landmark. By this way, the localization problem, which is the scene recognition problem in the system, can be converted into the evaluation problem of HMM. The contributions of these skills make the system have the ability to deal with changes in scale, 2D rotation and viewpoint. The results of experiments also prove that the system has higher ratio of recognition and localization in both static and dynamic mine environments. Key words: robot location。 scene recognition。 salient image。 matching strategy。 fuzzy logic。 hidden Markov model 1 Introduction Search and rescue in disaster area in the domain of robot is a burgeoning and challenging subject[1]. Mine rescue robot was developed to enter mines during emergencies to locate possible escape routes for those trapped inside and determine whether it is safe for human to enter or not. Localization is a fundamental problem in this field. Localization methods based on camera can be mainly classified into geometric, topological or hybrid ones[2]. With its feasibility and effectiveness, scene recognition bees one of the important technologies of topological localization. Currently most scene recognition methods are based on global image features and have two distinct stages: training offline and matching online. During the training stage, robot collects the images of the environment where it works and processes the images to extract global features that represent the scene. Some approaches were used to analyze the dataset of image directly and some primary features were found, such as the PCA method [3]. However, the PCA method is not effective in distinguishing the classes of features. Another type of approach uses appearance features including color, texture and edge density to represent the image. For example, ZHOU et al[4] used multidimensional histograms to describe global appearance features. This method is simple but sensitive to scale and illumination changes. In fact, all kinds of global image features are suffered from the change of environment. LOWE [5] presented a SIFT method that uses similarity invariant descriptors formed by characteristic scale and orientation at interest points to obtain the features. The features are invariant to image scaling, translation, rotation and partially invariant to illumination changes. But SIFT may generate 1 000 or more interest points, which may slow down the processor dramatically. During the matching stage, nearest neighbor strategy(NN) is widely adopted for its facility and intelligibility[6]. But it cannot capture the contribution of individual feature for scene recognition. In experiments, the NN is not good enough to express the similarity between two patterns. Furthermo