examples. Go to step 2.

Some examples of nonfaces that are collected during training are shown in Fig. 5. Note that some of the examples resemble faces, although they are not very close to the positive examples shown in Fig. 4. The presence of these examples forces the neural network to learn the precise boundary between face and nonface images. We used 120 images of scenery for collecting negative examples in the bootstrap manner described above. A typical training run selects approximately 8000 nonface images from the 146,212,178 subimages that are available at all locations and scales in the training scenery images. A similar training algorithm was described in [5], where at each iteration an entirely new network was trained with the examples on which the previous networks had made mistakes.

Stage Two: Merging Overlapping Detections and Arbitration

The examples in Fig. 3 showed that the raw output from a single network will contain a number of false detections. In this section, we present two strategies to improve the reliability of the detector: merging overlapping detections from a single network and arbitrating among multiple networks.

Merging Overlapping Detections

Note that in Fig. 3, most faces are detected at multiple nearby positions or scales, while false detections often occur with less consistency. This observation leads to a heuristic which can eliminate many false detections. For each location and scale, the number of detections within a specified neighborhood of that location can be counted. If the number is above a threshold, then that location is classified as a face. The centroid of the nearby detections defines the location of the detection result, thereby collapsing multiple detections. In the experiments section, this heuristic will be referred to as “thresholding”.

If a particular location is correctly identified as a face, then all other detection locations which overlap it are likely to be errors, and can therefore be eliminated.
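The counting step of the “thresholding” heuristic can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names `spread_counts` and `threshold_cells`, the `radius` and `min_count` parameters, the choice of "at least `min_count`" as the threshold test, and the representation of detections as `(x, y, level)` tuples in a shared coordinate frame are all assumptions made for the example. Pyramid bounds are ignored for brevity.

```python
from collections import Counter
from itertools import product


def spread_counts(detections, radius=1):
    """Map each pyramid cell (x, y, level) to the number of detections
    lying within `radius` cells of it in x, y, and level.

    Assumption: detections are (x, y, level) integer tuples and the same
    radius is used for position and scale.
    """
    counts = Counter()
    offsets = range(-radius, radius + 1)
    for x, y, level in set(detections):  # ignore duplicate detections
        # Each detection adds one to every cell in its neighborhood,
        # which "spreads out" the detection over the pyramid.
        for dx, dy, dl in product(offsets, repeat=3):
            counts[(x + dx, y + dy, level + dl)] += 1
    return counts


def threshold_cells(detections, radius=1, min_count=2):
    """Keep only the cells whose neighborhood count reaches `min_count`
    (hypothetical threshold convention)."""
    counts = spread_counts(detections, radius)
    return {cell: n for cell, n in counts.items() if n >= min_count}
```

With three detections clustered around one position and a single isolated detection elsewhere, only cells near the cluster survive the threshold, which matches the intuition that true faces are detected consistently at nearby positions and scales while false detections are not.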
Based on the above heuristic regarding nearby detections, we preserve the location with the higher number of detections within a small neighborhood, and eliminate locations with fewer detections. In the discussion of the experiments, this heuristic is called “overlap elimination”. There are relatively few cases in which this heuristic fails; however, one such case is illustrated by the left two faces in Fig. 3B, where one face partially occludes another.

The implementation of these two heuristics is illustrated in Fig. 6. Each detection at a particular location and scale is marked in an image pyramid, labelled the “output” pyramid. Then, each location in the pyramid is replaced by the number of detections in a specified neighborhood of that location. This has the effect of “spreading out” the detections. Normally, the neighborhood extends an equal number of pixels in the dimensions of scale and position, but for clarity in Fig. 6 detections are only spread out in position. A threshold is applied to these values, and the centroids (in both position and scale) of all above-threshold regions are computed. All detections contributing to a centroid are collapsed down to a single point. Each centroid is then examined in order, starting from the ones which had the highest number of detections within the specified neighborhood. If any other centroid locations represent a face overlapping with the current centroid, they are removed from the output pyramid. All remaining centroid locations constitute the final detection result. In the face detection work described in [3], similar observations about the nature of the outputs were made, resulting in the development of heuristics similar to those described above.

Arbitration among Multiple Networks

To further reduce the number of false positives, we can apply multiple networks, and arbitrate between their outputs to produce the final decision.
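The arbitration schemes considered in this section (ANDing, ORing, and voting) reduce to simple set operations once each network's detections are recorded by position and scale. The following is a minimal sketch under stated assumptions rather than the authors' code: detections are represented as sets of `(x, y, level)` tuples, and the function names, the `min_votes` parameter, and the `radius` parameter of the relaxed variant are hypothetical.

```python
from collections import Counter


def and_arbitrate(a, b):
    """Signal a detection only where both networks agree exactly."""
    return set(a) & set(b)


def or_arbitrate(a, b):
    """Signal a detection where either network fires."""
    return set(a) | set(b)


def vote_arbitrate(outputs, min_votes=2):
    """Signal a detection where at least `min_votes` networks agree.
    The default of 2 (e.g. two of three networks) is an assumption."""
    votes = Counter()
    for detections in outputs:
        votes.update(set(detections))  # each network votes once per cell
    return {cell for cell, n in votes.items() if n >= min_votes}


def and_near(a, b, radius=2):
    """Relaxed AND for centroid locations: keep a location from `a` if
    some location in `b` lies within `radius` in x, y, and level.
    Note this sketch is asymmetric: it reports the locations from `a`."""
    def near(p, q):
        return all(abs(u - v) <= radius for u, v in zip(p, q))

    return {p for p in set(a) if any(near(p, q) for q in b)}
```

The exact-match functions correspond to arbitration applied directly to raw detections, while `and_near` illustrates the case where arbitration is applied to centroid locations that only need to fall within some neighborhood of one another.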
Each network is trained in a similar manner, but with random initial weights, random initial nonface images, and permutations of the order of presentation of the scenery images. As will be seen in the next section, the detection and false positive rates of the individual networks will be quite close. However, because of different training conditions and because of self-selection of negative training examples, the networks will have different biases and will make different errors.

The implementation of arbitration is illustrated in Fig. 7. Each detection at a particular position and scale is recorded in an image pyramid, as was done with the previous heuristics. One way to combine two such pyramids is by ANDing them. This strategy signals a detection only if both networks detect a face at precisely the same scale and position. Due to the different biases of the individual networks, they will rarely agree on a false detection of a face. This allows ANDing to eliminate most false detections. Unfortunately, this heuristic can decrease the detection rate because a face detected by only one network will be thrown out. However, we will see later that individual networks can all detect roughly the same set of faces, so that the number of faces lost due to ANDing is small.

Similar heuristics, such as ORing the outputs of two networks, or voting among three networks, were also tried. Each of these arbitration methods can be applied before or after the “thresholding” and “overlap elimination” heuristics. If applied afterwards, we combine the centroid locations rather than actual detection locations, and require them to be within some neighborhood of one another rather than precisely aligned.

Arbitration strategies such as ANDing, ORing, or voting see