Research project

The long-term goal of subproject B2 of the Sonderforschungsbereich 603 is the development of optimization-based methods for the integration of camera images during object classification, localization, and tracking. Knowledge about the system is always represented probabilistically; this representation is created and continuously updated by fusing all available information sources. The key to the quality of the estimates of object class, pose, and position is the active approach to sensor data acquisition.

In the past, most of the effort went into the theoretical optimization of the methods for active view planning and active object tracking. Although the same information-theoretic approaches were used, the two parts of subproject B2 were mostly optimized independently of each other. The advancing integration of object tracking and object classification in a common system has made it possible for the two parts to exchange information and thus to reduce each other's uncertainty.

Regarding view planning for efficient object recognition, attention was paid to the handling of sparse object models, i.e. datasets consisting of just a few object views. With such models it may be unreasonable to start immediately with the main task of classification, since the underlying data might turn out to be unreliable for discriminating classes. Instead, the available object model is selectively augmented by a procedure called Active Learning, which itself involves view planning. In contrast to the known view planning strategy used for recognition, this model augmentation uses neither a training phase nor a probabilistic hypothesis about object class and pose. Rather, an optimal sensor movement is generated from uncertainty criteria of the current object model. In keeping with the original objective, the decisive criterion for model enhancement is the maximization of the expected classification rate, which serves as the optimization term. To reduce the number of features required, PCA-transformed eigenspace features as well as wavelet features have been evaluated.

Another aspect of active view planning with Reinforcement Learning is the comparison of state densities. These densities are a probabilistic description of the current hypothesis about an object's class and its pose relative to a camera. During view planning, distances between pairs of these densities have to be computed many times, using the Kullback-Leibler distance. Since state densities are composed of particles, a very time-consuming Parzen estimation has to be performed for each comparison. A procedure was therefore implemented that obtains a quick impression of the similarity of two densities from measures that are easy to calculate, such as the mean or the entropy. In this way, most of the time-consuming comparisons can be avoided beforehand. To save even more computation time, methods were implemented that reduce the search space of the camera movement optimization or the number of particles needed to represent a state density reliably.
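The two-stage comparison of particle-based densities can be illustrated with a minimal sketch. It is not the project's implementation: it assumes one-dimensional particles, a Gaussian Parzen kernel with a fixed bandwidth, and a pre-check on the mean only. The function names, the bandwidth, and the threshold mean_gap are hypothetical values chosen for illustration.

```python
import math


def parzen_density(x, particles, bandwidth):
    """Gaussian Parzen-window estimate of a 1-D density at point x."""
    norm = 1.0 / (len(particles) * bandwidth * math.sqrt(2.0 * math.pi))
    return norm * sum(
        math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in particles
    )


def kl_estimate(p_particles, q_particles, bandwidth=0.5):
    """Estimate KL(p || q) by averaging log(p/q) over the particles of p.

    Every evaluation sums over all particles, so the cost grows with the
    product of the two particle counts. This is the expensive step.
    """
    total = 0.0
    for x in p_particles:
        p = parzen_density(x, p_particles, bandwidth)
        # Floor q to avoid log of infinity when q underflows to zero.
        q = max(parzen_density(x, q_particles, bandwidth), 1e-300)
        total += math.log(p / q)
    return total / len(p_particles)


def mean(particles):
    return sum(particles) / len(particles)


def compare(p_particles, q_particles, mean_gap=3.0):
    """Cheap pre-check first, Parzen-based estimate only if needed.

    mean_gap is an assumed threshold: densities whose means differ by more
    than this are treated as clearly dissimilar without further work.
    """
    if abs(mean(p_particles) - mean(q_particles)) > mean_gap:
        return math.inf
    return kl_estimate(p_particles, q_particles)
```

A pre-check on the entropy, which the text also mentions, could be added in the same place as the mean test. The pre-check only costs one pass over each particle set, whereas the Parzen-based estimate visits every pair of particles.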

Work in the subfield of active object tracking was continued. The cameras used had not only electrically adjustable zoom lenses but also pan-tilt units, which expand their potential field of view. The linearization of the visibility tree and the evaluation with the sequential Kalman filter were adapted to the expanded problem. As a result, optimal action selection can still look ahead over multiple time steps with a variable number of cameras, in real time.
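The idea behind a sequential Kalman update with a variable number of cameras can be sketched as follows. This is a simplified illustration under stated assumptions, not the project's filter: the state is a single scalar, every camera measures it directly, and the function name sequential_update is hypothetical.

```python
def sequential_update(x, p, measurements):
    """Fuse scalar measurements one at a time.

    x, p          prior state estimate and its variance
    measurements  iterable of (z, r) pairs, one per camera that currently
                  sees the object; z is the measured value, r its noise
                  variance
    """
    for z, r in measurements:
        k = p / (p + r)        # Kalman gain for this single measurement
        x = x + k * (z - x)    # correct the estimate with the innovation
        p = (1.0 - k) * p      # the variance shrinks after every update
    return x, p
```

Because each measurement is processed on its own, the list may have any length, and a camera that does not see the object simply contributes no entry. For a prior of 0 with variance 1 and two measurements 2 and 4, each with noise variance 1, the result is a mean of 2 and a variance of 1/3, the same as fusing both measurements in one batch.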

Additionally, the connection between object tracking and object classification was expanded. With simultaneous tracking and classification, an object could be tracked continuously while its class and pose were determined at a lower frame rate. The tracking was accomplished with a color histogram comparison, while the classification used a wavelet decomposition or a comparison with an image generated from a light field. Tracking the object noticeably reduced the search space for the classification.
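A color histogram comparison of the kind used for tracking can be sketched in a few lines. The text does not state which similarity measure was used; the Bhattacharyya coefficient below is an assumption, as are the function names, the single color channel, and the bin count.

```python
import math


def normalized_histogram(values, bins=8, lo=0, hi=256):
    """Histogram of one color channel, normalized to sum to 1.

    values must be non-empty and lie in the range [lo, hi).
    """
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts]


def bhattacharyya(h1, h2):
    """Similarity of two normalized histograms: 1 if identical, 0 if disjoint."""
    return sum(math.sqrt(a * b) for a, b in zip(h1, h2))


def best_match(reference, candidates):
    """Index of the candidate region whose histogram is most similar."""
    scores = [bhattacharyya(reference, c) for c in candidates]
    return scores.index(max(scores))
```

In a tracker, reference would be the histogram of the object in an earlier frame and candidates the histograms of image regions in the current frame.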

A new research topic in object tracking was the determination of sensor noise using the adaptive Kalman filter and several sensors (cameras). The sensor noise is an important component of action selection with the Kalman filter. The results still need to be included in the active optimal action selection, however.
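One common way to estimate sensor noise with an adaptive Kalman filter is to match the observed innovations to their expected variance. The sketch below shows this for a single sensor only. It is not the method developed in the project: the scalar static state model, the window length, the process noise q, and the function name are all assumptions made for illustration.

```python
from collections import deque


def estimate_measurement_noise(measurements, q=1e-4, window=100, r_init=1.0):
    """Scalar Kalman filter that adapts its measurement noise variance r.

    For a directly measured scalar state, the innovation v = z - x has the
    expected variance p_pred + r, so r is estimated as the windowed mean of
    v squared minus p_pred.
    """
    x, p, r = 0.0, 1.0, r_init
    squared_innovations = deque(maxlen=window)
    for z in measurements:
        p_pred = p + q                      # predict (state assumed static)
        v = z - x                           # innovation
        squared_innovations.append(v * v)
        mean_sq = sum(squared_innovations) / len(squared_innovations)
        r = max(mean_sq - p_pred, 1e-9)     # keep the variance positive
        k = p_pred / (p_pred + r)           # gain with the adapted noise
        x += k * v
        p = (1.0 - k) * p_pred
    return x, r
```

With several cameras, one such estimate per sensor would be needed, and the estimated variances could then feed into the action selection, which the text names as work still to be done.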

In a student thesis (Diplomarbeit), the fusion of information from moving and static sensors was examined. A robot (in this work, of the type "Volksbot") moves through the scene with the goal of reaching a user-selected object and classifying it from a close-up view. The robot carries an omnidirectional camera, in which the object can be tracked relative to the robot. Additionally, the scene is observed by several static cameras, which determine the positions of the object and the robot (but not the orientation of the robot) in a global coordinate system. The information from these two sensor systems is fused to guide the robot to its goal. As expected, this task is easily accomplished when the omnidirectional camera is used. If this camera is deactivated, however, the robot can still be guided to the object using only the static, external cameras. One of the main problems in this case was determining the orientation of the robot, since it could not be measured directly.