The goal of the project, carried out at the Graduate Research Center "3D Image Analysis and Synthesis", is the classification and localization of 3-D objects in images. An appearance-based approach is applied: no segmentation process that detects geometric features such as edges or corners is used. Instead, 2-D local feature vectors are computed directly from the pixel intensities of gray-level images using a wavelet transformation.
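The wavelet step can be illustrated with a minimal sketch. This is not the project's actual feature extractor; it assumes one Haar wavelet step on a 2x2 gray-level patch and keeps two coefficients as the 2-D local feature vector.

```python
def haar_feature(patch):
    """Sketch of a 2-D local feature vector from a 2x2 gray-level patch
    via one step of the 2-D Haar wavelet transform. Which coefficients
    are kept is an illustrative assumption, not the paper's choice."""
    a, b = patch[0]
    c, d = patch[1]
    approx = (a + b + c + d) / 4.0   # low-pass coefficient: local mean intensity
    detail = (a + b - c - d) / 4.0   # high-pass coefficient: difference of the
                                     # two rows, responds to horizontal edges
    return (approx, detail)


# Example: a patch with a bright upper row and dark lower row.
feature = haar_feature([[10.0, 10.0], [2.0, 2.0]])
```

Because the coefficients are differences and averages of raw intensities, they can be computed without any prior edge or corner detection, matching the segmentation-free approach described above.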
The components of the feature vectors are statistically modeled as normally distributed; in this way, illumination changes and noise can be handled. In real scenes and applications, objects may appear on a heterogeneous background or be partially occluded. For this reason we introduced a separate background model. In the recognition phase, a decision is made as to which feature vectors belong to the object and which to the background. The components of the background vectors are then modeled with a uniform distribution.
A so-called global assignment function makes it possible to recognize more than one object in a scene. Since the number of objects in an image is unknown, a special abort criterion decides when the search process ends. Finding an efficiently working abort criterion is therefore crucial.
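The role of the abort criterion can be sketched as follows. The score-threshold form is purely an assumption for illustration; the actual criterion used in the project is not specified here.

```python
def find_objects(hypothesis_scores, threshold):
    """Sketch of a multi-object search loop: accept the best remaining
    object hypothesis until its score drops below a threshold, which
    plays the role of the abort criterion. The threshold form is a
    hypothetical stand-in for the project's actual criterion."""
    accepted = []
    for score in sorted(hypothesis_scores, reverse=True):
        if score < threshold:
            break                 # abort: no sufficiently good hypothesis left
        accepted.append(score)    # accept this hypothesis as a found object
    return accepted


# Two of four hypotheses survive a threshold of 0.5.
found = find_objects([0.9, 0.3, 0.7, 0.1], 0.5)
```

With this structure, the number of returned objects is determined by the data rather than fixed in advance, which is why the quality of the abort criterion directly affects recognition results.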
The learning process begins with the image acquisition of all object classes in many known poses. In the laboratory environment, the images are taken with a special setup consisting of a turntable and a camera arm. In real-world object recognition problems, however, it is much easier to record the objects with a hand-held camera. For this reason we propose a new approach to object recognition in which the image acquisition is done in this way. The poses of the objects in all training frames are computed with a so-called structure-from-motion algorithm. The image acquisition process thus becomes much easier, but an additional training inaccuracy has to be dealt with.
In order to evaluate the object recognition system, we use the very large image database 3D-REAL-ENV. With more than 30,000 training images and more than 8,000 test images with real heterogeneous backgrounds, different algorithms can be objectively compared. The illumination in the test images differs from the illumination used in the training phase.
Recently, color modeling of the objects was introduced. In this case we use 6-D local feature vectors, where the wavelet transformation is performed separately for the red, green, and blue channels. The classification rate thereby improved from 55.4% (gray-level modeling) to 87.3% (color modeling), and the localization rate increased from 69.0% (gray-level modeling) to 77.1% (color modeling).
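The construction of the 6-D color feature can be sketched by applying the same wavelet step to each channel and concatenating the results. The per-channel Haar step and the choice of two coefficients per channel are illustrative assumptions.

```python
def color_feature(r_patch, g_patch, b_patch):
    """Sketch of the 6-D color feature vector: the same 2-D wavelet step
    is applied to the red, green, and blue channels of a 2x2 patch, and
    the per-channel 2-D results are concatenated."""
    def haar2(p):
        # One Haar step on a 2x2 patch: local mean and a row difference.
        a, b = p[0]
        c, d = p[1]
        return ((a + b + c + d) / 4.0, (a + b - c - d) / 4.0)

    feat = []
    for patch in (r_patch, g_patch, b_patch):
        feat.extend(haar2(patch))
    return tuple(feat)   # 6-D: two coefficients per channel


# Flat patches yield the channel means and zero detail per channel.
f = color_feature([[1.0, 1.0], [1.0, 1.0]],
                  [[2.0, 2.0], [2.0, 2.0]],
                  [[3.0, 3.0], [3.0, 3.0]])
```

Treating the channels separately keeps the statistical model per component unchanged; only the dimension of the feature vectors grows from 2 to 6.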