Activities

Interleaved Object Detection and Segmentation.

We have developed a local-feature-based recognition approach that treats object categorization and figure-ground segmentation as two interleaved processes that closely collaborate towards a common goal. The resulting Implicit Shape Model (ISM) algorithm can learn the characteristics of a new object category from a relatively small number of training examples, recognize previously unseen instances of that category in novel test images, and automatically segment them from the background, even under scale changes and partial occlusion. In more recent work, we have improved this approach through extensions for multi-cue integration, recognition from multiple viewpoints, and multi-category discrimination.
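The idea of local features jointly supporting an object hypothesis can be illustrated with a minimal sketch. It assumes a simplified, generalized-Hough-style voting scheme and is not the ISM algorithm itself: the codebook contents, the fixed voting grid, and all function names are hypothetical, and scale handling and segmentation are omitted.

```python
from collections import defaultdict

# Hypothetical codebook: visual-word label -> list of (dx, dy, weight)
# offsets from a feature location to the object center, as they might
# have been observed on training examples.
CODEBOOK = {
    "wheel": [(40, -10, 0.5), (-40, -10, 0.5)],
    "roof": [(0, 30, 1.0)],
}


def vote_for_centers(features, codebook, bin_size=10):
    """Accumulate weighted object-center votes in a coarse 2D grid.

    features: iterable of (word, x, y) tuples for matched local features.
    """
    votes = defaultdict(float)
    for word, x, y in features:
        for dx, dy, weight in codebook.get(word, []):
            cx, cy = x + dx, y + dy
            votes[(cx // bin_size, cy // bin_size)] += weight
    return votes


def best_hypothesis(votes, bin_size=10):
    """Return (center_x, center_y, score) of the strongest grid cell."""
    if not votes:
        return None
    (bx, by), score = max(votes.items(), key=lambda item: item[1])
    half = bin_size // 2
    return (bx * bin_size + half, by * bin_size + half, score)


# Two wheel features and one roof feature that agree on a common center.
features = [("wheel", 60, 110), ("wheel", 140, 110), ("roof", 100, 70)]
hypothesis = best_hypothesis(vote_for_centers(features, CODEBOOK))
```

In this toy setup, the features that voted for the winning cell could then be traced back to decide which image regions belong to the object, which is the intuition behind coupling recognition with figure-ground segmentation.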

Pedestrian Detection in Crowded Scenes.

Pedestrian detection has become a key technology for many applications and is therefore attracting great commercial interest. In our research, we have developed a state-of-the-art pedestrian detector based on the ISM representation that is particularly suitable for detection in crowded scenes and under strong occlusion.

Coupled Detection and Tracking.

We have developed a novel approach for multi-object tracking that connects object detection and spacetime trajectory estimation in a coupled optimization framework motivated by Minimum Description Length (MDL) model selection. Building on the output of an object detector, our approach searches at each time instant for the set of spacetime trajectories that best explains the current image and all evidence collected so far, while satisfying the physical constraint that no two objects may occupy the same space at the same time. The resulting approach can initialize automatically and track a large and varying number of objects through complex scenes with clutter, occlusions, and large-scale background changes. In addition, the model selection framework allows our approach to retrospectively adapt the data association and thus recover from mismatches and temporarily lost tracks.
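The selection step described above can be sketched as follows. This is only an illustration under stated assumptions: it uses a simple greedy heuristic as a stand-in for the coupled optimization, and the hypothesis format, the scores, and the discretization of space and time into grid cells are all hypothetical.

```python
def select_trajectories(hypotheses):
    """Greedily pick high-scoring hypotheses with no space-time overlap.

    hypotheses: list of (name, score, cells) tuples, where `cells` is a
    set of (t, x, y) grid cells the object would occupy and `score` is an
    assumed measure of how much evidence the hypothesis explains.
    """
    selected = []
    occupied = set()
    ranked = sorted(hypotheses, key=lambda h: h[1], reverse=True)
    for name, score, cells in ranked:
        if score <= 0:
            break  # remaining hypotheses explain nothing useful
        # Physical exclusion: no two objects in the same place at the same time.
        if occupied.isdisjoint(cells):
            selected.append(name)
            occupied |= cells
    return selected


hypotheses = [
    ("A", 5.0, {(0, 1, 1), (1, 2, 1)}),
    ("B", 3.0, {(1, 2, 1), (2, 3, 1)}),  # collides with A at t=1
    ("C", 2.0, {(0, 5, 5), (1, 5, 5)}),
    ("D", -1.0, {(0, 9, 9)}),            # not worth including
]
chosen = select_trajectories(hypotheses)  # ["A", "C"]
```

Re-running such a selection at every time step over all hypotheses, including older ones, is what would allow an earlier data association to be revised once new evidence favors a different explanation.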

Object Detection and Tracking for Mobile Applications.

Combining the above capabilities, we have developed a mobile vision system for localizing other traffic participants (cars, pedestrians, bicyclists) in a vehicle’s field of view and for tracking them over time. Particular emphasis went into making the system robust for operation in highly dynamic inner-city environments, such as a busy pedestrian zone. Mounted on a child stroller or robotic platform, our system can reliably detect and track a large number of interacting pedestrians for dynamic obstacle avoidance. Mounted on a car, the system could one day be used in driver assistance systems for car safety.

Combining Recognition and Reconstruction for 3D City Modeling.

To support the large-scale reconstruction of entire city areas for visualization purposes, we developed methods for accurately localizing parked and moving cars in video streams recorded by a survey vehicle. Such objects pose a problem for automatic 3D city modeling, since they are difficult to reconstruct due to dynamic motion, specularities on their surfaces, and partial occlusion. In addition, they defy the simplifying geometry assumptions that could otherwise be used to speed up large-scale reconstruction. By combining reconstruction with object recognition and 3D pose estimation, our approach can automatically remove the disturbing objects from the reconstruction and replace them with virtual placeholder models, leading to a considerable reduction in the number of visible reconstruction artifacts and thus a more visually pleasing final model.

Large-Scale Mining of Landmark Buildings from Community Photo Collections.

One of our target applications is visual search from mobile phones. A user sends pictures from their mobile phone camera to an automatic recognition server in order to obtain additional information about objects and buildings in their surroundings. For such an application to become practical, it is important to tap the vast amount of publicly available data from internet sources and mine it for usable content. Our approach mines community photo collections (such as Flickr) for geotagged photos of entire cities at a time, extracts landmark buildings and places through visual recognition techniques, and automatically links them to corresponding Wikipedia pages. The resulting annotated clusters then serve as a basis for automatically geolocating additional, novel images.
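As a rough illustration of how a city-wide collection of geotagged photos might be broken into manageable units before any visual matching, the following sketch buckets photos into geographic tiles. This is an assumed preprocessing step, not a description of the actual pipeline; the tile size, the photo tuple format, and the function names are hypothetical.

```python
import math
from collections import defaultdict


def tile_key(lat, lon, tile_deg=0.002):
    """Map a geotag to the integer index of the grid tile that contains it."""
    return (math.floor(lat / tile_deg), math.floor(lon / tile_deg))


def bucket_photos(photos, tile_deg=0.002):
    """Group photo ids by geographic tile.

    photos: iterable of (photo_id, latitude, longitude) tuples.
    """
    tiles = defaultdict(list)
    for photo_id, lat, lon in photos:
        tiles[tile_key(lat, lon, tile_deg)].append(photo_id)
    return dict(tiles)


photos = [
    ("p1", 48.8584, 2.2945),
    ("p2", 48.8585, 2.2946),  # a few meters from p1, same tile
    ("p3", 48.8606, 2.3376),  # a different part of the city
]
tiles = bucket_photos(photos)
```

Photos within each tile could then be compared visually to find groups that show the same building, so that expensive image matching is only attempted between photos taken close to each other.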

In addition, we have made important contributions in the following areas:

Fast 3D Scanning and Surface Registration.
Efficient Feature Mining, Clustering, and Matching for Object Recognition.
Object Categorization.
Object Recognition from Range Images.