Software - Computer Vision

Detecting Watermarks, Timestamps, and Frames (WTFs)

A common problem in computer vision applications based on Internet photos is false-positive matches caused by Watermarks, Timestamps, and Frames (WTFs) superimposed on the image content. If the same WTF is present in two otherwise unrelated images, local-feature-based image matching often falsely considers the pair a match, because WTFs produce spatially coherent local feature matches even though the images show different objects. This in turn hurts applications such as image retrieval or large-scale Structure-from-Motion, which rely on reliable image matching as a building block.
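To make the failure mode concrete, the sketch below flags an image pair when most of its feature matches fall inside a border band of both images. This is only an illustrative heuristic, not the detector from this project: it assumes overlays sit near the image border (true for frames and many timestamps, but not for centered watermarks), and the names `border_match_fraction`, `looks_like_wtf_match`, `margin`, and `threshold` are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]   # (x, y) in pixels
Size = Tuple[int, int]        # (width, height) in pixels
Match = Tuple[Point, Point]   # keypoint in image A, keypoint in image B


def in_border(pt: Point, size: Size, margin: float) -> bool:
    """True if pt lies within the outer band covering `margin` of each side."""
    x, y = pt
    w, h = size
    return (x < margin * w or x > (1 - margin) * w or
            y < margin * h or y > (1 - margin) * h)


def border_match_fraction(matches: List[Match], size_a: Size, size_b: Size,
                          margin: float = 0.15) -> float:
    """Fraction of matches whose keypoints lie in the border band of BOTH images."""
    if not matches:
        return 0.0
    hits = sum(1 for p, q in matches
               if in_border(p, size_a, margin) and in_border(q, size_b, margin))
    return hits / len(matches)


def looks_like_wtf_match(matches: List[Match], size_a: Size, size_b: Size,
                         margin: float = 0.15, threshold: float = 0.8) -> bool:
    """Assumed rule of thumb: suspicious if most matches are border matches."""
    return border_match_fraction(matches, size_a, size_b, margin) >= threshold


# Matches clustered in the bottom-right corner, as a timestamp would produce.
timestamp_like = [((600.0, 460.0), (605.0, 462.0)),
                  ((610.0, 465.0), (612.0, 466.0)),
                  ((620.0, 470.0), (618.0, 468.0))]
# Matches spread over the image center, as real shared content would produce.
content_like = [((300.0, 240.0), (310.0, 250.0)),
                ((320.0, 200.0), (330.0, 210.0)),
                ((280.0, 260.0), (290.0, 255.0))]

wtf_flag = looks_like_wtf_match(timestamp_like, (640, 480), (640, 480))
content_flag = looks_like_wtf_match(content_like, (640, 480), (640, 480))
```

A rule this simple would also reject genuine matches that happen to lie near the border, which is why a learned detector is the more robust choice in practice.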

Project Page

Real-Time RGB-D based People Detection and Tracking

On this web page, we provide code for RGB-D based people tracking, as used in our ICRA'14 paper. When using this software for your own research, please acknowledge the effort that went into its construction by citing the corresponding paper.

Project Page

GroundHOG - GPU-based Object Detection with Geometric Constraints

In this project, we have developed an algorithm to integrate geometric constraints directly into the design of sliding-window object detectors. The benefit of this approach is best evaluated after other speedups, such as parallelization, have been applied. We therefore ported the original HOG detector (as described by N. Dalal) to the GPU in a highly optimized implementation based on NVIDIA's CUDA architecture.
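As a rough illustration of what a geometric constraint can buy, the sketch below uses a flat ground plane to predict how tall a person should appear at a given image row and discards candidate windows that disagree. It is a simplified post-hoc filter, not the project's algorithm (which builds the constraints into the detector itself). It assumes a pinhole camera with a roughly horizontal optical axis and a known horizon row. All function and parameter names are hypothetical.

```python
from typing import List, Optional, Tuple

Window = Tuple[float, float]  # (top_row, bottom_row); rows increase downward


def expected_pixel_height(foot_row: float, horizon_row: float,
                          camera_height_m: float,
                          object_height_m: float = 1.7) -> Optional[float]:
    """Pixel height of an object standing on the ground with its feet at foot_row.

    For a horizontal camera at height H_c, a ground point at depth Z projects
    f * H_c / Z rows below the horizon, and an object of height H spans
    f * H / Z rows, so h = (foot_row - horizon_row) * H / H_c.
    """
    if foot_row <= horizon_row:
        return None  # feet at or above the horizon cannot be on the ground plane
    return (foot_row - horizon_row) * object_height_m / camera_height_m


def plausible_windows(windows: List[Window], horizon_row: float,
                      camera_height_m: float, object_height_m: float = 1.7,
                      tolerance: float = 0.25) -> List[Window]:
    """Keep windows whose height is within `tolerance` of the expected height."""
    kept = []
    for top, bottom in windows:
        expected = expected_pixel_height(bottom, horizon_row,
                                         camera_height_m, object_height_m)
        if expected is None:
            continue
        if abs((bottom - top) - expected) <= tolerance * expected:
            kept.append((top, bottom))
    return kept


candidates = [(200.0, 400.0),   # 200 px tall with feet 200 rows below horizon
              (350.0, 400.0),   # far too small for that foot position
              (100.0, 150.0)]   # entirely above the horizon
kept = plausible_windows(candidates, horizon_row=200.0, camera_height_m=1.7)
```

Because the expected height depends only on the foot row, a detector can evaluate one narrow scale range per row instead of the full scale pyramid, which is where the speedup comes from.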

Project Page

Implicit Shape Model (ISM) detector code

On this web page, we provide binaries and example codebooks for the Implicit Shape Model (ISM) detector, as used in our IJCV'08 paper. The code archive consists of pre-compiled executables, including all required libraries. In order to facilitate experiments, we also provide several pre-trained detectors.

Project Page

Efficient CNN for Human Pose Estimation

In recent years, human pose estimation has greatly benefited from deep learning, and huge gains in performance have been achieved on the well-known FLIC, LSP, and MPII benchmarks. The trend to maximize accuracy on benchmarks, however, has resulted in computationally expensive deep network architectures that require costly hardware and pre-training on large datasets. In this work, we propose an efficient deep network architecture that can be trained on mid-range GPUs without any pre-training and that is on par with much more complex models on the benchmarks.

Project Page

Full-Resolution Residual Networks for Semantic Segmentation in Street Scenes

View Code on GitHub to learn more.

Project Page

Combined Image- and World-Space Tracking in Traffic Scenes

View Code on GitHub to learn more.

Project Page