Alexander G. Schwing

Science, like art, is not a copy of nature
but a re-creation of her.

(Jacob Bronowski)

3D Indoor Scene Understanding

This project comprises approaches for predicting the 3D layout of an indoor scene from a single input image. We present:

1. an approach based on graphical models
2. a method employing branch and bound to predict the global optimum
3. an algorithm for inferring an object within a scene

For inference and learning we use:
Distributed Convex Belief Propagation
Structured Prediction

Some reconstructions of 3D room layouts given a single input image are illustrated in the following video:

video: Efficient Structured Prediction for 3D Indoor Scene Understanding

1. Efficient Structured Prediction for 3D Indoor Scene Understanding

by: A.G. Schwing, T. Hazan, M. Pollefeys and R. Urtasun

Existing approaches to indoor scene understanding formulate the problem as a structured prediction task focusing on estimating the 3D bounding box which best describes the scene layout. Unfortunately, these approaches utilize high-order potentials which are computationally intractable and rely on ad-hoc approximations for both learning and inference. In this paper we show that the potentials commonly used in the literature can be decomposed into pairwise potentials by extending the concept of integral images to geometry. As a consequence, no heuristic reduction of the search space is required. In practice, this results in large improvements in performance over the state of the art, while being orders of magnitude faster.
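
The abstract builds on the concept of integral images. As background only, the following minimal sketch shows the standard axis-aligned integral image (summed-area table), which returns the sum of features inside any rectangle with four lookups. It is not the geometric extension proposed in the paper; the function names `integral_image` and `box_sum` and the toy `features` array are assumptions made for illustration.

```python
import numpy as np


def integral_image(features):
    """Build a summed-area table with a zero row and column prepended."""
    rows, cols = features.shape
    table = np.zeros((rows + 1, cols + 1), dtype=np.float64)
    # table[r, c] holds the sum of features[:r, :c].
    table[1:, 1:] = features.cumsum(axis=0).cumsum(axis=1)
    return table


def box_sum(table, top, left, bottom, right):
    """Sum of features[top:bottom, left:right] using four table lookups."""
    return (table[bottom, right] - table[top, right]
            - table[bottom, left] + table[top, left])


# Hypothetical 3x4 feature map, used only to exercise the functions.
features = np.arange(12, dtype=np.float64).reshape(3, 4)
table = integral_image(features)
inner = box_sum(table, 1, 1, 3, 3)  # same region as features[1:3, 1:3]
```

Once the table is built, the cost of a query does not depend on the size of the rectangle, which is the property the paper carries over to the faces of a 3D layout.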

Preprint:

2. Efficient Exact Inference for 3D Indoor Scene Understanding

by: A.G. Schwing and R. Urtasun

In this paper we propose the first exact solution to the problem of estimating the 3D room layout from a single image. This problem is typically formulated as inference in a Markov random field, where potentials count image features (e.g., geometric context, orientation maps, lines in accordance with vanishing points) in each face of the layout. We present a novel branch and bound approach which splits the label space in terms of candidate sets of 3D layouts, and efficiently bounds the potentials in these sets by restricting the contribution of each individual face. We employ integral geometry in order to evaluate these bounds in constant time, and as a consequence, we not only obtain the exact solution, but do so in less time than approximate inference tools such as message-passing. We demonstrate the effectiveness of our approach on two benchmarks and show that our bounds are tight, and only a few evaluations are necessary.
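
The general pattern described above (split the label space into candidate sets, bound each set in constant time, refine the most promising set first) can be sketched on a much simpler problem. The toy below is an assumption made for illustration, not the paper's layout model: it searches for the interval `[i, j)` of a 1D weight array with the largest sum. Positive and negative weights are accumulated in separate prefix sums, so the upper bound of a whole set of intervals costs a few lookups. All names (`branch_and_bound_max_interval`, `upper_bound`) are hypothetical.

```python
import heapq

import numpy as np


def branch_and_bound_max_interval(weights):
    """Best-first branch and bound for the interval [i, j) with maximal sum.

    A candidate set is (i_lo, i_hi, j_lo, j_hi): all intervals whose start
    lies in [i_lo, i_hi] and whose end lies in [j_lo, j_hi].
    """
    w = np.asarray(weights, dtype=np.float64)
    n = len(w)
    # Prefix sums of the positive and negative parts (1D integral images).
    pos = np.concatenate(([0.0], np.cumsum(np.maximum(w, 0.0))))
    neg = np.concatenate(([0.0], np.cumsum(np.minimum(w, 0.0))))

    def upper_bound(i_lo, i_hi, j_lo, j_hi):
        if i_lo > j_hi:
            return float("-inf")  # the set contains no valid interval
        # Positive mass of the largest interval in the set ...
        bound = float(pos[j_hi] - pos[i_lo])
        # ... plus negative mass of the smallest one, if it is non-empty.
        if i_hi < j_lo:
            bound += float(neg[j_lo] - neg[i_hi])
        return bound

    start = (0, n, 0, n)
    heap = [(-upper_bound(*start), start)]
    evaluations = 1
    while heap:
        neg_bound, (i_lo, i_hi, j_lo, j_hi) = heapq.heappop(heap)
        if i_lo == i_hi and j_lo == j_hi:
            # For a single candidate the bound equals its score, and no
            # other set has a higher bound, so this candidate is optimal.
            return -neg_bound, (i_lo, j_lo), evaluations
        # Split the wider of the two ranges in half.
        if i_hi - i_lo >= j_hi - j_lo:
            mid = (i_lo + i_hi) // 2
            children = [(i_lo, mid, j_lo, j_hi), (mid + 1, i_hi, j_lo, j_hi)]
        else:
            mid = (j_lo + j_hi) // 2
            children = [(i_lo, i_hi, j_lo, mid), (i_lo, i_hi, mid + 1, j_hi)]
        for child in children:
            bound = upper_bound(*child)
            evaluations += 1
            if bound > float("-inf"):
                heapq.heappush(heap, (-bound, child))
    raise ValueError("no valid interval found")


# Hypothetical weights, used only to exercise the search.
score, interval, evaluations = branch_and_bound_max_interval(
    [1, -3, 2, 4, -1, 3, -5])
```

Because the bound is exact for a single candidate and never underestimates any candidate in a set, the first single candidate popped from the priority queue is a global optimum; how many bound evaluations are needed depends on how tight the bound is for the problem at hand.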

Preprint:

3. Box In the Box: Joint 3D Layout and Object Reasoning from Single Images

by: A.G. Schwing, S. Fidler, M. Pollefeys and R. Urtasun

In this paper we propose an approach to jointly infer the room layout as well as the objects present in the scene. To this end, we propose a branch and bound algorithm which is guaranteed to retrieve the global optimum of the joint problem. The main difficulty resides in taking occlusion into account so as not to over-count the evidence. We introduce a new decomposition method, which generalizes integral geometry to triangular shapes, and allows us to bound the different terms in constant time. We exploit both geometric cues and object detectors as image features and show large improvements in 2D and 3D object detection over state-of-the-art deformable part-based models.

Preprint: