Benjamin Attal (CMU), Eliot Laidlaw (Brown), Aaron Gokaslan (Cornell), Changil Kim (Facebook), Christian Richardt (U. of Bath), James Tompkin (Brown), and Matthew O'Toole (CMU)
Neural networks can represent and accurately reconstruct radiance fields for static 3D scenes (e.g., NeRF). Several works extend these to dynamic scenes captured with monocular video, with promising performance. However, the monocular setting is known to be an under-constrained problem, and so methods rely on data-driven priors for reconstructing dynamic content. We replace these priors with measurements from a time-of-flight (ToF) camera, and introduce a neural representation based on an image formation model for continuous-wave ToF cameras. Instead of working with processed depth maps, we model the raw ToF sensor measurements to improve reconstruction quality and avoid issues with low reflectance regions, multi-path interference, and a sensor's limited unambiguous depth range. We show that this approach improves robustness of dynamic scene reconstruction to erroneous calibration and large motions, and discuss the benefits and limitations of integrating RGB+ToF sensors that are now available on modern smartphones.
Novel-view synthesis (NVS) is a long-standing problem in computer graphics and computer vision, where the objective is to photorealistically render images of a scene from novel viewpoints. Given a number of images taken from different viewpoints, it is possible to infer both the geometry and appearance of a scene, and then use this information to synthesize images at novel camera poses.
The key idea behind TöRF is to tackle the NVS problem by taking advantage of the depth sensors available on many consumer devices (phones, tablets, laptops). Specifically, we use both the RGB images captured with regular cameras and the depth information captured with time-of-flight (ToF) cameras to optimize for a neural radiance field. Because the depth images produced by ToF cameras can be unreliable, TöRF instead models the raw data captured by a continuous-wave ToF camera (referred to as phasor images), which leads to better view synthesis results.
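To give a feel for what rendering a phasor image involves, below is a minimal sketch of volume-rendering a complex-valued phasor along one camera ray. It assumes a co-located light source and sensor, a NeRF-style quadrature over sampled densities and reflected amplitudes, and inverse-square falloff; the function and its arguments are simplified stand-ins, not the actual TöRF implementation.

import numpy as np

def render_phasor_along_ray(sigmas, amplitudes, t_vals, wavelength):
    """Sketch: volume-render a complex phasor for one ray of a CW-ToF camera.

    sigmas     : (N,) volume densities at the sampled points
    amplitudes : (N,) reflected-light amplitudes at the sampled points
    t_vals     : (N,) distances of the samples from the camera
    wavelength : modulation wavelength (unambiguous range = wavelength / 2)
    """
    deltas = np.append(np.diff(t_vals), 1e10)            # spacing between samples
    alphas = 1.0 - np.exp(-sigmas * deltas)              # per-sample opacity
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))  # transmittance

    # A single-scattering event at distance t contributes a phasor whose phase
    # encodes the round-trip path length 2t, attenuated by inverse-square falloff.
    phasors = (amplitudes / np.maximum(t_vals, 1e-6) ** 2) * np.exp(
        1j * 2.0 * np.pi * (2.0 * t_vals) / wavelength)

    weights = trans * alphas
    return np.sum(weights * phasors)                     # complex phasor-image pixel

The color image for the same ray is rendered with the usual real-valued NeRF quadrature from the same field, so the captured color and phasor images jointly supervise the optimization.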
Illustration of Time-of-Flight Radiance Fields. (a) We move a handheld imaging system around a dynamic scene, capturing (b) color images and (c) raw phasor images with a continuous-wave ToF camera. (d) Then, we optimize for a continuous neural radiance field of the scene that predicts the captured color and phasor images. This allows novel view synthesis.
Supervising with raw phasor images allows us to reconstruct scene regions beyond the sensor's unambiguous range, which would otherwise wrap around in the derived depth.
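As a toy illustration of this wrap-around (the 7.5 m unambiguous range and 9 m wall below are made-up numbers, not our capture setup):

import numpy as np

unambiguous_range = 7.5   # metres; equal to half the modulation wavelength
true_depth = 9.0          # a wall beyond the unambiguous range

phase = 2.0 * np.pi * (2.0 * true_depth) / (2.0 * unambiguous_range)  # round-trip phase
derived_depth = (phase % (2.0 * np.pi)) / (2.0 * np.pi) * unambiguous_range
print(derived_depth)      # 1.5 -- the wall wraps around to 1.5 m in the derived depth map

Because the phasor itself is periodic, it remains consistent with the true 9 m surface; together with the multi-view color supervision, the radiance field can place the geometry at the correct unwrapped depth.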
In the Photocopier sequence below, the derived ToF depth is incorrect on the far wall to the right, but our approach reconstructs this region more faithfully. Using the derived depth in VideoNeRF produces large errors (video far right).
Depth values become unreliable (noisy) when little light is reflected back to the camera. Modeling the phasor images directly makes the reconstruction more robust to this sensor noise.
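A quick simulation of why this helps (the amplitudes and noise level below are purely illustrative): the same additive sensor noise that barely perturbs a bright pixel causes large errors once a dim pixel's phasor is converted to depth, whereas the raw phasor still carries the low amplitude that marks the measurement as unreliable.

import numpy as np

rng = np.random.default_rng(0)
unambiguous_range = 7.5
true_depth = 3.0
phase = 2.0 * np.pi * true_depth / unambiguous_range

# Identical additive noise applied to a bright and a dim phasor.
noise = 0.02 * (rng.standard_normal(10000) + 1j * rng.standard_normal(10000))
for amplitude in (1.0, 0.05):
    phasor = amplitude * np.exp(1j * phase) + noise
    derived = np.angle(phasor) % (2.0 * np.pi) / (2.0 * np.pi) * unambiguous_range
    print(amplitude, derived.std())   # depth noise grows sharply as amplitude drops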
In the DeskBox sequence below, the derived ToF depth is noisy on the back of the monitor (bright yellow sparkles), but our approach reconstructs this surface more cleanly.
In ToF imaging, the detected light may not travel along a single path; this produces a mixture of phasors whose phase may not correspond to any single depth. Such multi-path interference occurs near depth edges, as well as at specular reflections and transparent surfaces. We model the response from multiple single-scattering events along each ray to better handle these scenarios.
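A toy example of such phasor mixing (made-up depths, equal weights, no distance falloff): two returns at different depths sum to a phasor that decodes to a depth matching neither surface.

import numpy as np

unambiguous_range = 7.5

def phasor(depth, amplitude=1.0):
    # Toy single-bounce phasor for a surface at the given depth.
    return amplitude * np.exp(1j * 2.0 * np.pi * depth / unambiguous_range)

def decode_depth(p):
    return np.angle(p) % (2.0 * np.pi) / (2.0 * np.pi) * unambiguous_range

# At a depth edge, a pixel receives light from a foreground surface at 1 m
# and from the background at 3 m; the mixture decodes to a phantom depth.
mixed = 0.5 * phasor(1.0) + 0.5 * phasor(3.0)
print(decode_depth(mixed))   # 2.0 -- matches neither surface

Summing single-scattering phasor contributions along each ray, as in the rendering sketch above, lets the model explain such mixtures rather than committing to a single depth per pixel.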
The Cupboard sequence below shows improved reconstruction of specular surfaces, such as the fridge on the right, whose measured depth values should reflect the surrounding environment.
Please visit https://github.com/breuckelen/torf.
@article{attal2021torf,
title={T{\"o}RF: Time-of-Flight Radiance Fields for Dynamic Scene View Synthesis},
author={Attal, Benjamin and Laidlaw, Eliot and Gokaslan, Aaron and Kim, Changil and Richardt, Christian and Tompkin, James and O'Toole, Matthew},
journal={Advances in Neural Information Processing Systems},
volume={34},
year={2021}
}
For funding, Matthew O'Toole acknowledges support from NSF IIS-2008464, James Tompkin thanks an Amazon Research Award and NSF CNS-2038897, and Christian Richardt acknowledges funding from an EPSRC-UKRI Innovation Fellowship (EP/S001050/1) and RCUK grant CAMERA (EP/M023281/1, EP/T022523/1).