Spatially-Varying Autofocus

Carnegie Mellon University

ICCV 2025


Conventional autofocus in today's cameras can focus only at a single depth, with a limited, planar depth of field.
Spatially-varying autofocus can instead focus independent pixel regions to any depth, enabling a freeform-shaped depth of field
while maintaining a large aperture and the highest possible spatial resolution.

Publications

Yingsi Qin, Aswin C. Sankaranarayanan, and Matthew O'Toole. Spatially-Varying Autofocus. IEEE International Conference on Computer Vision (ICCV), 2025

Overview

An autofocused focal plane that can conform to any scene geometry

Spatially-varying autofocus to produce an optical all-in-focus image. Left: A conventional photo with a regular lens, where objects at a single focal plane appear sharp. Right: An all-in-focus photo captured through spatially-varying autofocusing. To achieve this, we combine (i) a programmable lens with spatially-varying control over focus, and (ii) a spatially-varying autofocus algorithm to drive the focus of this lens. Note that this is an optically-captured image of a real scene with no post-capture processing used.


Comparison of all-in-focus imaging techniques

Optical sharpness: Most methods either use a small effective aperture (increasing the amount of diffraction blur) or intentionally blur the photos (e.g., to create depth-invariant blur). Our approach forms all-in-focus images by bringing each scene point into focus optically, while maintaining a large aperture.
# of images required: Our method requires at least one image to approximate the scene geometry and a second image to form the all-in-focus image. Moreover, our method is well suited for dynamic settings, where each frame determines the focus for the next frame.
All-in-focus generation: Unlike most techniques, our approach forms images using an all-optical process; no additional computational post-processing is required.
Outputs depth: A useful byproduct of several methods is the ability to recover a scene's depth map.

Abstract

A lens brings a single plane into focus on a planar sensor; hence, parts of the scene that are outside this planar focus plane are resolved under defocus. Can we break this precept by enabling a “lens” that can change its depth of field arbitrarily? This work investigates the design and implementation of such a computational lens with spatially-selective focusing. Our design uses an optical arrangement of a Lohmann lens and a phase-only spatial light modulator to allow each pixel to focus at a different depth. We extend classical autofocusing techniques to the spatially-varying scenario where the depth map is iteratively estimated using contrast and disparity cues, enabling the camera to progressively shape its depth of field to the scene’s depth. By obtaining an all-in-focus image optically, our technique advances upon prior work in two key aspects: the ability to bring an entire scene into focus simultaneously, and the ability to maintain the highest possible spatial resolution.

Results

Static Scenes Interactive Results

For each scene, we provide interactive visualizations of the conventional photo, the Phase-Detection Autofocus (PDAF) all-in-focus photo, the Contrast-Detection Autofocus (CDAF) all-in-focus photo, and the focus-stacking all-in-focus photo, as well as the CDAF and PDAF autofocusing progressions.

Static Scenes Freeform Depth-of-Field Results

Thin structure removal example. The character stands in front of a thin wire mesh, far away from the lions print background.
Conventional: The character is in focus but the wire mesh is visible.
Ours: Our proposed prototype optically removes the wire mesh by focusing the locations of the wire mesh to the far background. The resulting large defocus blur renders the wire mesh nearly invisible.

Freeform depth-of-field focusing examples. The scene has a vertically tilted depth.
Conventional: The depth-of-field is planar and the defocus scales vertically.
Tilt-shift focusing: We can spatially-vary focus to scale the defocus horizontally.
Selective focusing: We can select regions to be in focus while others to be defocused.

Dynamic Scenes Results

We show recordings of dynamic scenes from a proof-of-concept prototype that performs spatially-varying PDAF at 21 FPS. This differs from the original prototype built around the Canon EOS R10 sensor, whose bottleneck was its 0.3 FPS readout of dual-pixel (DP) images. We are not aware of any off-the-shelf solution for streaming DP images; we have since found a workaround, modifying a machine vision sensor to capture, read, and process DP images at 21 FPS.

Coming Soon

Methods

The Optics: How do we enable spatially-varying focusing for a camera?

Schematic of the Optics

The Split-Lohmann display is a recent near-eye 3D display technology that can simultaneously place individual pixel areas to different virtual depths, fully supporting the native focusing ability of the human eye.

We propose inverting the function of the Split-Lohmann display, by replacing the OLED display with a camera sensor and adding a camera lens. The result is a Split-Lohmann computational lens that now offers a camera the ability to spatially-vary its focus.
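As a rough illustration of how a phase-only SLM can impose spatially-varying focus, consider the sketch below (a hypothetical simplification, not the paper's actual calibration: in a Split-Lohmann arrangement, the local slope of the SLM phase selects the local focal power, with the cubic phase plates converting that lateral shift into defocus). Here we approximate that with a horizontal phase ramp whose steepness is proportional to the desired focal power at each pixel:

```python
import numpy as np

def slm_phase_pattern(power_map, pitch_um=3.72, gain=1.0):
    """Hypothetical sketch: convert a per-pixel focal-power map into a
    wrapped phase pattern for a phase-only SLM.

    power_map : 2D array of desired focal power per SLM pixel (arbitrary units)
    pitch_um  : SLM pixel pitch in microns (3.72 um for the HOLOEYE GAEA2)
    gain      : illustrative scale factor from power to phase slope
    """
    H, W = power_map.shape
    x_um = np.arange(W) * pitch_um            # horizontal coordinate, microns
    ramp = gain * power_map * x_um[None, :]   # steeper ramp = more local power
    return np.mod(ramp, 2 * np.pi)            # wrap to [0, 2*pi) for the SLM
```

The `gain` mapping is a stand-in; a real system would calibrate the phase-slope-to-power relationship of the optical train.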


The Algorithm: How do we autofocus any scene?

Contrast-Detection Autofocus (CDAF)

The CDAF Autofocusing Progression

CDAF is one of the primary methods used by digital cameras to focus a lens. The approach involves adjusting the focus settings of the lens until the camera detects the highest contrast (usually at a few select locations).

We extend CDAF to its spatially-varying counterpart by identifying an independent focus parameter for every superpixel that maximizes its image contrast, and then using the depth map to drive the focus of the camera lens.
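A minimal sketch of this per-superpixel contrast search is below. The helper names are hypothetical: `capture` stands in for the camera, and variance-of-Laplacian stands in for whatever contrast metric the system uses; the sweep over a fixed list of focus settings is an illustrative simplification of the iterative search.

```python
import numpy as np

def contrast(patch):
    # Simple contrast metric: variance of a Laplacian response.
    lap = (patch[1:-1, 1:-1] * 4
           - patch[:-2, 1:-1] - patch[2:, 1:-1]
           - patch[1:-1, :-2] - patch[1:-1, 2:])
    return float(np.var(lap))

def spatially_varying_cdaf(capture, focus_settings, grid=(8, 8)):
    """Sweep global focus settings; for each superpixel, keep the setting
    that maximizes its local contrast. Returns a coarse per-superpixel
    focus map that can then drive the spatially-varying lens.

    capture(f) -> 2D image captured with (uniform) focus setting f.
    """
    H, W = capture(focus_settings[0]).shape
    gh, gw = grid
    best_score = np.full(grid, -np.inf)
    best_focus = np.zeros(grid)
    for f in focus_settings:
        img = capture(f)
        for i in range(gh):
            for j in range(gw):
                patch = img[i * H // gh:(i + 1) * H // gh,
                            j * W // gw:(j + 1) * W // gw]
                s = contrast(patch)
                if s > best_score[i, j]:
                    best_score[i, j] = s
                    best_focus[i, j] = f
    return best_focus
```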


Phase-Detection Autofocus (PDAF)

The PDAF Autofocusing Progression

PDAF is an alternate technique that is commonly available in cameras with a dual-pixel (DP) sensor. When a scene point is in focus, the two images captured by the corresponding sub-pixels match. Otherwise, disparity is introduced between the two views. The signed disparity determines the lens focus for the scene point.

Similarly, we extend PDAF to its spatially-varying counterpart to drive the focus of the camera lens.

Since a DP image pair provides both the magnitude and direction for focusing, PDAF requires only a single image to identify the spatially-varying focus map; this allows it to adapt to scene dynamics and makes it less likely to get stuck in local minima.
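One spatially-varying PDAF step can be sketched as below. This is a minimal illustration, assuming a normalized cross-correlation disparity search and a proportional focus update; the `gain` and sign convention are placeholders for the prototype's calibrated disparity-to-focus mapping.

```python
import numpy as np

def patch_disparity(left, right, max_shift=4):
    # Signed horizontal shift (pixels) that best aligns the two dual-pixel
    # sub-images of a patch, found by normalized cross-correlation.
    best, best_s = -np.inf, 0
    for s in range(-max_shift, max_shift + 1):
        if s >= 0:
            a, b = left[:, s:], right[:, :right.shape[1] - s]
        else:
            a, b = left[:, :s], right[:, -s:]
        a = a - a.mean()
        b = b - b.mean()
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        score = float((a * b).sum() / denom) if denom > 0 else -np.inf
        if score > best:
            best, best_s = score, s
    return best_s

def pdaf_focus_update(left_img, right_img, focus_map, gain, grid=(8, 8)):
    """One spatially-varying PDAF step: the per-superpixel disparity between
    the two DP views drives a proportional correction to the focus map."""
    H, W = left_img.shape
    gh, gw = grid
    new_map = focus_map.copy()
    for i in range(gh):
        for j in range(gw):
            sl = np.s_[i * H // gh:(i + 1) * H // gh,
                       j * W // gw:(j + 1) * W // gw]
            d = patch_disparity(left_img[sl], right_img[sl])
            new_map[i, j] -= gain * d  # signed disparity -> focus correction
    return new_map
```

Because the disparity is signed, a single DP capture yields both the direction and (approximately) the magnitude of the required focus change for every superpixel, which is what makes the single-shot, per-frame update possible.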


The Prototype Camera

The benchtop prototype camera can perform both spatially-varying CDAF and PDAF. It uses the HOLOEYE GAEA2 spatial light modulator, which has a resolution of 3840x2160 pixels with a 3.72 μm pixel pitch, and the dual-pixel sensor of a Canon EOS R10 camera.

Citation

@inproceedings{qin2025spatially,
  author    = {Qin, Yingsi and Sankaranarayanan, Aswin C. and O'Toole, Matthew},
  title     = {Spatially-Varying Autofocus},
  booktitle = {2025 IEEE/CVF International Conference on Computer Vision (ICCV)},
  year      = {2025},
  publisher = {IEEE},
  url       = {https://imaging.cs.cmu.edu/svaf/static/pdfs/Spatially_Varying_Autofocus.pdf},
  language  = {eng},
  keywords  = {All-in-Focus Imaging, Extended Depth of Field, Autofocus Algorithms, Computational Imaging}
}