StereoGS

Sparse-View 3D Gaussian Splatting via Stereo Priors

ECCV 2026
1 South China University of Technology
* Corresponding author

TL;DR

StereoGS brings stereo priors into sparse-view 3D Gaussian Splatting, enforcing absolute scale and cross-view consistency through virtual stereo pairs, a gradient-aware opacity decay, and a zero-shot MVS initialization. It achieves state-of-the-art results on LLFF, DTU, Mip-NeRF360, and Blender, with zero inference overhead compared to vanilla 3DGS.

Stereo Depth Regularization

Virtual stereo pairs + FoundationStereo deliver absolute-scale, cross-view-consistent depth supervision.

Gradient-Aware Opacity Decay

Exponential soft-thresholding of relative opacity gradients prunes floaters while preserving surface Gaussians.

Dense MVS Initialization

Zero-shot MVSAnywhere + cross-view reprojection filtering yields a dense, view-consistent point cloud.

StereoGS framework overview

Framework overview. StereoGS comprises three components: (a) Stereo Depth Regularization synthesizes virtual stereo pairs to extract stereo depth, enforcing explicit geometric consistency between paired views during optimization. (b) Gradient-Aware Opacity Decay dynamically penalizes Gaussian opacities according to their gradients, removing redundant primitives and mitigating overfitting. (c) Consistency-Aware Dense Initialization leverages multi-view depth priors and geometric filtering to construct a dense, reliable geometric foundation.

Abstract

3D Gaussian Splatting (3DGS) has achieved remarkable success in real-time novel view synthesis, yet it suffers from severe overfitting under sparse-view settings due to insufficient geometric constraints. While recent methods introduce monocular depth priors to mitigate this, they inherently struggle with scale ambiguity and cross-view inconsistency, leading to defective geometry.

In this paper, we propose StereoGS, a novel sparse-view 3DGS framework that integrates stereo priors to establish reliable cross-view consistency. Unlike scale-agnostic monocular constraints, StereoGS introduces a Stereo Depth Regularization by constructing virtual stereo pairs during optimization and leveraging a foundation stereo model to enforce absolute scale and cross-view-consistent structures.

To further suppress overfitting and eliminate redundant primitives, we design a Gradient-Aware Opacity Decay strategy that dynamically penalizes Gaussians based on their relative opacity gradient magnitudes. Combined with a Consistency-Aware Dense Initialization using zero-shot multi-view depth estimation, StereoGS effectively anchors primitives to accurate scene surfaces. Extensive experiments on LLFF, DTU, Mip-NeRF360, and Blender datasets demonstrate that StereoGS achieves state-of-the-art performance in sparse-view settings without incurring any additional inference overhead.

Method

Three training-time components turn sparse-view 3DGS, which is fragile when it relies on monocular cues alone, into a scale-aware, cross-view-consistent reconstruction pipeline.

A. Stereo Depth Regularization

For each training view, treated as the left camera, we synthesize a virtual right camera via horizontal translation and render the corresponding right-view image from the current 3D Gaussians. Feeding the ground-truth left image and the rendered right view into FoundationStereo produces a left-view disparity $\hat{D}_l$, which we convert to depth $Z_{\text{stereo}} = f d / \hat{D}_l$ (focal length $f$, virtual baseline $d$) and compare against the rendered depth in inverse-depth space.
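The virtual-camera construction and the disparity-to-depth conversion described above can be sketched in NumPy. This is a minimal illustration, not the authors' code: the function names, the OpenCV-style convention (4x4 world-to-camera matrices, +x pointing right), and the example baseline are assumptions made for this sketch.

```python
import numpy as np

def virtual_right_pose(w2c_left: np.ndarray, baseline: float) -> np.ndarray:
    """World-to-camera matrix of a camera shifted `baseline` along the left camera's +x axis.

    Assumes a 4x4 world-to-camera matrix with +x pointing right (hypothetical convention).
    """
    w2c_right = w2c_left.copy()
    # A point at x in the left camera frame sits at x - baseline in the right camera frame.
    w2c_right[0, 3] -= baseline
    return w2c_right

def disparity_to_depth(disparity: np.ndarray, focal_px: float, baseline: float,
                       eps: float = 1e-6) -> np.ndarray:
    """Z = f * d / D for a rectified pair; eps guards against division by zero."""
    return focal_px * baseline / np.maximum(disparity, eps)

# Example: a point 2 units in front of the left camera, baseline 0.1, focal length 500 px.
w2c_left = np.eye(4)
w2c_right = virtual_right_pose(w2c_left, baseline=0.1)
point_right = w2c_right @ np.array([0.0, 0.0, 2.0, 1.0])
depth = disparity_to_depth(np.array([25.0]), focal_px=500.0, baseline=0.1)
```

In the example, the point projects 25 px further left in the right view (500 * 0.1 / 2), and converting that disparity back recovers the depth of 2 units.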

To prevent noisy priors from corrupting geometry, we filter pixels with a left-right consistency check, a background mask, and a disparity anomaly mask, fused into a final validity mask $M_{\text{valid}}$. With $\hat{Z}$ denoting the rendered depth, the resulting loss reads

$$\mathcal{L}_{\text{depth}} = \left\| M_{\text{valid}} \odot \left( \frac{1}{\hat{Z}} - \frac{1}{Z_{\text{stereo}}} \right) \right\|_1$$

Unlike photometric warping, this directly pushes Gaussian primitives toward scale-accurate and cross-view-consistent geometry.
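As a rough illustration of the masking and the loss, the sketch below implements one common form of left-right consistency check and the masked inverse-depth L1 term. It makes several assumptions: a right-view disparity map is available, matches are looked up by nearest pixel, and the tolerance `tol` is a hypothetical parameter. The background and disparity anomaly masks are omitted because this page does not specify them, and a real training loop would use a differentiable framework rather than NumPy.

```python
import numpy as np

def left_right_consistency_mask(disp_left: np.ndarray, disp_right: np.ndarray,
                                tol: float = 1.0) -> np.ndarray:
    """True where the left disparity agrees with the right disparity at the matched pixel."""
    h, w = disp_left.shape
    cols = np.tile(np.arange(w), (h, 1))
    rows = np.arange(h)[:, None]
    # In a rectified pair, left pixel x matches right pixel x - disparity.
    match_cols = np.rint(cols - disp_left).astype(int)
    in_bounds = (match_cols >= 0) & (match_cols < w)
    disp_right_at_match = disp_right[rows, np.clip(match_cols, 0, w - 1)]
    return in_bounds & (np.abs(disp_left - disp_right_at_match) <= tol)

def inverse_depth_l1(rendered_depth: np.ndarray, stereo_depth: np.ndarray,
                     valid_mask: np.ndarray, eps: float = 1e-6) -> float:
    """Sum over valid pixels of |1/Z_rendered - 1/Z_stereo|."""
    inv_rendered = 1.0 / np.maximum(rendered_depth, eps)
    inv_stereo = 1.0 / np.maximum(stereo_depth, eps)
    return float(np.sum(np.abs(inv_rendered - inv_stereo) * valid_mask))

# Toy example: constant disparity of 2 px on a 2x5 image.
disp_l = np.full((2, 5), 2.0)
disp_r = np.full((2, 5), 2.0)
mask = left_right_consistency_mask(disp_l, disp_r)
loss = inverse_depth_l1(np.full((2, 5), 2.0), np.full((2, 5), 4.0), mask)
```

In the toy example the two leftmost columns are rejected because their matches fall outside the right image, so only six pixels contribute to the loss.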

B. Gradient-Aware Opacity Decay

We argue that the magnitude of a Gaussian's opacity gradient reflects its contribution to reconstruction: a large gradient marks an important primitive, while a negligible gradient marks a redundant floater. Because raw opacity gradients are tiny (on the order of $10^{-6}$), we use a relative gradient $\beta = g / \bar{g}$, where $g$ is a Gaussian's opacity gradient magnitude and $\bar{g}$ is the average magnitude (inspired by GRPO in DeepSeekMath), and apply an exponential soft-threshold to produce the dynamic decay factor

$$\gamma = 1 - (1 - \gamma_{\text{base}}) \cdot \exp(-s \cdot \beta), \qquad \hat{\alpha} = \gamma \cdot \alpha$$

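Gaussians with below-average gradients receive the strongest decay ($\gamma$ approaches $\gamma_{\text{base}}$), so background floaters fade and get pruned, while Gaussians on valid surfaces with above-average gradients are left almost undecayed ($\gamma$ approaches 1). Defaults: $\gamma_{\text{base}} = 0.99$, $s = 0.5$.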
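The decay factor is simple enough to check numerically. The sketch below evaluates it for a handful of made-up gradient magnitudes; the function name is hypothetical, and taking the average over all Gaussians is an assumption, since the page only says that the gradient is divided by its average.

```python
import numpy as np

def opacity_decay_factor(opacity_grad: np.ndarray, gamma_base: float = 0.99,
                         s: float = 0.5, eps: float = 1e-20) -> np.ndarray:
    """gamma = 1 - (1 - gamma_base) * exp(-s * beta), with beta = g / mean(g)."""
    g = np.abs(opacity_grad)
    # Assumption: the average is taken over all Gaussians passed in.
    beta = g / (g.mean() + eps)
    return 1.0 - (1.0 - gamma_base) * np.exp(-s * beta)

# Illustrative gradient magnitudes (not measured values).
grads = np.array([0.0, 1.0e-6, 3.0e-6])
gamma = opacity_decay_factor(grads)
opacity = np.array([0.8, 0.8, 0.8])
decayed_opacity = gamma * opacity
```

A Gaussian with zero gradient is multiplied by exactly `gamma_base`, and the factor rises monotonically toward 1 as the relative gradient grows, so rescaling all gradients by a constant leaves the result essentially unchanged.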

C. Consistency-Aware Dense Initialization

Sparse SfM points are insufficient under few-view conditions. We instead run the zero-shot MVS model MVSAnywhere on every training view (as target) with the remaining views as sources to obtain a set of cross-view-consistent depth maps. Cross-view reprojection errors are then used to filter outliers, and the filtered depths are back-projected and fused into a dense, reliable point cloud as the Gaussian initialization.
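One way to realize the reprojection filter is a forward-backward check between a target depth map and a source depth map, sketched below. The page does not give the exact error metric or threshold, so everything here is an assumption for illustration: the function names, shared intrinsics `K`, nearest-pixel lookup, and the pixel tolerance `px_tol`.

```python
import numpy as np

def pixel_grid(h: int, w: int) -> np.ndarray:
    """Homogeneous pixel coordinates (u, v, 1), shape (h*w, 3)."""
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    return np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).astype(np.float64)

def project(points: np.ndarray, K: np.ndarray):
    """Pinhole projection; returns pixel coordinates and an in-front-of-camera flag."""
    z = points[:, 2]
    in_front = z > 1e-6
    uv = (points @ K.T)[:, :2] / np.where(in_front, z, 1.0)[:, None]
    return uv, in_front

def reprojection_mask(depth_tgt, depth_src, K, T_src_from_tgt, px_tol=1.0):
    """Keep target pixels that land back on themselves after a target->source->target round trip."""
    h, w = depth_tgt.shape
    K_inv = np.linalg.inv(K)
    R, t = T_src_from_tgt[:3, :3], T_src_from_tgt[:3, 3]
    pix = pixel_grid(h, w)
    # Lift target pixels to 3D and move them into the source camera frame.
    pts_tgt = (pix @ K_inv.T) * depth_tgt.reshape(-1, 1)
    pts_src = pts_tgt @ R.T + t
    uv_src, in_front = project(pts_src, K)
    u = np.rint(uv_src[:, 0]).astype(int)
    v = np.rint(uv_src[:, 1]).astype(int)
    inside = in_front & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    u, v = np.clip(u, 0, w - 1), np.clip(v, 0, h - 1)
    # Lift the matched source pixels with the source depth and map them back.
    src_pix = np.stack([u, v, np.ones_like(u)], axis=-1).astype(np.float64)
    pts_src_obs = (src_pix @ K_inv.T) * depth_src[v, u][:, None]
    pts_back = (pts_src_obs - t) @ R  # row-vector form of R^T (p - t)
    uv_back, back_ok = project(pts_back, K)
    err = np.linalg.norm(uv_back - pix[:, :2], axis=1)
    return (inside & back_ok & (err <= px_tol)).reshape(h, w)

def fuse_to_world(depth, K, c2w, mask):
    """Back-project the pixels that survived filtering into world space."""
    h, w = depth.shape
    pts_cam = (pixel_grid(h, w) @ np.linalg.inv(K).T) * depth.reshape(-1, 1)
    pts_world = pts_cam @ c2w[:3, :3].T + c2w[:3, 3]
    return pts_world[mask.reshape(-1)]

# Toy scene: a fronto-parallel plane at depth 2, source camera shifted 0.04 to the right.
K = np.array([[100.0, 0.0, 2.0], [0.0, 100.0, 1.5], [0.0, 0.0, 1.0]])
T = np.eye(4)
T[0, 3] = -0.04
depth_tgt = np.full((4, 5), 2.0)
depth_src = np.full((4, 5), 2.0)
mask = reprojection_mask(depth_tgt, depth_src, K, T, px_tol=0.5)
points = fuse_to_world(depth_tgt, K, np.eye(4), mask)
```

With several source views, the per-view masks could be combined (for example, by requiring agreement with a minimum number of sources) before fusing; that policy is a design choice this sketch leaves open.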

The strong zero-shot generalization of MVSAnywhere makes this initialization significantly denser and cleaner than initializations built from PDCNet+ or MVSFormer on in-the-wild scenes.

Visual Comparisons

Novel view synthesis and rendered depth on four standard benchmarks. Zoom in for details.

LLFF  3 views · 1/8 resolution

Visual comparison on LLFF

On fern and trex, vanilla 3DGS shows severe patchy artifacts; monocular-depth methods (DNGaussian, FSGS) recover rough geometry but fail on fine structure. MVPGS and Binocular3DGS still produce artifacts in textureless regions. StereoGS reconstructs clean surface skeletons (e.g., the trex) and crisp depth.

DTU  3 views · 1/4 resolution

Visual comparison on DTU

DNGaussian, FSGS, and MVPGS produce blurry results; Binocular3DGS shows textureless-region artifacts due to its self-supervised photometric loss. StereoGS delivers sharp details and accurate depth on all three scenes.

Mip-NeRF360  12–24 views · 1/8 resolution

Visual comparison on Mip-NeRF360

On the bicycle and stump scenes, baseline methods (FSGS, CoR-GS, DropGaussian) exhibit prominent artifacts in large textureless grass areas. StereoGS preserves the high-frequency details and produces the most coherent depth maps.

Blender  8 views · 1/2 resolution

Visual comparison on Blender

On the synthetic Blender dataset, StereoGS consistently outperforms all baselines across PSNR/SSIM/LPIPS, demonstrating its capability in object-centric sparse-view reconstruction.

Quantitative Results

StereoGS achieves state-of-the-art PSNR / SSIM / LPIPS across LLFF, DTU, Mip-NeRF360 and Blender under every sparse-view configuration tested. Ours* denotes the variant with a fixed dropout rate of 0.3.

LLFF (3 / 6 / 9 views)

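| Method | 3-view PSNR↑ | 6-view PSNR↑ | 9-view PSNR↑ |
| --- | --- | --- | --- |
| 3DGS | 16.02 | 19.45 | 21.13 |
| DNGaussian | 19.12 | 22.18 | 23.17 |
| FSGS | 20.31 | 24.20 | 25.32 |
| CoR-GS | 20.45 | 24.49 | 26.06 |
| MVPGS | 20.54 | 23.64 | 24.23 |
| DropGaussian | 20.76 | 24.74 | 26.21 |
| NexusGS | 21.07 | - | - |
| Binocular3DGS | 21.44 | 24.87 | 26.17 |
| D2GS | 21.35 | 24.84 | - |
| StereoGS (Ours) | 21.91 | 24.92 | 26.25 |
| StereoGS (Ours*) | 22.05 | 25.40 | 26.44 |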

DTU (3 / 6 / 9 views)

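| Method | 3-view PSNR↑ | 6-view PSNR↑ | 9-view PSNR↑ |
| --- | --- | --- | --- |
| 3DGS | 10.99 | 20.33 | 22.90 |
| DNGaussian | 18.91 | 22.10 | 23.94 |
| FSGS | 17.34 | 21.55 | 24.33 |
| CoR-GS | 19.21 | 24.51 | 27.18 |
| MVPGS | 20.65 | 23.98 | 26.45 |
| NexusGS | 20.21 | - | - |
| Binocular3DGS | 20.71 | 24.31 | 26.70 |
| StereoGS (Ours) | 21.46 | 24.86 | 26.83 |
| StereoGS (Ours*) | 22.00 | 25.41 | 27.39 |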

Mip-NeRF360 (12 / 24 views)

| Method | 12-view PSNR↑ | 24-view PSNR↑ |
| --- | --- | --- |
| 3DGS | 18.52 | 22.80 |
| FSGS | 18.80 | 23.28 |
| CoR-GS | 19.52 | 23.39 |
| DropGaussian | 19.74 | 24.05 |
| D2GS | 20.09 | 24.13 |
| StereoGS (Ours) | 20.25 | 24.18 |
| StereoGS (Ours*) | 20.51 | 24.25 |

Blender (8 views)

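| Method | PSNR↑ | SSIM↑ | LPIPS↓ |
| --- | --- | --- | --- |
| 3DGS | 23.20 | 0.870 | 0.104 |
| DNGaussian | 24.31 | 0.886 | 0.088 |
| FSGS | 24.64 | 0.895 | 0.095 |
| CoR-GS | 24.43 | 0.896 | 0.084 |
| Binocular3DGS | 24.71 | 0.872 | 0.101 |
| StereoGS (Ours) | 24.83 | 0.899 | 0.081 |
| StereoGS (Ours*) | 25.04 | 0.899 | 0.078 |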

Full PSNR/SSIM/LPIPS for every configuration is reported in the paper and supplementary material.

Ablation Studies

Each component contributes measurably; the full model achieves the best PSNR/SSIM/LPIPS.

Ablation visual comparison

Effect of incrementally adding Consistency-Aware Dense Initialization (CI), Stereo Depth Regularization (SDR), and Gradient-Aware Opacity Decay (GOD). CI gives a strong geometric foundation (depth holes still present in fine details); SDR fills the holes and smooths the background; GOD finally prunes the remaining floaters around foreground boundaries.

Component Ablation (LLFF / DTU, 3 views)

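| CI | SDR | GOD | LLFF PSNR↑ | LLFF SSIM↑ | LLFF LPIPS↓ | DTU PSNR↑ | DTU SSIM↑ | DTU LPIPS↓ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ✗ | ✗ | ✗ | 16.02 | 0.465 | 0.378 | 10.99 | 0.585 | 0.313 |
|  |  |  | 19.75 | 0.691 | 0.215 | 14.10 | 0.786 | 0.196 |
|  |  |  | 17.32 | 0.524 | 0.317 | 12.46 | 0.697 | 0.208 |
|  |  |  | 18.18 | 0.569 | 0.291 | 15.05 | 0.751 | 0.202 |
|  |  |  | 18.96 | 0.605 | 0.262 | 17.66 | 0.784 | 0.172 |
|  |  |  | 19.79 | 0.695 | 0.214 | 15.57 | 0.812 | 0.166 |
|  |  |  | 21.18 | 0.741 | 0.171 | 19.76 | 0.863 | 0.112 |
| ✓ | ✓ | ✓ | 21.91 | 0.773 | 0.157 | 21.46 | 0.879 | 0.099 |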

Deeper Analysis

Initialization point clouds

Initialization point cloud comparison

Compared to sparse SfM and other learning-based matchers (PDCNet+, MVSFormer), MVSAnywhere produces the densest, most uniformly distributed, and structurally most complete point clouds across the fern, horns, fortress, and room scenes.

MVS model ablation

Swapping the initialization MVS model: SfM (sparse, 18.96 / 17.66 dB on LLFF / DTU) → PDCNet+ (20.10 / 19.81) → MVSFormer (21.08 / 20.71) → MVSAnywhere (21.91 / 21.46). MVSAnywhere's zero-shot generalization removes the pre-training domain gap and gives the best geometric foundation.

Stereo model ablation

The Stereo Depth Regularization is a universal module: plugging in S2M2 or LiteAnyStereo also improves over the no-stereo baseline. FoundationStereo (default) yields the sharpest details and fewest artifacts.

Validity mask ablation

Without $M_{\text{valid}}$, noisy priors in occluded or textureless regions misguide the optimization, producing floaters and distorted geometry (e.g., the trex skeleton). Our mask keeps supervision strictly on reliable pixels.

Opacity decay strategies

Opacity decay visual comparison, opacity gradient statistics, and decay function curves

Left: Constant decay indiscriminately kills useful geometry; Step / Linear fail to penalize moderate-to-high gradient floaters. Top-right: decay function shapes. Bottom-left: raw opacity gradients are extremely small (on the order of $10^{-6}$), which is why using the relative gradient $\beta = g / \bar{g}$ is necessary.

When the exponential is applied to the raw gradient $g$ instead of $\beta$, the term $\exp(-s \cdot g) \approx 1$, so $\gamma \approx \gamma_{\text{base}}$ for every Gaussian and the function degenerates to a constant decay (Exp-g ≈ 21.41 dB). Using the relative gradient $\beta$ (inspired by the relative-advantage idea in GRPO from DeepSeekMath) provides a scale-invariant normalization that lets our Exp-β reach 21.91 dB on LLFF 3-view.
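The degeneracy is easy to reproduce with a few numbers. The sketch below applies the same exponential to raw and to mean-normalized gradients, using the default hyperparameters stated on this page; the gradient values are illustrative placeholders of roughly the reported magnitude, not measurements.

```python
import numpy as np

def decay(x: np.ndarray, gamma_base: float = 0.99, s: float = 0.5) -> np.ndarray:
    """gamma = 1 - (1 - gamma_base) * exp(-s * x)."""
    return 1.0 - (1.0 - gamma_base) * np.exp(-s * x)

# Illustrative raw opacity gradients on the order of 1e-6 (not measured values).
g = np.array([0.2e-6, 1.0e-6, 5.0e-6])

gamma_raw = decay(g)             # Exp-g: exponential applied to raw gradients
gamma_rel = decay(g / g.mean())  # Exp-beta: exponential applied to relative gradients

spread_raw = np.ptp(gamma_raw)   # max - min of the decay factors
spread_rel = np.ptp(gamma_rel)
```

With raw gradients every Gaussian gets a factor indistinguishable from `gamma_base`, while the relative form spreads the factors across the range between `gamma_base` and 1, which is what lets the decay discriminate between floaters and surface Gaussians.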

Limitations & Failure Cases

Honest accounting of where StereoGS still struggles.

Failure cases

Top: Consistency-Aware Dense Initialization may fail on reflective objects in specific views, because view-dependent reflections make the MVS depths inconsistent across views, so the reprojection filter discards them. Bottom: Large textureless regions can make stereo matching ambiguous, producing erroneous stereo depth and inaccurate geometric constraints. Addressing these two failure modes is a promising direction for future work.

BibTeX

@inproceedings{yuan2026stereogs,
  title     = {{StereoGS}: Sparse-View 3D Gaussian Splatting via Stereo Priors},
  author    = {Yuan, Wenhao and Ge, Yiyuan and Cai, Deli},
  booktitle = {European Conference on Computer Vision (ECCV)},
  year      = {2026},
  url       = {https://github.com/StringerYwh00/StereoGS}
}