Gaussian Splatting: Papers #8

Here are the latest papers related to Gaussian Splatting! 🤘

Gaussian Splatting
8 min read · May 24, 2024

If you are a researcher eager to present your work on Gaussian Splatting, please use the ‘Present Paper’ form on www.gaussian-splatting.org or connect directly with Felix on LinkedIn: https://www.linkedin.com/in/f3lix/

Gaussian Time Machine: A Real-Time Rendering Methodology for Time-Variant Appearances

Gaussian Time Machine: A Real-Time Rendering Methodology for Time-Variant Appearances [PDF]

by Licheng Shen, Ho Ngai Chow, Lingyun Wang, Tong Zhang, Mengqiu Wang, Yuxing Han

2024–05–22

Recent advancements in neural rendering techniques have significantly enhanced the fidelity of 3D reconstruction.

Notably, the emergence of 3D Gaussian Splatting (3DGS) has marked a significant milestone by adopting a discrete scene representation, facilitating efficient training and real-time rendering. Several studies have successfully extended the real-time rendering capability of 3DGS to dynamic scenes.

However, a challenge arises when training images are captured under vastly differing weather and lighting conditions. This scenario poses a challenge for 3DGS and its variants in achieving accurate reconstructions. Although NeRF-based methods (NeRF-W, CLNeRF) have shown promise in handling such challenging conditions, their computational demands hinder real-time rendering capabilities.

In this paper, we present Gaussian Time Machine (GTM) which models the time-dependent attributes of Gaussian primitives with discrete time embedding vectors decoded by a lightweight Multi-Layer-Perceptron (MLP). By adjusting the opacity of Gaussian primitives, we can reconstruct visibility changes of objects. We further propose a decomposed color model for improved geometric consistency.
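
The following is a minimal sketch of the idea described above, not the authors' code: each discrete timestamp gets a learnable embedding, and a lightweight MLP decodes it (together with each Gaussian's position) into time-dependent opacity and color adjustments. The class name `TimeAttributeDecoder`, the input features, and the output parameterization are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TimeAttributeDecoder(nn.Module):
    """Hypothetical decoder: per-timestamp embedding -> time-dependent Gaussian attributes."""
    def __init__(self, num_times: int, embed_dim: int = 16, hidden: int = 64):
        super().__init__()
        # One learnable embedding vector per discrete training timestamp.
        self.time_embed = nn.Embedding(num_times, embed_dim)
        # Lightweight MLP shared by all Gaussians; input is position + time embedding.
        self.mlp = nn.Sequential(
            nn.Linear(embed_dim + 3, hidden), nn.ReLU(),
            nn.Linear(hidden, 1 + 3),  # opacity value + RGB offset
        )

    def forward(self, xyz: torch.Tensor, t_idx: torch.Tensor):
        # xyz: (N, 3) Gaussian centers, t_idx: timestamp index (scalar long tensor).
        t = self.time_embed(t_idx).expand(xyz.shape[0], -1)
        out = self.mlp(torch.cat([xyz, t], dim=-1))
        opacity = torch.sigmoid(out[:, :1])   # in [0, 1]: objects can fade in/out over time
        d_color = torch.tanh(out[:, 1:])      # bounded per-time color shift
        return opacity, d_color

# Toy usage: modulate 1,000 Gaussians at timestamp 3.
decoder = TimeAttributeDecoder(num_times=10)
opacity, d_color = decoder(torch.randn(1000, 3), torch.tensor(3))
```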

GTM achieved state-of-the-art rendering fidelity on 3 datasets and is 100 times faster than NeRF-based counterparts in rendering. Moreover, GTM successfully disentangles the appearance changes and renders smooth appearance interpolation.

DoGaussian: Distributed-Oriented Gaussian Splatting for Large-Scale 3D Reconstruction Via Gaussian Consensus

DoGaussian: Distributed-Oriented Gaussian Splatting for Large-Scale 3D Reconstruction Via Gaussian Consensus [PDF]

by Yu Chen, Gim Hee Lee

2024–05–22

The recent advances in 3D Gaussian Splatting (3DGS) show promising results on the novel view synthesis (NVS) task.

With its superior rendering performance and high-fidelity rendering quality, 3DGS surpasses its NeRF predecessors. Most recent 3DGS methods focus either on improving rendering efficiency or on reducing the model size. On the other hand, the training efficiency of 3DGS on large-scale scenes has not received much attention.

In this work, we propose DoGaussian, a method that trains 3DGS distributedly. Our method first decomposes a scene into K blocks and then introduces the Alternating Direction Method of Multipliers (ADMM) into the training procedure of 3DGS. During training, our DoGaussian maintains one global 3DGS model on the master node and K local 3DGS models on the slave nodes. The K local 3DGS models are dropped after training and we only query the global 3DGS model during inference.

The training time is reduced by scene decomposition, and the training convergence and stability are guaranteed through the consensus on the shared 3D Gaussians. Our method accelerates the training of 3DGS by 6+ times when evaluated on large-scale scenes while concurrently achieving state-of-the-art rendering quality.
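
To make the consensus step concrete, here is a short sketch of a standard consensus-ADMM update, which is the mechanism the abstract references; it is not the paper's implementation, and names such as `local_params`, `duals`, and `rho` are placeholders for this example.

```python
import torch

def admm_consensus_step(local_params, duals, rho=1.0):
    """One consensus round over K block-local copies of a shared Gaussian's attributes.

    local_params: list of K tensors, the local copies after each block's own
                  gradient steps (each block minimizes its photometric loss
                  plus rho/2 * ||x_k - z + u_k||^2).
    duals:        list of K tensors, the scaled dual variables u_k.
    Returns the new global consensus value z and the updated duals.
    """
    stacked = torch.stack([x + u for x, u in zip(local_params, duals)])
    z = stacked.mean(dim=0)                                       # global average on the master node
    new_duals = [u + x - z for x, u in zip(local_params, duals)]  # dual ascent step
    return z, new_duals

# Toy usage: K = 3 local copies of a shared 3-D Gaussian center.
locals_ = [torch.randn(3) for _ in range(3)]
duals = [torch.zeros(3) for _ in range(3)]
z, duals = admm_consensus_step(locals_, duals)
```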

Our project page is available at https://aibluefisher.github.io/DoGaussian

D-MiSo: Editing Dynamic 3D Scenes using Multi-Gaussians Soup

D-MiSo: Editing Dynamic 3D Scenes using Multi-Gaussians Soup [PDF]

by Joanna Waczyńska, Piotr Borycki, Joanna Kaleta, Sławomir Tadeja, Przemysław Spurek

2024–05–23

Over the past years, we have observed an abundance of approaches for modeling dynamic 3D scenes using Gaussian Splatting (GS).

Such solutions use GS to represent the scene's structure and a neural network to model its dynamics. These approaches allow fast rendering and extraction of each element of such a dynamic scene. However, modifying such objects over time is challenging. SC-GS (Sparse Controlled Gaussian Splatting), enhanced with Deformed Control Points, partially solves this issue.

However, this approach requires selecting the elements that must be kept fixed, as well as the centroids that should be adjusted throughout editing. Moreover, this task poses additional difficulties regarding the reproducibility of such edits.

To address this, we propose Dynamic Multi-Gaussian Soup (D-MiSo), which allows us to model a mesh-inspired representation of dynamic GS. Additionally, we propose a strategy of linking parameterized Gaussian splats, forming a Triangle Soup with the estimated mesh. Consequently, we can separately construct new trajectories for the 3D objects composing the scene. Thus, we can make the scene's dynamics editable over time while maintaining partial dynamics.
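
As a rough illustration of how splats can be tied to a triangle soup, the sketch below parameterizes one Gaussian per triangle (center at the barycenter, axes from the edges) so that editing the triangle's vertices moves and reorients the attached splat. D-MiSo's exact parameterization may differ; this follows the common mesh-driven Gaussian construction and the function name is an assumption.

```python
import torch

def gaussians_from_triangles(verts: torch.Tensor):
    """verts: (T, 3, 3) triangle vertices.
    Returns per-triangle centers (T, 3) and covariance matrices (T, 3, 3)."""
    centers = verts.mean(dim=1)                   # barycenter of each triangle
    e1 = verts[:, 1] - verts[:, 0]                # first edge
    e2 = verts[:, 2] - verts[:, 0]                # second edge
    n = torch.cross(e1, e2, dim=-1)               # triangle normal
    n = n / (n.norm(dim=-1, keepdim=True) + 1e-8)
    # Two in-plane axes from the edges, plus a thin axis along the normal.
    R = torch.stack([e1, e2, 0.01 * n], dim=-1)   # (T, 3, 3) linear map
    cov = R @ R.transpose(-1, -2)                 # covariance = R R^T
    return centers, cov

# Editing a trajectory then amounts to transforming triangle vertices;
# the attached Gaussians follow automatically.
centers, cov = gaussians_from_triangles(torch.randn(8, 3, 3))
```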

RoGS: Large Scale Road Surface Reconstruction based on 2D Gaussian Splatting

RoGS: Large Scale Road Surface Reconstruction based on 2D Gaussian Splatting [PDF]

by Zhiheng Feng, Wenhua Wu, Hesheng Wang

2024–05–23

Road surface reconstruction plays a crucial role in autonomous driving, which can be used for road lane perception and autolabeling tasks.

Recently, mesh-based road surface reconstruction algorithms have shown promising results. However, these mesh-based methods suffer from slow speed and poor rendering quality. In contrast, 3D Gaussian Splatting (3DGS) offers superior rendering speed and quality. Although 3DGS employs explicit Gaussian spheres to represent the scene, it lacks the ability to directly represent the geometric information of the scene.

To address this limitation, we propose a novel large-scale road surface reconstruction approach based on 2D Gaussian Splatting (2DGS), named RoGS. The geometric shape of the road is explicitly represented using 2D Gaussian surfels, where each surfel stores color, semantics, and geometric information. Compared to Gaussian spheres, the Gaussian surfels align more closely with the physical reality of the road. Distinct from previous initialization methods that rely on point clouds for Gaussian spheres, we introduce a trajectory-based initialization for Gaussian surfels.
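
A simplified sketch of what a trajectory-based initialization could look like is shown below: surfel centers are laid out on a regular lateral grid around each vehicle pose instead of being seeded from a point cloud. This is an assumption for illustration (flat-road band, fixed half-width and spacing), not the authors' code; each surfel would additionally carry color, semantics, and a normal.

```python
import numpy as np

def init_surfels_along_trajectory(positions, headings, half_width=8.0, spacing=0.5):
    """positions: (N, 3) vehicle positions; headings: (N,) yaw angles in radians.
    Returns (M, 3) surfel centers covering a road band around the trajectory."""
    centers = []
    offsets = np.arange(-half_width, half_width + spacing, spacing)
    for p, yaw in zip(positions, headings):
        # Lateral direction perpendicular to the driving direction (flat-road assumption).
        lateral = np.array([-np.sin(yaw), np.cos(yaw), 0.0])
        for o in offsets:
            centers.append(p + o * lateral)
    return np.asarray(centers)

# Toy straight trajectory driving along +x.
traj = np.stack([np.arange(0, 10, 0.5), np.zeros(20), np.zeros(20)], axis=1)
surfel_centers = init_surfels_along_trajectory(traj, np.zeros(20))
```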

Thanks to the explicit representation of the Gaussian surfels and a good initialization, our method achieves a significant acceleration while improving reconstruction quality. We achieve excellent results in the reconstruction of road surfaces in a variety of challenging real-world scenes.

TIGER: Text-Instructed 3D Gaussian Retrieval and Coherent Editing

TIGER: Text-Instructed 3D Gaussian Retrieval and Coherent Editing [PDF]

by Teng Xu, Jiamin Chen, Peng Chen, Youjia Zhang, Junqing Yu, Wei Yang

2024–05–23

Editing objects within a scene is a critical functionality required across a broad spectrum of applications in computer vision and graphics.

As 3D Gaussian Splatting (3DGS) emerges as a frontier in scene representation, the effective modification of 3D Gaussian scenes has become increasingly vital. This process entails accurately retrieving the target objects and subsequently performing modifications based on instructions. Though available in pieces, existing techniques mainly embed sparse semantics into Gaussians for retrieval, and rely on an iterative dataset update paradigm for editing, leading to over-smoothing or inconsistency issues.

To this end, this paper proposes a systematic approach, namely TIGER, for coherent text-instructed 3D Gaussian retrieval and editing. In contrast to the top-down language grounding approach for 3D Gaussians, we adopt a bottom-up language aggregation strategy to generate denser language-embedded 3D Gaussians that support open-vocabulary retrieval. To overcome the over-smoothing and inconsistency issues in editing, we propose Coherent Score Distillation (CSD), which aggregates a 2D image editing diffusion model and a multi-view diffusion model for score distillation, producing multi-view consistent edits with much finer detail.
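
The sketch below shows the general shape of such an aggregated score-distillation objective: score gradients from a 2D editing diffusion model and a multi-view diffusion model are combined on the rendered views and injected into the Gaussians via an SDS-style surrogate loss. The two score callables, the weights, and the aggregation rule are placeholders; the paper's CSD formulation may differ.

```python
import torch

def coherent_score_distillation(rendered_views, edit_score, mv_score, w_edit=1.0, w_mv=1.0):
    """rendered_views: (V, 3, H, W) differentiable renders of the Gaussian scene.
    edit_score / mv_score: callables returning a per-pixel score of the same shape."""
    with torch.no_grad():
        g = w_edit * edit_score(rendered_views) + w_mv * mv_score(rendered_views)
    # SDS-style surrogate: backpropagating this loss delivers the combined score g
    # as the gradient of the rendered images (and hence of the Gaussian parameters).
    return (g * rendered_views).sum()

# Toy usage with dummy score functions standing in for the two diffusion models.
views = torch.rand(4, 3, 64, 64, requires_grad=True)
loss = coherent_score_distillation(views,
                                   lambda x: torch.randn_like(x),
                                   lambda x: torch.randn_like(x))
loss.backward()
```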

In various experiments, we demonstrate that our TIGER is able to accomplish more consistent and realistic edits than prior work.

MagicDrive3D: Controllable 3D Generation for Any-View Rendering in Street Scenes

MagicDrive3D: Controllable 3D Generation for Any-View Rendering in Street Scenes [PDF]

by Ruiyuan Gao, Kai Chen, Zhihao Li, Lanqing Hong, Zhenguo Li, Qiang Xu

2024–05–23

While controllable generative models for images and videos have achieved remarkable success, high-quality models for 3D scenes, particularly in unbounded scenarios like autonomous driving, remain underdeveloped due to high data acquisition costs.

In this paper, we introduce MagicDrive3D, a novel pipeline for controllable 3D street scene generation that supports multi-condition control, including BEV maps, 3D objects, and text descriptions. Unlike previous methods that reconstruct before training the generative models, MagicDrive3D first trains a video generation model and then reconstructs from the generated data. This innovative approach enables easily controllable generation and static scene acquisition, resulting in high-quality scene reconstruction.

To address the minor errors in generated content, we propose deformable Gaussian splatting with monocular depth initialization and appearance modeling to manage exposure discrepancies across viewpoints. Validated on the nuScenes dataset, MagicDrive3D generates diverse, high-quality 3D driving scenes that support any-view rendering and enhance downstream tasks like BEV segmentation. Our results demonstrate the framework’s superior performance, showcasing its transformative potential for autonomous driving simulation and beyond.
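
One simple way to realize the appearance modeling mentioned above is a learnable per-view affine color correction applied to each render before the photometric loss, so exposure mismatch is absorbed by the affine term rather than by the shared geometry. This is an illustrative assumption, not MagicDrive3D's implementation; the class name and parameterization are hypothetical.

```python
import torch
import torch.nn as nn

class PerViewAppearance(nn.Module):
    """Hypothetical per-view appearance model: one 3x3 color matrix and bias per view."""
    def __init__(self, num_views: int):
        super().__init__()
        # Initialized to the identity transform (no correction).
        self.scale = nn.Parameter(torch.eye(3).repeat(num_views, 1, 1))
        self.bias = nn.Parameter(torch.zeros(num_views, 3))

    def forward(self, rendered: torch.Tensor, view_idx: int) -> torch.Tensor:
        # rendered: (3, H, W) image from the Gaussian renderer for one view.
        c, h, w = rendered.shape
        flat = rendered.reshape(3, -1)                                  # (3, H*W)
        out = self.scale[view_idx] @ flat + self.bias[view_idx][:, None]
        return out.reshape(c, h, w).clamp(0.0, 1.0)

# The photometric loss is then computed against the corrected render.
appearance = PerViewAppearance(num_views=6)
corrected = appearance(torch.rand(3, 32, 32), view_idx=2)
```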

Tele-Aloha: A Low-budget and High-authenticity Telepresence System Using Sparse RGB Cameras

Tele-Aloha: A Low-budget and High-authenticity Telepresence System Using Sparse RGB Cameras [PDF]

by Hanzhang Tu, Ruizhi Shao, Xue Dong, Shunyuan Zheng, Hao Zhang, Lili Chen, Meili Wang, Wenyu Li, Siyan Ma, Shengping Zhang, Boyao Zhou, Yebin Liu

2024–05–23

In this paper, we present a low-budget and high-authenticity bidirectional telepresence system, Tele-Aloha, targeting peer-to-peer communication scenarios.

Compared to previous systems, Tele-Aloha utilizes only four sparse RGB cameras, one consumer-grade GPU, and one autostereoscopic screen to achieve high-resolution (2048x2048), real-time (30 fps), low-latency (less than 150ms), and robust distant communication.

As the core of Tele-Aloha, we propose an efficient novel view synthesis algorithm for the upper body. First, we design a cascaded disparity estimator to obtain a robust geometry cue. A neural rasterizer via Gaussian Splatting is then introduced to project latent features onto the target view and decode them at a reduced resolution. Finally, given the high-quality captured data, we leverage a weighted blending mechanism to refine the decoded image to the final 2K resolution.
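
Below is a minimal, simplified sketch of what such a weighted-blending refinement could look like: the low-resolution neural render is upsampled and fused with source-camera images warped to the target view, using per-pixel confidence weights. The function name, the weight source, and the fusion rule are assumptions for illustration, not the system's actual pipeline.

```python
import torch
import torch.nn.functional as F

def blend_to_full_res(neural_lowres, warped_sources, weights, out_size):
    """neural_lowres:  (1, 3, h, w) decoded render at reduced resolution.
    warped_sources:    (S, 3, H, W) source views warped to the target camera.
    weights:           (S + 1, 1, H, W) non-negative blending weights
                       (index 0 belongs to the upsampled neural render)."""
    up = F.interpolate(neural_lowres, size=out_size, mode="bilinear", align_corners=False)
    candidates = torch.cat([up, warped_sources], dim=0)              # (S+1, 3, H, W)
    w = weights / weights.sum(dim=0, keepdim=True).clamp_min(1e-8)   # normalize per pixel
    return (w * candidates).sum(dim=0)                               # fused (3, H, W) image

# Toy usage with two warped source cameras (small sizes for illustration).
out = blend_to_full_res(torch.rand(1, 3, 128, 128),
                        torch.rand(2, 3, 512, 512),
                        torch.rand(3, 1, 512, 512),
                        out_size=(512, 512))
```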

Exploiting a world-leading autostereoscopic display and low-latency iris tracking, users can experience a strong three-dimensional sense even without any wearable head-mounted display. Altogether, our telepresence system demonstrates a sense of co-presence in real-life experiments, inspiring the next generation of communication.
