Gaussian Splatting: Papers #12
Here are the latest papers related to Gaussian Splatting! 🤘
Learn more and sign up here: https://zoom.us/webinar/register/WN_M9fAz91JR56L-BbwdPv5Sw#/registration
Adversarial Generation of Hierarchical Gaussians for 3D Generative Model
Adversarial Generation of Hierarchical Gaussians for 3D Generative Model [PDF]
by Sangeek Hyun, Jae-Pil Heo
2024-06-05
Most advances in 3D Generative Adversarial Networks (3D GANs) largely depend on ray casting-based volume rendering, which incurs demanding rendering costs. One promising alternative is rasterization-based 3D Gaussian Splatting (3D-GS), providing a much faster rendering speed and explicit 3D representation.
In this paper, we exploit Gaussians as the 3D representation for 3D GANs, leveraging their efficient and explicit characteristics. However, in an adversarial framework, we observe that a naïve generator architecture suffers from training instability and lacks the capability to adjust the scale of Gaussians. This leads to model divergence and visual artifacts because there is no proper guidance for the initialized positions of the Gaussians, nor a densification mechanism to manage their scales adaptively.
To address these issues, we introduce a generator architecture with a hierarchical multi-scale Gaussian representation that effectively regularizes the position and scale of generated Gaussians. Specifically, we design a hierarchy of Gaussians in which finer-level Gaussians are parameterized by their coarser-level counterparts: finer-level Gaussians are placed near their coarser-level parents, and their scale decreases monotonically as the level becomes finer, modeling both the coarse structure and the fine details of the 3D scene.
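To make the hierarchy concrete, here is a minimal sketch (ours, not the authors' implementation) of one way such a parameterization could look: each finer-level Gaussian is anchored to a coarser-level parent through a bounded offset, and its scale shrinks monotonically with the level. In the actual GAN the offsets would be predicted by the generator; the random offsets and the names `branch_factor`, `offset_range`, and `scale_decay` below are illustrative assumptions.

```python
import torch

def build_hierarchy(coarse_xyz, coarse_scale=0.2, num_levels=3,
                    branch_factor=4, offset_range=0.5, scale_decay=0.5):
    """coarse_xyz: (N, 3) positions of the coarsest-level Gaussians."""
    levels = [(coarse_xyz, coarse_scale)]
    parent_xyz = coarse_xyz
    for level in range(1, num_levels):
        # Each coarse parent spawns `branch_factor` finer children.
        parent_rep = parent_xyz.repeat_interleave(branch_factor, dim=0)
        # Children are constrained to a shrinking neighbourhood of their
        # parent, regularizing finer positions around the coarser structure.
        # (A generator network would predict these offsets; random here.)
        offsets = (torch.rand_like(parent_rep) * 2 - 1) * offset_range * scale_decay ** level
        child_xyz = parent_rep + offsets
        # Scale decreases monotonically as the level becomes finer.
        level_scale = coarse_scale * scale_decay ** level
        levels.append((child_xyz, level_scale))
        parent_xyz = child_xyz
    return levels

# Example: 256 coarse Gaussians expanded into a 3-level hierarchy.
for i, (xyz, scale) in enumerate(build_hierarchy(torch.randn(256, 3))):
    print(f"level {i}: {xyz.shape[0]} Gaussians, scale {scale:.3f}")
```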
Experimental results demonstrate that our method renders roughly 100× faster than state-of-the-art 3D-consistent GANs while offering comparable 3D generation capability.
Project page: https://hse1032.github.io/gsgan
Event3DGS: Event-based 3D Gaussian Splatting for Fast Egomotion
Event3DGS: Event-based 3D Gaussian Splatting for Fast Egomotion [PDF]
by Tianyi Xiong, Jiayi Wu, Botao He, Cornelia Fermuller, Yiannis Aloimonos, Heng Huang, Christopher A. Metzler
2024-06-05
The recently emerged 3D Gaussian splatting (3DGS) leverages explicit point-based representations, which significantly improve the rendering speed and quality of novel-view synthesis. However, rendering 3D radiance fields under highly dynamic motion or challenging illumination remains problematic in real-world robotic tasks.
Fast egomotion, prevalent in real-world robotic tasks, induces motion blur that leads to inaccuracies and artifacts in the reconstructed structure. To alleviate this problem, we propose Event3DGS, the first method that learns Gaussian Splatting solely from raw event streams. By exploiting the high temporal resolution of event cameras and an explicit point-based representation, Event3DGS can reconstruct high-fidelity 3D structure solely from event streams under fast egomotion.
Our sparsity-aware sampling and progressive training approaches allow for better reconstruction quality and consistency. To further enhance the fidelity of appearance, we explicitly incorporate the motion blur formation process into a differentiable rasterizer, which is used with a limited set of blurred RGB images to refine the appearance.
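The motion-blur term can be pictured as averaging sharp renders over the exposure window; the sketch below assumes that general formulation rather than the paper's exact rasterizer, and `render` / `interpolate_pose` are hypothetical stand-ins.

```python
import torch

def render_blurred(gaussians, pose_start, pose_end, render, interpolate_pose,
                   num_subframes=8):
    """Approximate a blurred frame as the mean of sharp renders taken at
    poses interpolated across the exposure window (hypothetical interfaces)."""
    frames = []
    for i in range(num_subframes):
        t = i / (num_subframes - 1)                 # normalized exposure time in [0, 1]
        pose_t = interpolate_pose(pose_start, pose_end, t)
        frames.append(render(gaussians, pose_t))    # differentiable sharp render
    # The synthesized blur stays differentiable, so a photometric loss against
    # the captured blurry RGB frames can be backpropagated to refine appearance.
    return torch.stack(frames, dim=0).mean(dim=0)
```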
Extensive experiments on multiple datasets validate the superior rendering quality of Event3DGS compared with existing approaches, with over 95% lower training time and rendering speeds that are orders of magnitude faster.
LPM: Localized Gaussian Point Management
Localized Gaussian Point Management [PDF]
by Haosen Yang, Chenhao Zhang, Wenqing Wang, Marco Volino, Adrian Hilton, Li Zhang, Xiatian Zhu
2024-06-06
Point management is a critical component in optimizing 3D Gaussian Splatting (3DGS) models, as the point initialization (e.g., via structure from motion) is often distributionally inappropriate. Typically, the Adaptive Density Control (ADC) algorithm is applied, leveraging view-averaged gradient magnitude thresholding for point densification, opacity thresholding for pruning, and a regular all-points opacity reset.
However, we reveal that this strategy is limited when tackling intricate or special image regions (e.g., transparent areas): it cannot identify all the 3D zones that require point densification, and it lacks a mechanism for handling ill-conditioned points with negative impacts, such as occlusion caused by falsely high opacity.
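For reference, here is a minimal sketch of the ADC heuristics described above (view-averaged gradient thresholding for densification, opacity thresholding for pruning, and a periodic opacity reset); thresholds and tensor names are illustrative, not the reference 3DGS implementation.

```python
import torch

def adaptive_density_control(xyz, opacity, grad_accum, grad_count,
                             grad_thresh=2e-4, opacity_thresh=5e-3):
    """xyz: (N, 3) positions, opacity: (N,), grad_accum / grad_count:
    accumulated view-space positional gradients and per-point view counts."""
    avg_grad = grad_accum / grad_count.clamp(min=1)   # view-averaged gradient magnitude
    densify = avg_grad > grad_thresh                  # candidates for clone/split
    keep = opacity >= opacity_thresh                  # prune nearly transparent points
    # Cloning stands in for the clone/split rules of 3DGS densification.
    new_xyz = torch.cat([xyz[keep], xyz[densify & keep]], dim=0)
    new_opacity = torch.cat([opacity[keep], opacity[densify & keep]], dim=0)
    return new_xyz, new_opacity

def reset_opacity(opacity, cap=0.01):
    # Regular all-points opacity reset: clamp every opacity to a small value.
    return opacity.clamp(max=cap)
```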
To address these limitations, we propose a Localized Point Management (LPM) strategy capable of identifying the error-contributing zones most in need of both point addition and geometry calibration. Zone identification leverages the underlying multiview geometry constraints, guided by image rendering errors. We apply point densification in the identified zones and reset the opacity of points residing in front of these regions, creating a new opportunity to correct ill-conditioned points.
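A rough sketch of this idea follows; it is our assumption about the interfaces rather than the authors' code. `backproject_errors` is a hypothetical helper standing in for the multiview, error-guided zone identification, and only one camera is handled for brevity.

```python
import torch

def localized_point_management(xyz, opacity, error_maps, cameras,
                               backproject_errors, zone_radius=0.05,
                               reset_value=0.01):
    """xyz: (N, 3), opacity: (N,), cameras: (C, 3) camera centres (assumed)."""
    # 1) Lift image rendering errors into 3D error-contributing zones using
    #    the multiview geometry (hypothetical helper).
    zone_centers = backproject_errors(error_maps, cameras)               # (Z, 3)

    # 2) Reset the opacity of points lying in front of a zone along the view
    #    ray, so falsely opaque occluders can be corrected (one camera shown).
    cam = cameras[0]
    ray_dir = torch.nn.functional.normalize(zone_centers - cam, dim=1)   # (Z, 3)
    to_point = xyz - cam                                                 # (N, 3)
    depth = to_point @ ray_dir.T                                         # (N, Z) depth along each zone ray
    zone_depth = ((zone_centers - cam) * ray_dir).sum(dim=1)             # (Z,)
    lateral = (to_point.pow(2).sum(1, keepdim=True) - depth.pow(2)).clamp(min=0).sqrt()
    in_front = (depth > 0) & (depth < zone_depth) & (lateral < zone_radius)
    opacity = torch.where(in_front.any(dim=1),
                          torch.full_like(opacity, reset_value), opacity)

    # 3) Densify points inside the identified zones (cloning as a placeholder
    #    for the clone/split used by 3DGS-style densification).
    in_zone = (torch.cdist(xyz, zone_centers) < zone_radius).any(dim=1)
    new_xyz = torch.cat([xyz, xyz[in_zone]], dim=0)
    new_opacity = torch.cat([opacity, opacity[in_zone]], dim=0)
    return new_xyz, new_opacity
```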
Serving as a versatile plugin, LPM can be seamlessly integrated into existing 3D Gaussian Splatting models. Experimental evaluation across both static 3D and dynamic 4D scenes validates the efficacy of our LPM strategy in boosting a variety of existing 3DGS models, both quantitatively and qualitatively. Notably, LPM improves both vanilla 3DGS and SpaceTimeGS to state-of-the-art rendering quality while retaining real-time speeds, with strong results on challenging datasets such as Tanks & Temples and the Neural 3D Video Dataset.
Project page: https://surrey-uplab.github.io/research/LPM/
🤘 Join the Gaussian Splatting ecosystem 🤘
Discord: https://discord.gg/qVuNpxT4Pq
LinkedIn: https://www.linkedin.com/company/gaussian-splatting/
Collaborate: www.gaussian-splatting.org
A Survey on 3D Human Avatar Modeling — From Reconstruction to Generation
A Survey on 3D Human Avatar Modeling — From Reconstruction to Generation [PDF]
by Ruihe Wang, Yukang Cao, Kai Han, Kwan-Yee K. Wong
2024-06-06
3D modeling has long been an important area in computer vision and computer graphics. Recently, thanks to breakthroughs in neural representations and generative models, we have witnessed rapid development in 3D modeling. 3D human modeling, lying at the core of many real-world applications such as gaming and animation, has attracted significant attention.
Over the past few years, a large body of work on creating 3D human avatars has been introduced, forming a new and abundant knowledge base for 3D human modeling. The scale of the literature makes it difficult for individuals to keep track of all the works. This survey aims to provide a comprehensive overview of these emerging techniques for 3D human avatar modeling, from both reconstruction and generation perspectives.
Firstly, we review representative methods for 3D human reconstruction, including methods based on pixel-aligned implicit functions, neural radiance fields, and 3D Gaussian Splatting, among others. We then summarize representative methods for 3D human generation, especially those using vision-language models such as CLIP, diffusion models, and various 3D representations, which demonstrate state-of-the-art performance.
Finally, we discuss our reflections on existing methods and open challenges for 3D human avatar modeling, shedding light on future research.
Flash3D: Feed-Forward Generalisable 3D Scene Reconstruction from a Single Image
Flash3D: Feed-Forward Generalisable 3D Scene Reconstruction from a Single Image [PDF]
by Stanislaw Szymanowicz, Eldar Insafutdinov, Chuanxia Zheng, Dylan Campbell, João F. Henriques, Christian Rupprecht, Andrea Vedaldi
2024-06-06
In this paper, we propose Flash3D, a method for scene reconstruction and novel view synthesis from a single image that is both highly generalizable and efficient. For generalizability, we start from a “foundation” model for monocular depth estimation and extend it to a full 3D shape and appearance reconstructor. For efficiency, we base this extension on feed-forward Gaussian Splatting.
Specifically, we predict a first layer of 3D Gaussians at the predicted depth and then add additional layers of Gaussians that are offset in space. This allows the model to complete the reconstruction behind occlusions and truncations. Flash3D is very efficient, trainable on a single GPU in a day, making it accessible to most researchers.
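To illustrate the layered prediction, here is a minimal sketch under assumptions, not the authors' network: the first layer of Gaussian centres sits at the monocular depth estimate, and each extra layer is offset behind it (offsets sketched here as non-negative depth shifts for simplicity). `depth_net` and `offset_head` are hypothetical modules, and only the positions are shown.

```python
import torch

def unproject(depth, K):
    """Lift an (H, W) depth map to (H*W, 3) camera-space points."""
    H, W = depth.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=-1).float()   # (H, W, 3) homogeneous pixels
    rays = pix.reshape(-1, 3) @ torch.linalg.inv(K).T                  # (H*W, 3) camera rays
    return rays * depth.reshape(-1, 1)

def predict_layered_centers(image, K, depth_net, offset_head, num_layers=2):
    depth = depth_net(image)                      # (H, W) monocular "foundation" depth
    layers = [unproject(depth, K)]                # layer 0: Gaussians at the predicted depth
    for layer_idx in range(1, num_layers):
        # Extra layers are pushed behind the visible surface so the model can
        # complete geometry hidden by occlusions and truncations.
        offset = torch.nn.functional.softplus(offset_head(image, layer_idx))  # (H, W) >= 0
        layers.append(unproject(depth + offset, K))
    return torch.cat(layers, dim=0)               # (num_layers * H * W, 3) Gaussian centres
```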
Flash3D achieves state-of-the-art results when trained and tested on RealEstate10k. When transferred to unseen datasets like NYU, it outperforms competitors by a large margin. More impressively, when transferred to KITTI, it achieves better PSNR than methods trained specifically on that dataset. In some instances, it even outperforms recent methods that use multiple views as input.
Code, models, a demo, and additional results are available on the Flash3D project page.