LucidDreamer: Domain-free Generation of 3D Gaussian Splatting Scenes

arXiv 2023

Jaeyoung Chung*, Suyoung Lee*, Hyeongjin Nam, Jaerin Lee, Kyoung Mu Lee

* Denotes equal contribution

Computer Vision Lab, Seoul National University  

Our method can generate navigable 3D scenes from a single text prompt or a single image.

Abstract

With the widespread use of VR devices and content, demand for 3D scene generation techniques is growing. Existing 3D scene generation models, however, restrict the target scene to a specific domain, primarily because they are trained on 3D scan datasets that are far from the real world. To address this limitation, we propose LucidDreamer, a domain-free scene generation pipeline that fully leverages the power of existing large-scale diffusion-based generative models.

LucidDreamer alternates between two steps: Dreaming and Alignment. First, to generate multi-view consistent images from the inputs, we use the point cloud as a geometric guideline for each image generation. Specifically, we project a portion of the point cloud to the desired view and provide the projection as guidance for inpainting with the generative model. The inpainted images are lifted to 3D space using estimated depth maps, forming new points. Second, to aggregate the new points into the 3D scene, we propose an alignment algorithm that harmoniously integrates the newly generated portions of the 3D scene. The resulting 3D scene serves as the initial points for optimizing Gaussian splats. LucidDreamer produces Gaussian splats that are highly detailed compared to previous 3D scene generation methods, with no constraint on the domain of the target scene.
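The projection and lifting operations mentioned above can be illustrated with a minimal sketch. It assumes a simple pinhole camera, points already expressed in camera coordinates, and a nearest-pixel z-buffer; the function names (`project_points`, `inpaint_mask`, `lift_pixel`) and the intrinsics `fx`, `fy`, `cx`, `cy` are hypothetical and are not taken from the LucidDreamer code.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]
DepthImage = List[List[Optional[float]]]


def project_points(points: Sequence[Point], fx: float, fy: float,
                   cx: float, cy: float, width: int, height: int) -> DepthImage:
    """Project camera-space points into a sparse depth image (pinhole model).

    Pixels that receive no point stay None. When several points land on the
    same pixel, the nearest one (smallest z) is kept.
    """
    depth: DepthImage = [[None] * width for _ in range(height)]
    for x, y, z in points:
        if z <= 0:  # behind the camera, cannot be seen
            continue
        u = round(fx * x / z + cx)
        v = round(fy * y / z + cy)
        if 0 <= u < width and 0 <= v < height:
            current = depth[v][u]
            if current is None or z < current:
                depth[v][u] = z
    return depth


def inpaint_mask(depth: DepthImage) -> List[List[bool]]:
    """True where no point was projected, i.e. where inpainting would fill in."""
    return [[d is None for d in row] for row in depth]


def lift_pixel(u: float, v: float, z: float, fx: float, fy: float,
               cx: float, cy: float) -> Point:
    """Inverse of the projection: pixel (u, v) with depth z to a 3D point."""
    return ((u - cx) * z / fx, (v - cy) * z / fy, z)


# Toy example: three points seen by a 3x3 camera.
toy_points = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0), (0.0, 0.0, 5.0)]
toy_depth = project_points(toy_points, fx=2.0, fy=2.0, cx=1.0, cy=1.0,
                           width=3, height=3)
toy_mask = inpaint_mask(toy_depth)
```

In this toy setup the mask marks every pixel except the two that received a point, and lifting a filled pixel with its depth recovers the original 3D coordinates. A real pipeline would also apply a world-to-camera transform before projecting, which is omitted here.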

Introducing LucidDreamer

Algorithm description of LucidDreamer

LucidDreamer maintains and expands its world model by recursive dreaming and alignment.
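The alternation between the two steps can be sketched as a plain loop. This is only a skeleton under assumptions: `dream` and `align` are caller-supplied placeholders standing in for the inpainting-plus-lifting step and the alignment step, and `toy_dream` and `toy_align` are hypothetical stand-ins that make the example runnable, not the actual algorithms.

```python
from typing import Any, Callable, List, Sequence, Tuple

Point = Tuple[float, float, float]


def grow_scene(initial_points: Sequence[Point],
               views: Sequence[Any],
               dream: Callable[[List[Point], Any], List[Point]],
               align: Callable[[List[Point], List[Point]], List[Point]]) -> List[Point]:
    """Alternate a 'dreaming' step and an 'alignment' step once per view."""
    scene = list(initial_points)
    for view in views:
        candidates = dream(scene, view)          # propose new points for this view
        scene.extend(align(scene, candidates))   # merge only what alignment accepts
    return scene


def toy_dream(scene: List[Point], view: Any) -> List[Point]:
    # Stand-in: propose one new point per view plus a point already in the scene.
    return [(float(view), 0.0, 1.0), scene[0]]


def toy_align(scene: List[Point], candidates: List[Point]) -> List[Point]:
    # Stand-in: keep only candidates that are not already part of the scene.
    return [p for p in candidates if p not in scene]


toy_scene = grow_scene([(0.0, 0.0, 1.0)], views=[1, 2],
                       dream=toy_dream, align=toy_align)
```

Passing the two steps in as functions keeps the loop independent of any particular generative model or depth estimator.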

Dynamic Re-prompting

Intermediate prompting for diverse control of scene generation

LucidDreamer can accept a sequence of text prompts for scene generation, enabling fine-grained controls.
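One way a sequence of prompts could be mapped onto generation steps is a simple schedule lookup. This is a hypothetical sketch: the page does not say how prompts are assigned to steps, so the `(start_step, prompt)` schedule format, the function name `prompt_for_step`, and the example prompts are all assumptions.

```python
from bisect import bisect_right
from typing import List, Tuple

Schedule = List[Tuple[int, str]]  # (first step the prompt applies to, prompt text)


def prompt_for_step(schedule: Schedule, step: int) -> str:
    """Return the prompt active at `step`; `schedule` must be sorted by start step."""
    starts = [start for start, _ in schedule]
    index = bisect_right(starts, step) - 1  # last entry whose start is <= step
    if index < 0:
        raise ValueError(f"no prompt is scheduled for step {step}")
    return schedule[index][1]


# Hypothetical schedule: switch prompts at step 4.
toy_schedule: Schedule = [(0, "a cozy living room"), (4, "a sunlit kitchen")]
toy_prompts = [prompt_for_step(toy_schedule, s) for s in range(6)]
```

With this schedule, steps 0 to 3 use the first prompt and steps 4 and 5 use the second.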

Perceptual Quality

Perceptual quality comparison

CLIP-based quantitative comparison of scenes generated from Stable Diffusion images. We quantitatively compare our results with RGBD2 using CLIP-Score and CLIP-IQA. For CLIP-IQA, we use the quality, colorful, and sharp criteria. LucidDreamer scores higher on all four metrics.

\[ \begin{array}{c|c|ccc} \hline \text{Models} & \text{CLIP-Score} \uparrow & \text{CLIP-IQA Quality} \uparrow & \text{CLIP-IQA Colorful} \uparrow & \text{CLIP-IQA Sharp} \uparrow \\ \hline \text{RGBD2} & 0.2035 & 0.1279 & 0.2081 & 0.0126 \\ \textbf{LucidDreamer} & \textbf{0.2110} & \textbf{0.6161} & \textbf{0.8453} & \textbf{0.5356} \\ \hline \end{array} \]
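For intuition, CLIP-based scores are commonly built on the cosine similarity between an image embedding and a text embedding. The sketch below shows only that similarity on toy vectors. It assumes the embeddings have already been produced by a CLIP image and text encoder (not shown), and it does not reproduce the exact scoring or scaling used for the table above.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


# Toy 2D vectors standing in for real image and text embeddings.
same_direction = cosine_similarity([1.0, 0.0], [1.0, 0.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```

Identical directions give a similarity of 1 and orthogonal directions give 0, so a higher value indicates a closer match between the image and the text.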

Reconstruction Quality

Reconstruction metrics of Gaussian splats according to the source of the initial SfM points. As a baseline, we initialize from the point cloud generated by COLMAP and compare the reconstruction results. Initializing from our point cloud yields better PSNR, SSIM, and LPIPS at every iteration count tested (1000, 3000, and 7000).

\[ \begin{array}{c|c|ccc} \hline \text{Iters} & \text{Source of SfM points} & \text{PSNR} \uparrow & \text{SSIM} \uparrow & \text{LPIPS} \downarrow \\ \hline 1000 & \text{COLMAP} & 23.15 & 0.7246 & 0.2910 \\ & \textbf{LucidDreamer} & \textbf{32.59} & \textbf{0.9672} & \textbf{0.0272} \\ \hline 3000 & \text{COLMAP} & 30.87 & 0.9478 & 0.0353 \\ & \textbf{LucidDreamer} & \textbf{33.80} & \textbf{0.9754} & \textbf{0.0178} \\ \hline 7000 & \text{COLMAP} & 32.52 & 0.9687 & 0.0208 \\ & \textbf{LucidDreamer} & \textbf{34.24} & \textbf{0.9781} & \textbf{0.0164} \\ \hline \end{array} \]
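To help read the PSNR column, the sketch below computes PSNR from the mean squared error using the standard definition, 10 log10(MAX^2 / MSE), and converts a PSNR gap in dB into the implied ratio of mean squared errors. It assumes pixel values in [0, 1] given as flat lists; the function names are illustrative and this is not the evaluation code behind the table.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], rendered: Sequence[float],
         max_value: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized pixel lists."""
    if len(reference) != len(rendered) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and have the same length")
    mse = sum((r - x) ** 2 for r, x in zip(reference, rendered)) / len(reference)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def mse_ratio_from_psnr_gap(gap_db: float) -> float:
    """How many times smaller the MSE is for a PSNR that is `gap_db` dB higher."""
    return 10.0 ** (gap_db / 10.0)


# Gap between the two rows at 1000 iterations in the table above.
gap_at_1000 = 32.59 - 23.15
ratio_at_1000 = mse_ratio_from_psnr_gap(gap_at_1000)
```

Because PSNR is logarithmic, the 9.44 dB gap at 1000 iterations corresponds to a mean squared error roughly 8.8 times lower, assuming both rows use the same peak value.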

More 3D Gaussian Splatting Scenes

Click and drag to navigate. Shift and scroll to zoom in/out.

Citation

@article{chung2023luciddreamer,
  title={LucidDreamer: Domain-free Generation of 3D Gaussian Splatting Scenes},
  author={Chung, Jaeyoung and Lee, Suyoung and Nam, Hyeongjin and Lee, Jaerin and Lee, Kyoung Mu},
  journal={arXiv preprint arXiv:2311.13384},
  year={2023}
}