FLOWDREAMER: EXPLORING HIGH FIDELITY TEXT-TO-3D GENERATION VIA RECTIFIED FLOW
arXiv 2024
Hangyu Li
AI Thrust, HKUST(GZ)
Xiangxiang Chu
Alibaba Group
Dingyuan Shi
Alibaba Group
Addison Lin Wang
AI & CMA Thrust, HKUST(GZ)
Dept. of CSE, HKUST
Abstract
Text-to-3D generation has made significant progress recently. In particular, with pretrained diffusion models, existing methods predominantly use Score Distillation Sampling (SDS) to train 3D models such as Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3D GS). However, these methods often suffer from over-smoothed textures and over-saturated colors. The rectified flow model, which uses a simple ordinary differential equation (ODE) to represent a straight trajectory, shows promise as an alternative prior for text-to-3D generation. It learns a time-independent vector field, thereby reducing the ambiguity in the 3D model update gradients, which the SDS framework computes from time-dependent scores. In light of this, we first develop a mathematical analysis that integrates SDS with the rectified flow model, paving the way for our initial framework, Vector Field Distillation Sampling (VFDS). However, empirical findings indicate that VFDS still produces over-smoothed results. We therefore analyze the underlying reasons for this failure from the perspective of ODE trajectories. Building on this analysis, we propose a novel framework, named FlowDreamer, which yields high-fidelity results with richer textural details and faster convergence. The key insight is to leverage the coupling and reversible properties of the rectified flow model to search for the corresponding noise, rather than using randomly sampled noise as in VFDS. Accordingly, we introduce a novel Unique Couple Matching (UCM) loss, which guides the 3D model to optimize along the same trajectory. FlowDreamer is flexible enough to be applied to both NeRF and 3D GS. Extensive experiments demonstrate the high-fidelity outcomes and accelerated convergence of FlowDreamer. Moreover, we highlight intriguing open questions, such as initialization challenges in NeRF and sampling techniques, to benefit the research community.
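To make the "coupling and reversible" idea concrete, the following is a minimal toy sketch, not the FlowDreamer implementation. It assumes the common rectified flow convention in which t = 0 is noise and t = 1 is data, and it replaces the pretrained model with a hypothetical closed-form 1-D vector field (`velocity`) whose trajectories are straight lines coupling each noise sample x0 to x1 = 2*x0 + 1. Integrating the ODE backward from a sample recovers its own coupled noise, which generally differs from a freshly sampled random noise.

```python
import numpy as np

def velocity(x, t):
    # Hypothetical stand-in for a learned vector field (an assumption).
    # It is chosen so that every trajectory is the straight line
    # x_t = (1 - t) * x0 + t * x1 with the coupling x1 = 2 * x0 + 1,
    # which makes the velocity constant (x1 - x0) along each trajectory.
    return (x + 1.0) / (1.0 + t)

def integrate(x, t_start, t_end, steps=50):
    # Plain Euler integration of dx/dt = velocity(x, t).
    # A negative step (t_end < t_start) runs the ODE in reverse.
    dt = (t_end - t_start) / steps
    t = t_start
    for _ in range(steps):
        x = x + dt * velocity(x, t)
        t += dt
    return x

rng = np.random.default_rng(0)
noise = rng.standard_normal(4)          # x0: the coupled noise
data = integrate(noise, 0.0, 1.0)       # forward: noise -> sample
recovered = integrate(data, 1.0, 0.0)   # reverse: sample -> its own noise
random_noise = rng.standard_normal(4)   # a fresh, uncoupled noise draw

print("coupled noise  :", noise)
print("recovered noise:", recovered)
print("random noise   :", random_noise)
```

In this toy setting the reverse pass returns the noise that lies on the same straight trajectory as the sample, whereas the random draw belongs to a different trajectory. This is the distinction the abstract draws between searching for the corresponding noise and sampling noise at random as in VFDS.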
[Figure: 3D GS results comparison. Methods: DreamGaussian, GaussianDreamer, LucidDreamer, FlowDreamer (ours). Prompts: "an origami pig."; "A pumpkin covered in cobwebs with plastic spiders crawling on it."; "A watch, highly detailed."]