LRM : Large Reconstruction Model for Single Image to 3D

Last updated on Mar 22, 2024

In the realm of computer vision and 3D modeling, recent advancements have introduced transformative techniques like Neural Radiance Fields (NeRF) for single-image 3D reconstruction. These methods promise to revolutionize how we create immersive 3D scenes from 2D images, offering a path to generating detailed and photorealistic models with minimal input data. One noteworthy approach within this domain is the concept of Latent Radiance Mapping (LRM), which builds upon the foundation of NeRF while introducing novel enhancements to improve reconstruction quality and efficiency.

Camera modulation

At its core, LRM introduces a sophisticated camera modulation mechanism, which dynamically adjusts the model's predictions based on specific camera characteristics. This modulation is crucial in adapting the model's outputs to different viewpoints and perspectives, enhancing the accuracy and realism of the reconstructed 3D scenes. By incorporating this camera-awareness into the model, LRM can produce more faithful representations of the original scene, even from a single input image.

Model architecture

The architecture of LRM is structured around three key components: an image encoder, an image-to-triplane decoder, and a NeRF-based volumetric rendering module. The image encoder processes the input image and extracts patch-wise features, which are then fed into the image-to-triplane decoder. This decoder projects image features onto triplane tokens, leveraging cross-attention mechanisms to integrate camera information effectively. Finally, the NeRF module synthesizes the final 3D representation, predicting RGB colors and densities for each point in the scene.

Loss function

Training LRM involves optimizing a combination of loss functions, including Mean Squared Error (MSE) and Learned Perceptual Image Patch Similarity (LPIPS), to ensure both pixel-level accuracy and perceptual fidelity in the reconstructed models. By iteratively refining the model parameters through backpropagation, LRM learns to generate high-quality 3D reconstructions that closely resemble the original scenes. This makes LRM a powerful tool for various applications, including virtual reality, augmented reality, and digital content creation, where realistic 3D representations are essential for immersive user experiences.