LERF

📎 Language Embedded Radiance Fields 🚜

Paper Website
Code

Grounding CLIP vectors volumetrically inside a NeRF allows flexible natural-language queries in 3D.

Installation

First install nerfstudio dependencies. Then run:

pip install git+https://github.com/kerrj/lerf

Running LERF

Details for running LERF (built with Nerfstudio!) can be found here. Once installed, run:

ns-train lerf --help

Three variants of LERF are provided:

| Method | Description | Memory | Quality |
| --- | --- | --- | --- |
| lerf-big | LERF with OpenCLIP ViT-L/14 | ~22 GB | Best |
| lerf | Model with OpenCLIP ViT-B/16, used in paper | ~15 GB | Good |
| lerf-lite | LERF with smaller network and fewer LERF samples | ~8 GB | OK |

lerf-lite should work on a single NVIDIA 2080. lerf-big is experimental, and needs further tuning.

Method

LERF enables pixel-aligned queries of the distilled 3D CLIP embeddings without relying on region proposals, masks, or fine-tuning, supporting long-tail open-vocabulary queries hierarchically across the volume.

Multi-scale supervision

To supervise language embeddings, we pre-compute an image pyramid of CLIP features for each training view. During optimization, each sampled ray is then supervised with a CLIP embedding interpolated from this pyramid.
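
The interpolation step can be sketched roughly as follows. This is a minimal illustration, not the LERF implementation: it assumes the pyramid stores one embedding per scale for a given pixel and that a ray's scale is blended linearly between the two nearest pyramid levels. The function name `interpolate_pyramid` and the list-based layout are hypothetical, and any spatial interpolation between image crops is left out.

```python
from bisect import bisect_right


def interpolate_pyramid(scales, embeddings, query_scale):
    """Blend the embeddings of the two pyramid levels that bracket query_scale.

    scales:      ascending list of pyramid scales (hypothetical layout).
    embeddings:  one embedding (list of floats) per scale.
    query_scale: scale associated with the sampled ray.
    """
    if len(scales) != len(embeddings):
        raise ValueError("need exactly one embedding per scale")

    # Clamp queries that fall outside the pyramid to the nearest level.
    if query_scale <= scales[0]:
        return list(embeddings[0])
    if query_scale >= scales[-1]:
        return list(embeddings[-1])

    # Find the levels just below (lo) and just above (hi) the query scale.
    hi = bisect_right(scales, query_scale)
    lo = hi - 1

    # Linear interpolation weight between the two bracketing levels.
    t = (query_scale - scales[lo]) / (scales[hi] - scales[lo])
    return [(1.0 - t) * a + t * b for a, b in zip(embeddings[lo], embeddings[hi])]
```

A query scale halfway between two levels receives an equal mix of their embeddings; scales outside the pyramid simply reuse the closest level.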

LERF Optimization

LERF optimizes a dense, multi-scale language 3D field by volume rendering CLIP embeddings along training rays, supervising these embeddings with multi-scale CLIP features across multi-view training images.
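
To make "volume rendering CLIP embeddings along training rays" concrete, here is a minimal sketch that composites per-sample embeddings with the standard NeRF rendering weights. It assumes each ray sample already has a density, a segment length, and an embedding. The function name `render_embedding` and the final unit-length normalization are assumptions made for illustration, not a description of the LERF code.

```python
import math


def render_embedding(densities, deltas, embeddings):
    """Composite per-sample embeddings along one ray using NeRF-style weights.

    densities:  volume density (sigma) at each sample along the ray.
    deltas:     length of the ray segment covered by each sample.
    embeddings: one embedding (list of floats) per sample.
    """
    transmittance = 1.0  # fraction of light that reaches the current sample
    rendered = [0.0] * len(embeddings[0])

    for sigma, delta, emb in zip(densities, deltas, embeddings):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this segment
        weight = transmittance * alpha          # contribution of this sample
        rendered = [r + weight * e for r, e in zip(rendered, emb)]
        transmittance *= 1.0 - alpha            # light left for later samples

    # Assumed here: normalize so the result can be compared by cosine similarity.
    norm = math.sqrt(sum(r * r for r in rendered))
    return [r / norm for r in rendered] if norm > 0.0 else rendered
```

Samples in empty space (zero density) contribute nothing, so the rendered embedding is dominated by the first opaque surface the ray hits.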

Inspired by Distilled Feature Fields (DFF), we use DINO features to regularize CLIP features. This leads to qualitative improvements in object boundaries, as CLIP embeddings in 3D can be sensitive to floaters and regions with few views.

After optimization, LERF can extract 3D relevancy maps for language queries interactively, in real time.

Visualizing relevancy for text queries

Set the "Output Render" type to relevancy_0, and enter the text query in the "LERF Positives" textbox (see image). The output render will show the 3D relevancy map for the query. View the project page for more examples and details about the relevancy map normalization.
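
As a rough intuition for how a relevancy value could be computed from a rendered embedding, the sketch below scores a query against a set of negative phrases with a pairwise softmax and keeps the lowest score. This scoring rule, the function name `relevancy`, and the idea of passing explicit negative embeddings are all assumptions for illustration; the project page describes the normalization actually used.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def relevancy(rendered, query, negatives):
    """Score how much more the rendered embedding matches the query than any negative.

    rendered:  embedding rendered for one pixel (assumed unit length).
    query:     text embedding of the positive query (assumed unit length).
    negatives: non-empty list of text embeddings for negative phrases (hypothetical).
    """
    q = dot(rendered, query)
    scores = []
    for neg in negatives:
        n = dot(rendered, neg)
        # Two-way softmax exp(q) / (exp(q) + exp(n)), written in a stable form.
        scores.append(1.0 / (1.0 + math.exp(n - q)))
    # The most competitive negative decides the final score.
    return min(scores)
```

Under this rule, a score of 0.5 means some negative phrase matches the pixel exactly as well as the query does, and values closer to 1 mean the query wins clearly against every negative.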

Results

For results, view the project page!

Datasets used in the original paper can be found here.

@article{lerf2023,
 author = {Kerr, Justin and Kim, Chung Min and Goldberg, Ken and Kanazawa, Angjoo and Tancik, Matthew},
 title = {LERF: Language Embedded Radiance Fields},
 journal = {arXiv preprint arXiv:2303.09553},
 year = {2023},
}