
Cosmos-Predict1 is a collection of general-purpose world foundation models for Physical AI that can be fine-tuned into customized world models for downstream applications.


Aurora-edu/cosmos-predict1

 
 


NVIDIA Cosmos is a developer-first world foundation model platform designed to help Physical AI developers build their systems better and faster. Cosmos contains:

  1. Pre-trained models (available via Hugging Face) under the NVIDIA Open Model License, which allows free commercial use of the models.
  2. Training scripts under the Apache 2 License for post-training the models for various downstream Physical AI applications.

Key Features

Cosmos-Predict1 includes the following features:

  • Diffusion-based world foundation models for Text2World and Video2World generation, which generate visual simulations from text prompts and video prompts.
  • Autoregressive world foundation models for Video2World generation, which generate visual simulations from video prompts and optional text prompts.
  • Image and video tokenizers that efficiently convert videos into continuous tokens (latent vectors) and discrete tokens (integers).
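
To make the difference between continuous and discrete tokens concrete, here is a minimal toy sketch in plain Python. It is not the Cosmos tokenizer and does not reflect its architecture: the function names, the 8-level quantizer, and the 4-pixel "frame" are all assumptions chosen for illustration. It only shows that continuous tokens are real-valued latents, while discrete tokens are integer indices that can be mapped back to approximate latents.

```python
# Illustrative toy only: NOT the Cosmos tokenizer. All names are hypothetical.

def encode_continuous(frame):
    """Map pixel intensities (0-255) to continuous latents in [-1, 1]."""
    return [p / 127.5 - 1.0 for p in frame]


def quantize(latents, levels=8):
    """Map each continuous latent in [-1, 1] to an integer token in [0, levels - 1]."""
    tokens = []
    for z in latents:
        z = max(-1.0, min(1.0, z))  # clamp to the valid range
        tokens.append(round((z + 1.0) / 2.0 * (levels - 1)))
    return tokens


def dequantize(tokens, levels=8):
    """Map integer tokens back to approximate latents in [-1, 1]."""
    return [t / (levels - 1) * 2.0 - 1.0 for t in tokens]


frame = [0, 64, 128, 255]               # a made-up 4-pixel "frame"
continuous = encode_continuous(frame)   # real-valued latent vector
discrete = quantize(continuous)         # integer tokens
approx = dequantize(discrete)           # lossy reconstruction of the latents

print(continuous)
print(discrete)  # [0, 2, 4, 7]
print(approx)
```

Continuous tokens preserve more detail, while discrete tokens form a finite vocabulary. In this toy the round trip through `quantize` and `dequantize` is lossy by at most half a quantization step.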

Examples

Inference with pre-trained models:

Post-training models:

Inference with post-trained models:

The snippet below shows the gist of running inference with a pre-trained model.

PROMPT="A sleek, humanoid robot stands in a vast warehouse filled with neatly stacked cardboard boxes on industrial shelves. The robot's metallic body gleams under the bright, even lighting, highlighting its futuristic design and intricate joints. A glowing blue light emanates from its chest, adding a touch of advanced technology. The background is dominated by rows of boxes, suggesting a highly organized storage system. The floor is lined with wooden pallets, enhancing the industrial setting. The camera remains static, capturing the robot's poised stance amidst the orderly environment, with a shallow depth of field that keeps the focus on the robot while subtly blurring the background for a cinematic effect."

CUDA_HOME=$CONDA_PREFIX PYTHONPATH=$(pwd) python cosmos_predict1/diffusion/inference/text2world.py \
    --checkpoint_dir checkpoints \
    --diffusion_transformer_dir Cosmos-Predict1-7B-Text2World \
    --prompt "${PROMPT}" \
    --offload_prompt_upsampler \
    --video_save_name diffusion-text2world-7b
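
For scripted use, the same invocation can be assembled from Python. The sketch below is a hedged illustration, not part of the repository: `build_text2world_command` and `build_env` are hypothetical helper names, and the only script path and flags used are the ones that appear in the shell command above. It assumes the script is launched from the repository root with the checkpoints already downloaded.

```python
import os
import shlex
import sys


def build_text2world_command(prompt, video_save_name="diffusion-text2world-7b"):
    """Hypothetical helper: assemble the argv for the text2world.py call shown above."""
    return [
        sys.executable,  # the shell example uses plain `python`
        "cosmos_predict1/diffusion/inference/text2world.py",
        "--checkpoint_dir", "checkpoints",
        "--diffusion_transformer_dir", "Cosmos-Predict1-7B-Text2World",
        "--prompt", prompt,
        "--offload_prompt_upsampler",
        "--video_save_name", video_save_name,
    ]


def build_env():
    """Hypothetical helper: mirror CUDA_HOME=$CONDA_PREFIX and PYTHONPATH=$(pwd)."""
    env = dict(os.environ)
    env["PYTHONPATH"] = os.getcwd()
    conda_prefix = env.get("CONDA_PREFIX")
    if conda_prefix:  # only set CUDA_HOME when a conda environment is active
        env["CUDA_HOME"] = conda_prefix
    return env


cmd = build_text2world_command("A humanoid robot stands in a warehouse.")
print(shlex.join(cmd))

# To actually launch inference from the repository root (requires a GPU setup):
# import subprocess
# subprocess.run(cmd, env=build_env(), check=True)
```

Passing the arguments as a list avoids shell quoting problems with long prompts that contain apostrophes or commas.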

Model Family

We provide a series of pre-trained models of different families, available for download on Hugging Face.

Diffusion models

Autoregressive models

Tokenizers

License and Contact

This project will download and install additional third-party open source software projects. Review the license terms of these open source projects before use.

NVIDIA Cosmos source code is released under the Apache 2 License.

NVIDIA Cosmos models are released under the NVIDIA Open Model License. For a custom license, please contact [email protected].
