|
| 1 | +<p align="center"> |
| 2 | + <img src="assets/nvidia-cosmos-header.png" alt="NVIDIA Cosmos Header"> |
| 3 | +</p> |
1 | 4 |
|
2 | | - |
| 5 | +## GitHub project for NVIDIA Cosmos: https://github.com/nvidia-cosmos |
3 | 6 |
|
4 | | --------------------------------------------------------------------------------- |
5 | | -### [Website](https://www.nvidia.com/en-us/ai/cosmos/) | [HuggingFace](https://huggingface.co/collections/nvidia/cosmos-6751e884dc10e013a0a0d8e6) | [GPU-free Preview](https://build.nvidia.com/explore/discover) | [Paper](https://arxiv.org/abs/2501.03575) | [Paper Website](https://research.nvidia.com/labs/dir/cosmos1/) |
| 7 | +NVIDIA Cosmos now includes three subprojects: |
6 | 8 |
|
7 | | -[NVIDIA Cosmos](https://www.nvidia.com/cosmos/) is a developer-first world foundation model platform designed to help Physical AI developers build their Physical AI systems better and faster. Cosmos contains |
| 9 | +### Cosmos-Predict1: https://github.com/nvidia-cosmos/cosmos-predict1 |
| 10 | +- Cosmos-Predict1 is a collection of general-purpose world foundation models for Physical AI that can be fine-tuned into customized world models for downstream applications. |
8 | 11 |
|
9 | | -1. pre-trained models, available via [Hugging Face](https://huggingface.co/collections/nvidia/cosmos-6751e884dc10e013a0a0d8e6) under the [NVIDIA Open Model License](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/) that allows commercial use of the models for free |
10 | | -2. training scripts under the [Apache 2 License](https://www.apache.org/licenses/LICENSE-2.0), offered through [NVIDIA Nemo Framework](https://github.com/NVIDIA/NeMo) for post-training the models for various downstream Physical AI applications |
| 12 | +### Cosmos-Transfer1: https://github.com/nvidia-cosmos/cosmos-transfer1 |
| 13 | +- Cosmos-Transfer1 is a world-to-world transfer model designed to bridge the perceptual divide between simulated and real-world environments. |
11 | 14 |
|
12 | | -Details of the platform are described in the [Cosmos paper](https://research.nvidia.com/publication/2025-01_cosmos-world-foundation-model-platform-physical-ai). Preview access is available at [build.nvidia.com](https://build.nvidia.com). |
| 15 | +### Cosmos-Reason1: https://github.com/nvidia-cosmos/cosmos-reason1 |
| 16 | +- Cosmos-Reason1 models can understand physical common sense and generate appropriate embodied decisions in natural language through long chain-of-thought reasoning processes. |
13 | 17 |
|
14 | | -## Key Features |
| 18 | +----------------------------------------------------------- |
15 | 19 |
|
16 | | -- [Pre-trained Diffusion-based world foundation models](cosmos1/models/diffusion/README.md) for Text2World and Video2World generation where a user can generate visual simulation based on text prompts and video prompts. |
17 | | -- [Pre-trained Autoregressive-based world foundation models](cosmos1/models/autoregressive/README.md) for Video2World generation where a user can generate visual simulation based on video prompts and optional text prompts. |
18 | | -- [Video tokenizers](cosmos1/models/tokenizer) for tokenizing videos into continuous tokens (latent vectors) and discrete tokens (integers) efficiently and effectively. |
19 | | -- Video curation pipeline for building your own video dataset. [Coming soon] |
20 | | -- [Post-training scripts](cosmos1/models/POST_TRAINING.md) via NeMo Framework to post-train the pre-trained world foundation models for various Physical AI setup. |
21 | | -- Pre-training scripts via NeMo Framework for building your own world foundation model. [[Diffusion](https://github.com/NVIDIA/NeMo/tree/main/nemo/collections/diffusion)] [[Autoregressive](https://github.com/NVIDIA/NeMo/tree/main/nemo/collections/multimodal_autoregressive)] [[Tokenizer](cosmos1/models/tokenizer/nemo/README.md)]. |
22 | | - |
23 | | -## Model Family |
24 | | - |
25 | | -| Model name | Description | Try it out | |
26 | | -|------------|----------|----------| |
27 | | -| [Cosmos-1.0-Diffusion-7B-Text2World](https://huggingface.co/nvidia/Cosmos-1.0-Diffusion-7B-Text2World) | Text to visual world generation | [Inference](cosmos1/models/diffusion/README.md) | |
28 | | -| [Cosmos-1.0-Diffusion-14B-Text2World](https://huggingface.co/nvidia/Cosmos-1.0-Diffusion-14B-Text2World) | Text to visual world generation | [Inference](cosmos1/models/diffusion/README.md) | |
29 | | -| [Cosmos-1.0-Diffusion-7B-Video2World](https://huggingface.co/nvidia/Cosmos-1.0-Diffusion-7B-Video2World) | Video + Text based future visual world generation | [Inference](cosmos1/models/diffusion/README.md) | |
30 | | -| [Cosmos-1.0-Diffusion-14B-Video2World](https://huggingface.co/nvidia/Cosmos-1.0-Diffusion-14B-Video2World) | Video + Text based future visual world generation | [Inference](cosmos1/models/diffusion/README.md) | |
31 | | -| [Cosmos-1.0-Autoregressive-4B](https://huggingface.co/nvidia/Cosmos-1.0-Autoregressive-4B) | Future visual world generation | [Inference](cosmos1/models/autoregressive/README.md) | |
32 | | -| [Cosmos-1.0-Autoregressive-12B](https://huggingface.co/nvidia/Cosmos-1.0-Autoregressive-12B) | Future visual world generation | [Inference](cosmos1/models/autoregressive/README.md) | |
33 | | -| [Cosmos-1.0-Autoregressive-5B-Video2World](https://huggingface.co/nvidia/Cosmos-1.0-Autoregressive-5B-Video2World) | Video + Text based future visual world generation | [Inference](cosmos1/models/autoregressive/README.md) | |
34 | | -| [Cosmos-1.0-Autoregressive-13B-Video2World](https://huggingface.co/nvidia/Cosmos-1.0-Autoregressive-13B-Video2World) | Video + Text based future visual world generation | [Inference](cosmos1/models/autoregressive/README.md) | |
35 | | -| [Cosmos-1.0-Guardrail](https://huggingface.co/nvidia/Cosmos-1.0-Guardrail) | Guardrail contains pre-Guard and post-Guard for safe use | Embedded in model inference scripts | |
36 | | - |
37 | | -## Example Usage |
38 | | - |
39 | | -### Inference |
40 | | - |
41 | | -Follow the [Cosmos Installation Guide](INSTALL.md) to setup the docker. For inference with the pretrained models, please refer to [Cosmos Diffusion Inference](cosmos1/models/diffusion/README.md) and [Cosmos Autoregressive Inference](cosmos1/models/autoregressive/README.md). |
42 | | - |
43 | | -The code snippet below provides a gist of the inference usage. |
44 | | - |
45 | | -```bash |
46 | | -PROMPT="A sleek, humanoid robot stands in a vast warehouse filled with neatly stacked cardboard boxes on industrial shelves. \ |
47 | | -The robot's metallic body gleams under the bright, even lighting, highlighting its futuristic design and intricate joints. \ |
48 | | -A glowing blue light emanates from its chest, adding a touch of advanced technology. The background is dominated by rows of boxes, \ |
49 | | -suggesting a highly organized storage system. The floor is lined with wooden pallets, enhancing the industrial setting. \ |
50 | | -The camera remains static, capturing the robot's poised stance amidst the orderly environment, with a shallow depth of \ |
51 | | -field that keeps the focus on the robot while subtly blurring the background for a cinematic effect." |
52 | | - |
53 | | -# Example using 7B model |
54 | | -PYTHONPATH=$(pwd) python cosmos1/models/diffusion/inference/text2world.py \ |
55 | | - --checkpoint_dir checkpoints \ |
56 | | - --diffusion_transformer_dir Cosmos-1.0-Diffusion-7B-Text2World \ |
57 | | - --prompt "$PROMPT" \ |
58 | | - --offload_prompt_upsampler \ |
59 | | - --video_save_name Cosmos-1.0-Diffusion-7B-Text2World |
60 | | -``` |
61 | | - |
62 | | -<video src="https://github.com/user-attachments/assets/db7bebfe-5314-40a6-b045-4f6ce0a87f2a"> |
63 | | - Your browser does not support the video tag. |
64 | | -</video> |
65 | | - |
66 | | -We also offer [multi-GPU inference](cosmos1/models/diffusion/nemo/inference/README.md) support for Diffusion Text2World WFM models through NeMo Framework. |
67 | | - |
68 | | -### Post-training |
69 | | - |
70 | | -NeMo Framework provides GPU accelerated post-training with general post-training for both [diffusion](cosmos1/models/diffusion/nemo/post_training/README.md) and [autoregressive](cosmos1/models/autoregressive/nemo/post_training/README.md) models, with other types of post-training coming soon. |
71 | | - |
72 | | -## License and Contact |
73 | | - |
74 | | -This project will download and install additional third-party open source software projects. Review the license terms of these open source projects before use. |
75 | | - |
76 | | -NVIDIA Cosmos source code is released under the [Apache 2 License](https://www.apache.org/licenses/LICENSE-2.0). |
77 | | - |
78 | | -NVIDIA Cosmos models are released under the [NVIDIA Open Model License](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license). For a custom license, please contact [cosmos-license@nvidia.com](mailto:cosmos-license@nvidia.com). |
| 20 | +This repository will be archived soon. To check out the initial release of NVIDIA Cosmos, please follow [README_CES2025.md](README_CES2025.md). |