From c54cd061844b5cbb1b590fffbbf6794f0bb3aa8e Mon Sep 17 00:00:00 2001
From: Santix12 <Santix12@users.noreply.github.com>
Date: Mon, 5 May 2025 10:11:31 +0000
Subject: [PATCH] Create Prometheus-generated README file

---
 README_Prometheus.md | 447 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 447 insertions(+)
 create mode 100644 README_Prometheus.md

diff --git a/README_Prometheus.md b/README_Prometheus.md
new file mode 100644
index 0000000..4a34048
--- /dev/null
+++ b/README_Prometheus.md
@@ -0,0 +1,447 @@
+# Cube3D: Generative AI for 3D Asset Creation by Roblox Foundation AI Team
+
+## Project Overview
+
+Cube is a generative AI system for 3D asset creation, developed by Roblox's Foundation AI Team. The project aims to build a foundation model for 3D intelligence that helps developers produce 3D content for digital experiences.
+
+#### Key Objectives
+- Create a powerful text-to-shape generation model
+- Enable developers to generate 3D objects and scenes with natural language prompts
+- Provide tools for creative 3D asset generation and manipulation
+
+#### Core Features
+- Text-to-Shape Generation: Convert textual descriptions into 3D models
+- Shape Tokenization: Advanced tokenization technique for representing 3D shapes
+- High-Quality 3D Asset Creation: Generate detailed 3D objects with complex geometries
+- Flexible Inference: Support for various hardware configurations and resolution settings
+
+#### Key Benefits
+- Democratizes 3D content creation by lowering technical barriers
+- Accelerates 3D asset generation for developers and creators
+- Provides a flexible and powerful tool for 3D model generation
+- Supports creative exploration through text-based 3D modeling
+
+The project is a step toward more accessible 3D content creation, with potential applications in game development, design, animation, and virtual environments.
+
+## Getting Started, Installation, and Setup
+
+### Prerequisites
+
+- Python 3.7 or higher
+- A CUDA-capable GPU (recommended, with at least 16GB VRAM)
+- Blender version 4.3 or higher (optional, for rendering GIFs)
+
+### Installation
+
+#### Basic Installation
+
+To install the project, clone the repository and install the package:
+
+```bash
+git clone https://github.com/Roblox/cube.git
+cd cube
+pip install -e .
+```
+
+#### Installation with Optional Dependencies
+
+For additional mesh processing capabilities, install with the `meshlab` extra:
+
+```bash
+pip install -e ".[meshlab]"
+```
+
+#### CUDA Configuration
+
+On Windows, or whenever the default PyTorch build lacks CUDA support, reinstall PyTorch from the CUDA 12.4 wheel index:
+
+```bash
+pip install torch --index-url https://download.pytorch.org/whl/cu124 --force-reinstall
+```
+
+### Quick Start
+
+#### Download Model Weights
+
+Download the model weights from Hugging Face:
+
+```bash
+huggingface-cli download Roblox/cube3d-v0.1 --local-dir ./model_weights
+```
+
+#### Generate 3D Shapes
+
+Generate a 3D model from a text prompt:
+
+```bash
+python -m cube3d.generate \
+    --gpt-ckpt-path model_weights/shape_gpt.safetensors \
+    --shape-ckpt-path model_weights/shape_tokenizer.safetensors \
+    --fast-inference \
+    --prompt "Broad-winged flying red dragon, elongated, folded legs."
+```
+
+### Development Tips
+
+- For lower VRAM usage, use the `--resolution-base` flag to reduce output resolution
+- The `--render-gif` flag can create a turntable animation of the generated mesh
+- Ensure Blender is in your system PATH for GIF rendering
+
+### Alternative Usage Methods
+
+1. **Jupyter Notebook**: Use `cube3d/colab_cube3d.ipynb` for interactive development
+2. **Python Library**: Import and use the `cube3d.inference.engine` module for programmatic generation
+
+### Supported Platforms
+
+- Linux
+- macOS
+- Windows
+
+### Hardware Compatibility
+
+Tested on:
+- Nvidia H100 GPU
+- Nvidia A100 GPU
+- Nvidia GeForce RTX 3080
+- Apple Silicon M2 through M4 chips
+
+## Dataset
+
+The repository does not describe the dataset used to train Cube 3D. Instead, it ships a small set of example 3D models that illustrate the kind of output the model produces.
+
+#### Example Dataset
+The repository includes a set of example 3D objects that showcase the model's generation capabilities:
+
+| Object     | Description                                               |
+|------------|-----------------------------------------------------------|
+| Bulldozer  | A standard industrial vehicle model                       |
+| Dragon     | A fantasy creature with specific attributes (broad-winged, red, folded legs) |
+| Boat       | A generic boat model                                      |
+| Sword      | A fantasy sword with specific design details (purple crystal, green gem accents) |
+
+##### Data Characteristics
+- File Formats: `.obj` (3D mesh), `.gif` (turntable rendering)
+- Number of Example Objects: 4
+- Diversity: Covers industrial, fantasy, and generic object types
+
+#### Data Processing
+The project uses a shape tokenizer to convert 3D meshes into token representations, enabling text-to-shape generation. The tokenization process allows for:
+- Encoding 3D shapes into discrete token indices
+- Reconstructing meshes from tokenized representations
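+
+In generic vector-quantization terms, encoding maps each continuous latent vector to the index of its nearest codebook entry, and decoding looks that entry up again. This is a textbook sketch, not necessarily the project's exact scheme (the file name `spherical_vq.py` suggests a spherical variant); the symbols $z$, $e_j$, and $K$ are introduced here for illustration only:
+
+$$
+k = \arg\min_{j \in \{1, \dots, K\}} \lVert z - e_j \rVert_2, \qquad \hat{z} = e_k
+$$
+
+Here $k$ is the discrete token index stored for the latent $z$, and $\hat{z}$ is the vector recovered from it at reconstruction time. With the vocabulary size listed under Model Configuration, $K = 16{,}384$.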
+
+#### External Resources
+For more background on the model and its training, refer to:
+- [Hugging Face Model Page](https://huggingface.co/Roblox/cube3d-0.1)
+- [ArXiv Technical Report](https://arxiv.org/abs/2503.15475)
+
+Note: Detailed training dataset specifics are not publicly disclosed in this repository.
+
+## Model Architecture and Training
+
+### Model Architecture
+
+The project uses a dual-stream RoFormer architecture for 3D shape generation. The model consists of several key components:
+
+#### Key Model Components
+- **Transformer Architecture**: Dual-stream RoFormer with configurable layers
+- **Embedding Dimensions**: 
+  - Text embedding dimension: 768 (from CLIP ViT-Large)
+  - Shape model embedding dimension: 32
+  - Model embedding dimension: 1536
+
+#### Model Configuration
+- **Transformer Layers**: 
+  - 23 dual-stream layers
+  - Additional single-stream layers optional
+- **Attention Heads**: 12
+- **Vocabulary Size**: 16,384 shape tokens
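+
+As a quick consistency check on these numbers, and assuming the 1536-wide model embedding is split evenly across the 12 attention heads (the usual multi-head convention, not something this README states), the per-head width and the information carried by one shape token work out to:
+
+$$
+d_{\text{head}} = \frac{d_{\text{model}}}{n_{\text{heads}}} = \frac{1536}{12} = 128,
+\qquad
+\log_2 16384 = 14 \ \text{bits per shape token}.
+$$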
+
+### Model Capabilities
+The model is designed to:
+- Process both text and shape representations
+- Generate 3D shapes conditioned on textual descriptions
+- Use rotary positional embeddings (RoPE)
+- Support cross-attention between text and shape representations
+
+### Training Considerations
+- Pre-trained CLIP text encoder used as text model backbone
+- Flexible embedding projection for text and shape inputs
+- Custom key-value caching mechanism for efficient inference
+
+### Training Setup
+The model uses a configuration-driven approach, allowing easy modification of architectural parameters through the configuration file (`cube3d/configs/open_model.yaml`).
+
+#### Key Training Configurations
+- Rotary embedding base (theta): 10,000
+- Layer normalization epsilon: 1e-6
+- Bias in linear layers: Enabled
+- Cross-attention levels: [0, 2, 4, 8]
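+
+For reference, the standard RoPE formulation derives one rotation frequency per pair of channels from the base listed above. Whether the project's `rope.py` follows it term for term is an assumption. With per-head width $d$, token position $m$, and channel-pair index $i$:
+
+$$
+\theta_i = 10000^{-2i/d}, \qquad i = 0, \dots, \tfrac{d}{2} - 1
+$$
+
+$$
+\begin{pmatrix} x'_{2i} \\ x'_{2i+1} \end{pmatrix}
+=
+\begin{pmatrix} \cos m\theta_i & -\sin m\theta_i \\ \sin m\theta_i & \cos m\theta_i \end{pmatrix}
+\begin{pmatrix} x_{2i} \\ x_{2i+1} \end{pmatrix}
+$$
+
+Because each rotation depends only on position, the dot product between a rotated query at position $m$ and a rotated key at position $n$ depends on $m - n$ alone, which is what makes the encoding relative.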
+
+### Important Model Features
+- Special tokens for beginning-of-sequence (BOS), end-of-sequence (EOS), and padding
+- Dual-stream attention mechanism allowing rich interaction between text and shape representations
+- Flexible decoding with key-value caching
+
+## Evaluation and Results
+
+The repository evaluates the model qualitatively: generate shapes from text prompts, round-trip a mesh through the shape tokenizer, and inspect the results. No quantitative benchmark figures are reported here.
+
+### Performance Characteristics
+
+The model has been developed and tested on various hardware configurations, including:
+- Nvidia H100 GPU
+- Nvidia A100 GPU
+- Nvidia GeForce RTX 3080
+- Apple Silicon M2 through M4 chips
+
+#### Key Performance Metrics
+- Text-to-Shape Generation: The model can generate 3D models from natural language prompts
+- Resolution Control: Supports variable resolution from base 4.0 to 9.0, allowing trade-offs between generation quality and computational efficiency
+- Inference Speed: Optimized with fast inference mode for GPUs with sufficient VRAM
+
+### Evaluation Methodology
+
+The model's performance is primarily evaluated through:
+- Prompt-driven Shape Generation: Ability to create 3D models that match textual descriptions
+- Reconstruction Accuracy: Tokenization and de-tokenization capabilities demonstrated in `vq_vae_encode_decode.py`
+- Rendering Quality: Optional GIF turntable rendering to visualize generated 3D models
+
+### Example Evaluation Commands
+
+Text-to-Shape Generation:
+```bash
+python -m cube3d.generate \
+    --gpt-ckpt-path model_weights/shape_gpt.safetensors \
+    --shape-ckpt-path model_weights/shape_tokenizer.safetensors \
+    --fast-inference \
+    --prompt "Broad-winged flying red dragon, elongated, folded legs."
+```
+
+Shape Tokenization and Reconstruction:
+```bash
+python -m cube3d.vq_vae_encode_decode \
+    --shape-ckpt-path model_weights/shape_tokenizer.safetensors \
+    --mesh-path ./outputs/output.obj
+```
+
+### Recommended Evaluation Setup
+
+- Minimum GPU VRAM: 16GB (24GB recommended for fast inference)
+- Supported Platforms: CUDA-enabled GPUs, Apple Silicon
+- Resolution Range: Base 4.0 to 9.0 (lower values speed up inference but reduce the quality of the output mesh)
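+
+To see why the resolution base has such a large effect, assume (this README does not confirm it) that the shape decoder samples a dense cubic grid with $2^r$ cells per axis for resolution base $r$. The number of grid points to evaluate is then:
+
+$$
+N(r) = (2^r + 1)^3, \qquad N(4) = 17^3 = 4{,}913, \qquad N(9) = 513^3 = 135{,}005{,}697.
+$$
+
+Under that assumption, each unit increase in $r$ multiplies the decoding work by roughly 8.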
+
+### Limitations and Considerations
+
+- Fast inference mode is not available on macOS
+- Rendering GIF requires Blender (version >= 4.3)
+- Model performance varies with prompt complexity and specificity
+
+## Project Structure
+
+The project is organized into several key directories and files to support its 3D modeling and generation capabilities:
+
+### Main Package Structure
+- `cube3d/`: Primary package directory containing the core implementation
+  - `configs/`: Configuration files
+    - `open_model.yaml`: Model configuration settings
+  
+  - `inference/`: Model inference-related modules
+    - `engine.py`: Core inference logic
+    - `logits_postprocesses.py`: Post-processing of model logits
+    - `utils.py`: Utility functions for inference
+
+  - `mesh_utils/`: Mesh processing utilities
+    - `postprocessing.py`: Mesh post-processing functions
+
+  - `model/`: Model architecture components
+    - `autoencoder/`: Autoencoder-related implementations
+      - `embedder.py`: Embedding layer implementations
+      - `grid.py`: Grid-related model components
+      - `one_d_autoencoder.py`: One-dimensional autoencoder
+      - `spherical_vq.py`: Spherical vector quantization implementation
+
+    - `gpt/`: GPT model-specific implementations
+      - `dual_stream_roformer.py`: Dual-stream RoFormer model
+
+    - `transformers/`: Transformer architecture components
+      - `attention.py`: Attention mechanism implementations
+      - `cache.py`: Caching mechanisms
+      - `dual_stream_attention.py`: Dual-stream attention implementation
+      - `norm.py`: Normalization layers
+      - `roformer.py`: RoFormer implementation
+      - `rope.py`: Rotary Position Embedding implementation
+
+  - `renderer/`: Rendering utilities
+    - `blender_script.py`: Blender rendering script
+    - `renderer.py`: Rendering engine
+
+- `examples/`: Sample data and examples
+  - Contains `.obj` 3D models and corresponding `.gif` animations
+  - `prompts.json`: Example prompts
+
+- `resources/`: Additional resource files
+  - Contains various image and visualization resources
+
+### Project Configuration and Setup
+- `pyproject.toml`: Project configuration and build settings
+- `setup.py`: Package installation and setup script
+
+### Documentation and Meta Files
+- `README.md`: Project documentation
+- `LICENSE`: Project licensing information
+- `SECURITY.md`: Security policy and guidelines
+- `.github/`: GitHub-specific configuration
+  - Issue and pull request templates
+
+## Technologies Used
+
+### Programming Languages
+- Python 3.10+
+
+### Deep Learning Frameworks
+- PyTorch (torch) v2.2.2+
+- Transformers
+- Hugging Face Accelerate
+
+### 3D and Mesh Processing
+- Trimesh
+- PyMeshLab (optional)
+- Warp-lang
+
+### Machine Learning and Numerical Computing
+- NumPy
+- Scikit-image
+
+### Configuration and Development
+- OmegaConf
+- Setuptools
+- Wheel
+
+### Utilities
+- tqdm
+- Hugging Face Hub CLI
+
+### Tools and Environments
+- CUDA (for GPU acceleration)
+- Jupyter Notebook (for Colab support)
+
+### Optional Tools
+- Ruff (linting)
+
+## Additional Notes
+
+### Research and Experimental Status
+
+This project represents an early-stage research effort in 3D generative AI, focusing on text-to-shape generation. The current implementation is experimental and may undergo significant changes as the research progresses.
+
+### Model Limitations
+
+- The model currently supports generating individual 3D objects rather than complete scenes
+- Generation quality and complexity may vary depending on the input prompt
+- Performance can be impacted by hardware constraints, particularly VRAM availability
+
+### Future Development Roadmap
+
+The project aims to expand capabilities in the following areas:
+- Bounding box conditioning for more precise shape generation
+- Scene generation capabilities
+- Enhanced 3D asset creation for creative and development workflows
+
+### Performance Considerations
+
+- Inference speed and quality can be adjusted using the `resolution_base` parameter
+- Recommended GPU memory is 24GB for fast inference, with 16GB as a minimum viable configuration
+- Performance may vary across different hardware platforms (NVIDIA GPUs, Apple Silicon)
+
+### Ethical and Responsible AI
+
+Roblox emphasizes responsible AI development, encouraging:
+- Thoughtful and creative use of generative technologies
+- Exploration of 3D asset generation for diverse applications
+- Collaborative research and innovation in the field of 3D intelligence
+
+### Known Limitations
+
+- Rendering GIFs requires Blender (version >= 4.3) installed in system PATH
+- Some advanced features may have platform-specific restrictions
+- Mesh quality can be affected by resolution and generation parameters
+
+### Community and Collaboration
+
+The project is open-sourced to:
+- Engage the research community
+- Encourage collaborative development
+- Explore the potential of generative AI in 3D asset creation
+
+## Contributing
+
+We welcome contributions to the project! Here are some guidelines to help you get started:
+
+### Contribution Process
+
+1. Fork the repository and create your branch from `main`.
+2. If you've added code that should be tested, add tests.
+3. Ensure the test suite passes.
+4. Make sure your code follows the project's coding style.
+5. Issue a pull request with a clear and descriptive summary.
+
+### Pull Request Guidelines
+
+When submitting a pull request, please:
+- Provide a clear, concise description of your changes
+- Include the purpose of the changes
+- Specify any new dependencies or breaking changes
+- Include testing instructions
+- Add screenshots if applicable
+
+### Checklist for Contributors
+- Verify that your code is well-documented
+- Ensure all tests are passing
+- Update relevant documentation
+- Follow the existing code style and conventions
+
+### Reporting Issues
+
+If you find a bug or have a feature request, please open an issue in the GitHub repository. Use the provided issue templates to ensure we have all the necessary information.
+
+### Code of Conduct
+
+Please be respectful and constructive in all interactions. Harassment, discrimination, or any form of inappropriate behavior will not be tolerated.
+
+### Questions?
+
+If you have any questions about contributing, please reach out through the repository's issue tracker.
+
+## License
+
+The Cube3D project is released under the **Cube3D Research-Only RAIL-MS License**.
+
+### Key License Highlights
+
+- This is a research-only license
+- Usage is restricted to academic or research purposes only
+- Strict use restrictions are in place, including:
+  - Prohibitions on discrimination
+  - Restrictions on intellectual property usage
+  - Limitations on potential harmful applications
+  - Privacy and ethical use constraints
+
+### Important Restrictions
+
+The license includes comprehensive use restrictions covering areas such as:
+- Discrimination and harmful content
+- Intellectual property protection
+- Legal compliance
+- Disinformation prevention
+- Privacy protection
+- Health and safety considerations
+- Restrictions on military or law enforcement applications
+
+#### Full License Terms
+
+For complete details, please refer to the [LICENSE](LICENSE) file in the repository. Users must carefully review and comply with all terms before using this project.
+
+### Disclaimer
+
+THE ARTIFACT IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND. THE CONTRIBUTORS SHALL NOT BE LIABLE FOR ANY DAMAGES ARISING FROM ITS USE.
\ No newline at end of file