Contributing

This guide will help you understand the overall organization of the project. It's the single source of truth for how to contribute to the code base.

Note

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.

Principles

These principles guide every aspect of design and implementation, underpinning the core goal to make large-scale machine learning accessible by removing barriers and enabling widespread innovation. Each principle represents a core quality describing how the system SHOULD be:

Secure

Security, data sovereignty, and privacy are foundational and non-negotiable. Emphasize secure defaults, confidential computing, comprehensive authentication, and strict access controls to responsibly handle data.

Operationally Autonomous

The system should operate independently and reliably with minimal user intervention. Address resource constraints and operational complexity by automating resource discovery, workload distribution, fault tolerance, and scaling, significantly lowering operational barriers.

Real-world Ready

Real-world constraints like network limitations, regulatory compliance, and diverse infrastructure challenges should inform design decisions. Proactively accommodate these realities, ensuring seamless integration into existing environments.

Intuitive

Ensure designs and implementations are coherent, easy to understand, and straightforward, whether at the system level or within individual features and components. Balance "How simple can we make it?" with "How complex does it have to be?" (Laws of Simplicity / Reduce) and answer "What goes with what?" (Laws of Simplicity / Organize). Prioritize clear structure and streamlined interactions to reduce cognitive load and foster trust.

Performant

Ensure efficient and optimized training and inference capabilities. Performance is essential for practical usability, enabling demanding real-world applications to meet high standards for scale, latency, and reliability.

Compatible

Seamlessly integrate diverse, heterogeneous hardware setups commonly found in enterprises and research institutions. Supporting varied infrastructure unlocks the latent potential of underutilized hardware, enabling innovative ideas previously hindered by infrastructure limitations.

Examples

The following are illustrative scenarios showing how these principles can be applied, especially when principles might conflict:

Operationally Autonomous and Intuitive: Where possible, integrate and implement functionality directly instead of relying on separate services, which would add operational cost and complexity.
Secure vs. Performant: Even if bypassing encryption significantly enhances performance, security must never be compromised. Always choose secure defaults, such as encryption and authentication, despite potential performance trade-offs.
Real-world Ready vs. Performant: While certain network protocols like QUIC might offer better performance, ensure there's a fallback mechanism (e.g., WebRTC or WebSockets) to handle realistic network constraints, such as environments blocking UDP traffic.
Intuitive vs. Performant: Avoid overly complex configurations—even if they deliver maximum performance—if they compromise usability. Prioritize straightforward, intuitive setups to ensure users don't require extensive system knowledge for effective operation.
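The "Real-world Ready vs. Performant" scenario can be sketched as an ordered fallback. This is a minimal illustration only, not code from this project: the `Transport` enum, the `PREFERENCE` order, and `select_transport` are hypothetical names, and the reachability probe is passed in as a closure so the sketch makes no assumption about how connectivity is actually tested.

```rust
/// Hypothetical transports, named after the protocols mentioned above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Transport {
    Quic,
    WebRtc,
    WebSockets,
}

/// Assumed preference order: the fastest option first, fallbacks after it.
pub const PREFERENCE: [Transport; 3] =
    [Transport::Quic, Transport::WebRtc, Transport::WebSockets];

/// Returns the first transport in `PREFERENCE` for which `is_reachable`
/// reports success, or `None` if no transport works in this environment.
pub fn select_transport(is_reachable: impl Fn(Transport) -> bool) -> Option<Transport> {
    PREFERENCE.iter().copied().find(|t| is_reachable(*t))
}
```

In an environment where QUIC cannot connect, `select_transport(|t| t != Transport::Quic)` yields `Some(Transport::WebRtc)`, so the faster protocol stays the default without making the system unusable behind restrictive networks.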

Setup

Install the stable toolchain for building and the nightly toolchain for formatting:

rustup toolchain install stable
rustup toolchain install nightly --component rustfmt

Format the codebase using cargo +nightly fmt before committing.

Components

Important

We will probably split this monorepo into multiple repositories to better separate the components and their responsibilities, make maintenance and development easier, and simplify license handling.

The project is organized into multiple components (crates), each with its own purpose and responsibilities.

flowchart RL
    classDef transparent opacity:0

    TODO["TODO: Add component overview"]

Adding Components

New components SHOULD be added using cargo new crates/${NAME}. This will create a new crate directory with the standard Rust project structure.

After creating a new crate, you MUST link the appropriate license:

# Run from the repository root. The link target is resolved relative to the crate directory.
ln -s ../../LICENSE-${TYPE} crates/${NAME}/LICENSE-${TYPE}

Where ${TYPE} refers to the license type (e.g., APACHE for Apache-2.0).

Make sure to also update the license metadata in each crate's Cargo.toml file to correctly reflect the license being used:

[package]
# Other package metadata...
# SPDX identifier of the linked license, e.g. "Apache-2.0"
license = "${SPDX_ID}"

The default license is Apache-2.0, but the project uses multiple licenses for the time being. If you're uncertain about which license to use, please consult the project lead. Use the SPDX license identifier (e.g., Apache-2.0) as the value of the license field; see https://spdx.org/licenses/ for the full list.

Documentation

All user-facing features MUST be documented. Good documentation is crucial to making large-scale machine learning accessible. Quality documentation removes barriers and enables widespread innovation by ensuring users can effectively understand and utilize the system.

Each crate MUST begin with comprehensive front-page documentation in lib.rs, which SHOULD include an introduction, quick start example, feature overview, and integration guide. Every public function, struct, enum, trait, and module MUST be documented with /// comments that describe its purpose and SHOULD include usage examples.
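As an illustration of the item-level requirement, here is a minimal sketch of a documented public function. The function `split_batches` and the crate name `my_crate` are hypothetical and not part of this project; the section layout (summary line, `# Panics`, `# Examples`) follows common rustdoc conventions rather than a rule stated in this guide.

```rust
/// Splits `items` into consecutive batches of at most `batch_size` elements.
///
/// The final batch is shorter when `items.len()` is not a multiple of
/// `batch_size`. An empty input produces no batches.
///
/// # Panics
///
/// Panics if `batch_size` is zero.
///
/// # Examples
///
/// ~~~
/// use my_crate::split_batches; // `my_crate` is a placeholder crate name
///
/// let batches = split_batches(&[1, 2, 3, 4, 5], 2);
/// assert_eq!(batches, vec![vec![1, 2], vec![3, 4], vec![5]]);
/// ~~~
pub fn split_batches<T: Clone>(items: &[T], batch_size: usize) -> Vec<Vec<T>> {
    assert!(batch_size > 0, "batch_size must be greater than zero");
    items.chunks(batch_size).map(|chunk| chunk.to_vec()).collect()
}
```

The doc comment uses a `~~~` fence only so that the example nests cleanly inside this guide's own code fence; in a source file, the usual triple-backtick fence serves the same purpose. Examples written this way are compiled and run by `cargo test` as documentation tests, which helps keep them current.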

All documentation SHOULD be:

Clear and concise - Easy to understand for the target audience
Complete - Covering all public functionality comprehensively
Current - Updated with every change
Consistent - Following project-wide style and conventions

Poor or missing documentation for user-facing features will block pull requests. When in doubt, err on the side of more documentation rather than less.

For detailed guidelines, refer to the rustdoc book.

Adding Dependencies

When adding dependencies, these MUST be added to the respective crate's Cargo.toml file. You can add dependencies using:

cargo add ${DEPENDENCY_NAME}

Make sure that the dependency is compatible with the project's licenses.

Upgrade Dependencies

All dependencies SHALL be updated regularly to maintain an up-to-date and secure product. Updates SHOULD consider backward compatibility and MUST document compatibility issues.

Version Control

Changes SHOULD be committed frequently in small logical chunks that MUST be consistent, work independently of any later commits, and pass the linter plus the tests. Doing so eases rollback and rebase operations. Commits MUST NOT include any customer data.

Commit messages SHALL follow the Conventional Commits specification. This provides a framework for explicit, readable messages and enables automated changelog generation.
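To make the expected header shape concrete, the sketch below checks the first line of a commit message against the `<type>[(scope)][!]: <description>` form described by Conventional Commits. It is an illustrative helper, not tooling shipped with this project: the name `is_conventional_header` is hypothetical, the check ignores the message body and footers, and it accepts only lowercase ASCII types, which is stricter than the specification.

```rust
/// Returns `true` if `header` looks like `<type>[(scope)][!]: <description>`.
///
/// Simplified for illustration: only the first line is checked, and the
/// type must consist of lowercase ASCII letters.
pub fn is_conventional_header(header: &str) -> bool {
    // The prefix and the description are separated by the first ": ".
    let Some((prefix, description)) = header.split_once(": ") else {
        return false;
    };
    if description.trim().is_empty() {
        return false;
    }

    // An optional `!` directly before the colon marks a breaking change.
    let prefix = prefix.strip_suffix('!').unwrap_or(prefix);

    // Split off an optional `(scope)` that follows the type.
    let (kind, scope) = match prefix.split_once('(') {
        Some((kind, rest)) => match rest.strip_suffix(')') {
            Some(scope) => (kind, Some(scope)),
            None => return false, // opening parenthesis without a closing one
        },
        None => (prefix, None),
    };

    let kind_ok = !kind.is_empty() && kind.chars().all(|c| c.is_ascii_lowercase());
    let scope_ok = scope.map_or(true, |s| {
        !s.is_empty() && s.chars().all(|c| c != '(' && c != ')')
    });
    kind_ok && scope_ok
}
```

Under these assumptions, `feat(parser): add ability to parse arrays` and `fix!: drop legacy config` pass, while `Updated stuff` is rejected because it has no type prefix.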

AI Assistants

Tip

For AI Agent guidance on effective collaboration on this project, please refer to the AGENTS.md file in the repository root.

The use of AI in open-source development is controversial and projects take varied approaches:

Full bans – Projects like Servo reject any AI-generated code due to maintainability risks, quality concerns, and the challenges of reviewing code that contributors may not fully understand.
Embracing with guardrails – Projects like Cloudflare's workers-oauth-provider, written largely with Claude, demonstrate that rigorous review processes and clear guidelines can harness AI's productivity benefits while maintaining quality.

The Cloudflare example, along with others, shows that with the right procedures, AI assistance can deliver on productivity promises while balancing risks. Our approach follows this second path, drawing inspiration from "Field Notes From Shipping Real Code With Claude", which observes:

Good development practices aren't just nice-to-haves—they're the difference between AI that amplifies your capabilities versus your chaos.

Building on this insight, our AGENTS.md file and the principles below serve as a starting point for responsible AI collaboration practices. We believe that AI assistants can be powerful tools for increasing productivity when used with appropriate safeguards.

Using AI assistants (e.g., Gemini, Claude, ChatGPT) for development is encouraged to increase productivity. However, the human contributor is always ultimately responsible for the code they commit. Every use of AI assistants MUST follow the principles outlined below.

Principles

Accountability: The contributor is accountable for any code committed. This responsibility is not diminished if the code was generated by an AI. Contributors MUST thoroughly review, understand, and test all AI-generated code to ensure it is correct, secure, and aligns with the project's principles and coding standards before submitting it.
Attribution: If an AI assistant provides a significant contribution to a commit, it MUST be acknowledged using a Co-Authored-By: trailer in the commit message. This maintains transparency in the project's history.

Attribution

Add the appropriate trailer on a new line in the commit message body:

Claude: Co-Authored-By: Claude <noreply@anthropic.com>
Gemini: Co-Authored-By: Gemini <google@users.noreply.github.com>
ChatGPT: Co-Authored-By: ChatGPT <openai@users.noreply.github.com>
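For contributors who script their commits, the sketch below shows one way to append such a trailer to a message. It is a hypothetical helper, not part of this project's tooling: the function name `with_co_author` is invented for illustration, and it only recognizes existing `Co-Authored-By:` lines, not other kinds of trailers.

```rust
/// Appends a `Co-Authored-By:` trailer to `message`.
///
/// The trailer is placed in its own final paragraph. If the message already
/// ends with a `Co-Authored-By:` line, the new trailer is added directly
/// below it. An identical trailer is never added twice.
pub fn with_co_author(message: &str, name: &str, email: &str) -> String {
    let trailer = format!("Co-Authored-By: {name} <{email}>");
    let trimmed = message.trim_end();

    // Leave the message unchanged (apart from trailing whitespace) if the
    // exact trailer is already present.
    if trimmed.lines().any(|line| line.trim() == trailer) {
        return format!("{trimmed}\n");
    }

    // Extend an existing trailer block, or start a new paragraph.
    let ends_with_trailer = trimmed
        .lines()
        .last()
        .map_or(false, |line| line.starts_with("Co-Authored-By:"));
    let separator = if ends_with_trailer { "\n" } else { "\n\n" };
    format!("{trimmed}{separator}{trailer}\n")
}
```

For example, `with_co_author("feat: add scheduler", "Claude", "noreply@anthropic.com")` returns the original subject line, a blank line, and then the Claude trailer listed above.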

Note

Not all AI assistants have an official GitHub account. When using a new assistant, please use the organization name with the default GitHub noreply email address to avoid mentioning actual users.
