The-AI-Alliance/tapestry

Welcome to Project Tapestry

Note

Project Tapestry is bringing together talented people, data, and compute from a global consortium of partners to build a new foundation model system trained on a larger and more diverse corpus than ever before.

Tapestry will enable sovereign AI by ensuring that ownership of data and compute remains with partners, and that partners can use the open source Tapestry training platform to keep training sovereign derivatives, which they own, of the consortium-trained base model.

Learn more from our Kickoff Workshop Blog and check out the Project Tapestry website for more information about partnering, events, and how to support Project Tapestry.

This repo contains the code and technical documentation for the project. We invite you to jump in and help!

The rest of this README provides information for contributors and users of this repository.

Contribute to Our First Work Streams

Project Tapestry has big plans, and we're starting with some fundamental building blocks.

  • LLM Cultural Alignment and Re-alignment (repository coming soon) - Help us develop techniques for cultural alignment, initially using the Inglehart–Welzel Cultural Map as a metric. This work stream will implement a corresponding evaluation and run tuning experiments to understand how to shift alignment without compromising general model performance. Prior expertise in evaluation and tuning technologies is especially welcome.
  • Consortium Training (repository coming soon) - Tapestry's approach to global model development relies on a balance between centralized and distributed training that preserves the use and privacy requirements of data sets. Help us adapt and develop optimal techniques with ideas from both federated learning and the latest LLM pre-training and post-training methods. Prior expertise in large-scale LLM training, distributed infrastructure, and federated learning is especially welcome.
  • Global Training Data Corpus - A core thesis of Project Tapestry is that bringing together a much more diverse set of data can provide a path to a better frontier base model for all. What unique datasets exist that could be brought to Tapestry model training? They don't have to be fully open; we will work with you to define and enforce appropriate requirements.
  • Tapestry Model Development Roadmap - coming soon - we want your input!

Quick Paths

Note

Make sure to read Getting Involved below for information on contribution guidelines, etc.

We use the develop branch as our default (integration) branch, reserving main for occasional "baked" releases.
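
In practice, this likely means starting new work from develop rather than main. The sketch below is illustrative only: it builds a throwaway repository instead of touching a real clone, and the branch name my-topic is hypothetical. CONTRIBUTING.md remains the authority on the actual workflow.

```shell
# Throwaway repository standing in for a clone of the project (illustration only).
demo_repo="$(mktemp -d)"
cd "$demo_repo"
git init -q
git config user.name "Ada Example"        # placeholder identity
git config user.email "ada@example.com"
git commit -q -s --allow-empty -m "Initial commit"
git branch -M develop                     # stand-in for the project's default branch

# Start new work from develop on a topic branch ("my-topic" is a hypothetical name).
git checkout -q develop
git checkout -q -b my-topic

current_branch="$(git rev-parse --abbrev-ref HEAD)"
echo "Now on branch: $current_branch"
```

In a real clone, only the two git checkout commands apply, typically after bringing develop up to date with git pull.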

Working with the Source Code

The source code is under the src directory.

Working with the Technical Documentation

The technical documentation lives under the tech-docs directory.

For repo layout, conventions, and where to find implementation code, see AGENTS.md.

Development

Setup

This project uses uv for Python package management.

Install uv

On macOS/Linux:

curl -LsSf https://astral.sh/uv/install.sh | sh

On Windows:

powershell -c "irm https://astral.sh/uv/install.ps1 | iex"

The remaining setup steps, described below, are automated with make. Try the following:

make one-time-setup

Create a Virtual Environment

The one-time-setup target creates the virtual environment for you, but it only works on macOS or Linux. You can also do this manually:

On macOS/Linux:

uv venv
source .venv/bin/activate

On Windows:

uv venv
.venv\Scripts\activate

Install Dependencies

The one-time-setup target (macOS or Linux only) runs the first of the following commands. You can also run either command manually:

uv pip install -e ".[dev]"  # full development dependencies
uv pip install -e .         # minimum dependencies

Running Tests

We use pytest for testing. The easiest way to run the test suite is using make:

make unit-tests # or just tests; they are currently the same.

This runs the following commands, which you can run yourself if you prefer:

cd src
uv run python -m pytest tests -q

Code Formatting

Use either of the following commands to format the Python code with black:

make format
# or
uv run black src

Linting

Use either of the following commands to lint the Python code with ruff and pylint:

make lint
# or
uv run ruff check src
uv run pylint src

Type Checking

Use either of the following commands to type check the Python code with ty:

make type-check
# or
uv run ty check src

There is also a "watch" option that keeps ty running as you fix mistakes and save the files:

make type-check-watch
# or
uv run ty check --watch src

Before You Submit a PR...

Before submitting a PR, please run the format, lint, and type checking commands, then run the tests. Make sure everything passes cleanly! Use the convenient make target before-pr, or run the individual commands above:

make before-pr               # Equivalent to 'make format lint type-check tests'
make format-lint-type-check  # Equivalent to 'make format lint type-check'
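
If you run the steps individually and want to see which one fails first, a small wrapper can help. The following is a minimal sketch for a POSIX shell: run_steps is a hypothetical helper, not something the repository provides, and the make targets in the usage comment are the ones listed above.

```shell
# run_steps is a hypothetical helper (not part of the repository).
# It runs each command in order and stops at the first one that fails.
run_steps() {
  for step in "$@"; do
    printf '==> %s\n' "$step"
    if ! sh -c "$step"; then
      printf 'FAILED: %s\n' "$step" >&2
      return 1
    fi
  done
  printf 'All steps passed.\n'
}

# From the repository root, you might call it with the make targets named above:
#   run_steps "make format" "make lint" "make type-check" "make tests"
```

Because the function returns a non-zero status on the first failure, it can be chained with other commands, for example `run_steps ... && git push`.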

Note

Make sure to read Getting Involved below before submitting a PR.

Project Code Structure

Besides src, the repository has three other top-level directories: tech-docs (discussed above), docs (discussed below), and contrib, the staging area for contributed ideas and techniques. Within src, there are currently three major subsystems:

  • data for all data governance and management capabilities.
  • training for all distributed training and tuning capabilities.
  • infrastructure for all underlying infrastructure.

tapestry/
├── contrib/        # Contributed ideas & techniques, proposed via PR
└── src/
    ├── tapestry/
    │   ├── data/
    │   ├── infrastructure/
    │   └── training/
    └── tests/
        └── tapestry/
            ├── data/
            ├── infrastructure/
            └── training/

Getting Involved

We welcome contributions as pull requests, issues, and discussions.

See CONTRIBUTING.md for guidelines. In particular, read this section on using DCO with any commits.

Have an idea, technique, or experiment you'd like the project to consider? The contrib/ directory is a lightweight staging area where contributors can propose work via a PR into their own subdirectory. See contrib/README.md for the simple workflow and contribution policy.

You can also join one or more of the work groups being organized to identify requirements in several areas, prototype and test ideas, and then carry out the initial implementation iterations. Details are being documented in tech-docs/work-groups/.

Licenses

All code contributions are licensed under the Apache 2.0 LICENSE (which is also in this repo, LICENSE.Apache-2.0).

All documentation contributions are licensed under the Creative Commons Attribution 4.0 International license (which is also in this repo, LICENSE.CC-BY-4.0).

All data contributions are licensed under the Community Data License Agreement - Permissive - Version 2.0 (which is also in this repo, LICENSE.CDLA-2.0).

We use the "Developer Certificate of Origin" (DCO).

Warning

Before you make any git commits with changes, understand what's required for DCO.

See the contributing guide section on DCO for details. In practical terms, supporting this requirement means you must use the -s flag with your git commit commands.
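
To make the effect of -s concrete, here is a small sketch that runs in a throwaway repository rather than a real clone. The name, email, and file are placeholders chosen for illustration; Git builds the sign-off line from your own user.name and user.email settings.

```shell
# Throwaway repository for illustration; the name, email, and file are placeholders.
demo_repo="$(mktemp -d)"
cd "$demo_repo"
git init -q
git config user.name "Ada Example"
git config user.email "ada@example.com"

# Commit with -s (short for --signoff).
echo "hello" > notes.txt
git add notes.txt
git commit -q -s -m "Add notes"

# If a commit was made without -s, the sign-off can be added by amending it.
echo "more" >> notes.txt
git commit -q -a -m "Extend notes"
git commit -q --amend -s --no-edit

# Count the commits whose message carries a Signed-off-by trailer.
signed_count="$(git log --format=%B | grep -c '^Signed-off-by: Ada Example <ada@example.com>$')"
echo "Commits with sign-off: $signed_count"
```

Amending rewrites the commit, so it is best done before the commit has been pushed. The contributing guide remains the reference for what the project actually requires.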

About the Technical Website (GitHub Pages)

The website for this repository provides another way to discover and navigate the technical documentation content in tech-docs. However, at this time, the site mostly just points to the content in tech-docs. The website sources are in the docs directory.

The website is published using GitHub Pages, where the pages are written in Markdown and served using Jekyll. See GITHUB_PAGES.md for all the details.
