Note
Project Tapestry is bringing together talented people, data, and compute from a global consortium of partners to build a new foundation model system trained on a larger and more diverse corpus than ever before.
Tapestry will enable sovereign AI by ensuring that ownership of data and compute remains with partners, and that partners can use the Tapestry open source training platform to keep training sovereign derivatives of the consortium-trained base model, derivatives that they own.
Learn more from our Kickoff Workshop Blog and check out the Project Tapestry website for more information about partnering, events, and how to support Project Tapestry.
This repo contains the code and technical documentation for the project. We invite you to jump in and help!
The rest of this README provides information for contributors and users of this repository.
Project Tapestry has big plans, and we're starting with some fundamental building blocks.
- LLM Cultural Alignment and Re-alignment repository (coming soon) - Help us develop techniques for cultural alignment, initially using the Inglehart–Welzel Cultural Map as a metric. This task will implement a corresponding evaluation and run tuning experiments to understand how to shift alignment without compromising general model performance. Prior expertise in evaluation and tuning technologies is especially welcome.
- Consortium Training repository (coming soon) - Tapestry's approach to global model development relies on a balance between centralized and distributed training that preserves use and privacy requirements for data sets. Help us adapt and develop optimal techniques with ideas from both federated learning and the latest LLM pre-training and post-training methods. Prior expertise in large-scale LLM training, distributed infrastructure, and federated learning is especially welcome.
- Global Training Data Corpus - A core thesis of Project Tapestry is that bringing together a much more diverse set of data can provide a path to a better frontier base model for all. What unique datasets exist that could be brought to Tapestry model training? They don't have to be fully open; we will work with you to define and enforce appropriate requirements.
- Tapestry Model Development Roadmap - coming soon - we want your input!
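For readers new to the federated learning ideas mentioned in the Consortium Training item above, here is a minimal sketch of the aggregation step of federated averaging (FedAvg). It is a generic illustration only, not Tapestry's training method; the function and parameter names are hypothetical, and model parameters are simplified to flat lists of floats.

```python
from typing import Sequence


def federated_average(
    partner_weights: Sequence[Sequence[float]],
    partner_sizes: Sequence[int],
) -> list[float]:
    """Average per-partner parameter vectors, weighted by local dataset size.

    Hypothetical helper for illustration; only parameters are shared,
    each partner's raw data stays where it is.
    """
    if not partner_weights or len(partner_weights) != len(partner_sizes):
        raise ValueError("need exactly one dataset size per partner")
    total = sum(partner_sizes)
    if total <= 0:
        raise ValueError("total dataset size must be positive")
    dim = len(partner_weights[0])
    if any(len(w) != dim for w in partner_weights):
        raise ValueError("all parameter vectors must have the same length")

    averaged = [0.0] * dim
    for weights, size in zip(partner_weights, partner_sizes):
        for i, value in enumerate(weights):
            # Each partner contributes in proportion to its share of the data.
            averaged[i] += value * size / total
    return averaged
```

A partner holding three times as much data pulls the average three times as hard: `federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])` weights the second vector by 0.75.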
Note
Make sure to read Getting Involved below for information on contribution guidelines, etc.
We use the develop branch as our default (integration) branch, reserving main for occasional "baked" releases.
The source code is under the src directory.
- Use the Makefile targets, e.g., make help. More details are in Development below.
- Runnable demos in examples/ (try make consortium-demo).
- Consortium training prototype in src/tapestry/training/consortium/ (try make consortium-demo and make consortium-tests).
- Contrib experiment metrics for the consortium prototype in contrib/jneums-consortium-experiment/ (try make consortium-experiment).
The technical documentation lives under tech-docs:
- Architecture
- The TVA methodology: phased outputs (stakeholder map through design goals), architectural options and core thesis, plus:
- Governance
- Strategic Plan
- Reference Materials (e.g. training paradigms)
- Work Groups
For repo layout, conventions, and where to find implementation code, see AGENTS.md.
This project uses uv for Python package management.
On macOS/Linux:
curl -LsSf https://astral.sh/uv/install.sh | sh
On Windows:
powershell -c "irm https://astral.sh/uv/install.ps1 | iex"
The rest of the steps discussed next are automated using make. Try the following:
make one-time-setup
The one-time-setup target runs the following commands (but it only works on macOS or Linux). You can also do this manually:
On macOS/Linux:
uv venv
source .venv/bin/activate
On Windows:
uv venv
.venv\Scripts\activate
The one-time-setup target runs the first of the following commands (but it only works on macOS or Linux). You can also run either command manually:
uv pip install -e ".[dev]" # full development dependencies
uv pip install -e . # minimum dependencies
We use pytest for testing. The easiest way to run the test suite is using make:
make unit-tests # or just tests; they are currently the same.
This runs the following commands, which you can run yourself if you prefer:
cd src
uv run python -m pytest tests -q
Use either of the following commands to format the Python code with black:
make format
# or
uv run black src
Use either of the following options to lint the Python code with ruff and pylint:
make lint
# or
uv run ruff check src
uv run pylint src
Use either of the following commands to type check the Python code with ty:
make type-check
# or
uv run ty check src
There is also a "watch" option that keeps ty running as you fix mistakes and save the files:
make type-check-watch
# or
uv run ty check --watch src
Before submitting a PR, please run the format, lint, and type checking commands, then run the tests. Make sure everything passes cleanly! Use the convenient make target before-pr, or run the individual commands above:
make before-pr # Equivalent to 'make format lint type-check tests'
make format-lint-type-check # Equivalent to 'make format lint type-check'
Note
Make sure to read Getting Involved below before submitting a PR.
In addition to the top-level directories tech-docs (discussed above), docs (discussed below), and contrib (the staging area for contributed ideas and techniques), the code is structured as follows. At this time, there are three major subsystems:
- data for all data governance and management capabilities.
- training for all distributed training and tuning capabilities.
- infrastructure for all underlying infrastructure.
tapestry/
├── contrib/ # Contributed ideas & techniques, proposed via PR
├── src/
│   ├── tapestry/
│   │   ├── data/
│   │   ├── infrastructure/
│   │   └── training/
│   └── tests/
│       └── tapestry/
│           ├── data/
│           ├── infrastructure/
│           └── training/
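Because the tests directory mirrors the package layout, a test for code in the data subsystem would sit under the matching tests path. The following is a minimal sketch of what such a pytest module might look like. The helper normalize_partner_id is hypothetical and defined inline only so the example is self-contained; in a real test it would be imported from the package instead.

```python
# Hypothetical test module, e.g. somewhere under src/tests/tapestry/data/.
# pytest collects functions named test_* and reports failed assert statements.


def normalize_partner_id(raw: str) -> str:
    """Hypothetical stand-in for a function that would live in the package."""
    cleaned = raw.strip().lower()
    if not cleaned:
        raise ValueError("partner id must not be empty")
    # Collapse any run of whitespace into a single hyphen.
    return "-".join(cleaned.split())


def test_normalize_partner_id_trims_and_lowercases():
    assert normalize_partner_id("  Acme Labs ") == "acme-labs"


def test_normalize_partner_id_is_idempotent():
    once = normalize_partner_id("Acme   Labs")
    assert normalize_partner_id(once) == once
```

Plain assert statements are enough for pytest; no test base class is needed.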
We welcome contributions as pull requests, issues, and discussions.
See CONTRIBUTING.md for guidelines. In particular, read this section on using DCO with any commits.
Have an idea, technique, or experiment you'd like the project to consider? The contrib/ directory is a lightweight staging area where contributors can propose work via a PR into their own subdirectory. See contrib/README.md for the simple workflow and contribution policy.
You can also join one or more work groups that are being organized to identify requirements in several areas and to start the engineering work to prototype and test ideas, followed by the initial implementation iterations. Details are being documented in tech-docs/work-groups/.
All code contributions are licensed under the Apache 2.0 LICENSE (which is also in this repo, LICENSE.Apache-2.0).
All documentation contributions are licensed under the Creative Commons Attribution 4.0 International license (which is also in this repo, LICENSE.CC-BY-4.0).
All data contributions are licensed under the Community Data License Agreement - Permissive - Version 2.0 (which is also in this repo, LICENSE.CDLA-2.0).
We use the "Developer Certificate of Origin" (DCO).
Warning
Before you make any git commits with changes, understand what's required for DCO.
See the contributing guide section on DCO for details. In practical terms, supporting this requirement means you must use the -s flag with your git commit commands.
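The -s flag makes git append a "Signed-off-by: Name <email>" trailer to the commit message. As a rough illustration of what that trailer looks like, here is a small sketch that checks a commit message for one. The function name and the regular expression are assumptions made for this example, not part of the project's tooling, and the pattern is deliberately looser than a full email validator.

```python
import re

# Trailer line that `git commit -s` appends, e.g.
# "Signed-off-by: Ada Lovelace <ada@example.com>"
SIGNOFF_RE = re.compile(
    r"^Signed-off-by: .+ <[^<>\s]+@[^<>\s]+>$",
    re.MULTILINE,
)


def has_signoff(commit_message: str) -> bool:
    """Return True if the message contains a Signed-off-by trailer line."""
    return SIGNOFF_RE.search(commit_message) is not None
```

For example, has_signoff("Fix typo\n\nSigned-off-by: Ada Lovelace <ada@example.com>\n") is true, while a message without the trailer is rejected.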
The website for this repository provides another way to discover and navigate the technical documentation content in tech-docs. However, at this time, the site mostly just points to the content in tech-docs. The website sources are in the docs directory.
The website is published using GitHub Pages, where the pages are written in Markdown and served using Jekyll. See GITHUB_PAGES.md for all the details.

