WorldForge changes should keep the provider boundary truthful. A contribution is not complete when the code compiles; it is complete when the public contract, tests, docs, and agent context agree.
WorldForge follows the standard GitHub fork-and-pull-request flow against main.
- Fork the repository and clone your fork.
- Create a topic branch: `git checkout -b feat/<short-description>` (or `fix/`, `docs/`, `chore/`).
- Run `uv sync --group dev` and the validation gate below before opening the PR.
- Open the pull request against `AbdelStark/worldforge:main`. Keep the title imperative and under ~70 characters; describe user-visible changes and link related issues in the body.
- Address review feedback with new commits on the same branch; squash-merging is the default.
Direct pushes to main are reserved for maintainers performing release operations.
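The branch and title conventions above are easy to check locally before opening a PR. The following is a minimal sketch, not part of the repository: the function names are hypothetical, and the lowercase slug pattern is an assumption (the guide only fixes the `feat/`, `fix/`, `docs/`, and `chore/` prefixes and the ~70-character title limit).

```python
import re

# Prefixes come from the workflow above; the lowercase slug rule after the
# slash is an assumption made for this sketch.
BRANCH_PATTERN = re.compile(r"^(feat|fix|docs|chore)/[a-z0-9][a-z0-9._-]*$")
MAX_TITLE_LENGTH = 70  # the guide asks for titles "under ~70 characters"


def check_branch_name(name: str) -> bool:
    """Return True when the branch uses one of the documented prefixes."""
    return BRANCH_PATTERN.fullmatch(name) is not None


def check_pr_title(title: str) -> list:
    """Return a list of problems with a PR title; an empty list means none found."""
    problems = []
    if not title.strip():
        problems.append("title is empty")
    if len(title) > MAX_TITLE_LENGTH:
        problems.append(
            f"title is {len(title)} characters; keep it under ~{MAX_TITLE_LENGTH}"
        )
    return problems
```

For example, `check_branch_name("feat/provider-docs")` passes, while `check_branch_name("feature/provider-docs")` does not because `feature/` is not one of the listed prefixes.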
uv sync --group dev
uv run worldforge doctor
uv run worldforge examples
Run the focused gate while iterating, then run the full gate before publishing work.
uv lock --check
uv run ruff check src tests examples scripts
uv run ruff format --check src tests examples scripts
uv run python scripts/generate_provider_docs.py --check
uv run python scripts/check_docs_commands.py
uv run python scripts/check_docs_snippets.py
uv run python scripts/manage_fixture_snapshots.py --format markdown
uv run python scripts/check_wrapper_portability.py
uv run python scripts/check_optional_import_boundaries.py
uv run python scripts/check_core_performance.py
uv run python scripts/generate_quality_dashboard.py
uv run mkdocs build --strict
uv run pytest
uv run --extra harness pytest --cov=src/worldforge --cov-report=term-missing --cov-fail-under=90
bash scripts/test_package.sh
uv build --out-dir dist --clear --no-build-logs
Before tagging or publishing, run the same gate plus the locked dependency audit documented in `docs/src/playbooks.md`.
`bash scripts/test_package.sh` is the packaging contract check. It builds the wheel and sdist with uv,
checks the distribution contents, installs the wheel into an isolated virtual environment, and runs
the root test suite against the installed package.
`uv run python scripts/generate_provider_docs.py --check` plus `uv run mkdocs build --strict`
verifies the generated provider catalog and builds the MkDocs Material site in strict mode.
`uv run python scripts/manage_fixture_snapshots.py --format markdown` validates the fixture
snapshot manifest for capability fixtures, provider payload fixtures, benchmark inputs, scenarios,
and scene artifact fixtures; refresh it explicitly with `--write` after an intended fixture change.
`uv run python scripts/generate_quality_dashboard.py` reads existing release evidence,
dependency-audit evidence, and core-performance output to write a local JSON/Markdown dashboard; it
does not execute gates or replace release evidence.
The exact release gate is documented in docs/src/playbooks.md.
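When iterating, it can help to run the gate commands in order and stop at the first failure. The sketch below is one way to do that under stated assumptions: `run_gate` and `GATE_COMMANDS` are hypothetical names, not a script in this repository, and the list holds only a subset of the gate commands shown above. The release gate in the playbooks remains the source of truth.

```python
import shlex
import subprocess
from typing import Callable, Optional, Sequence

# A subset of the gate commands listed above, in the same order.
GATE_COMMANDS = [
    "uv lock --check",
    "uv run ruff check src tests examples scripts",
    "uv run ruff format --check src tests examples scripts",
    "uv run mkdocs build --strict",
    "uv run pytest",
]


def _run(argv: Sequence[str]) -> int:
    """Run one command and return its exit code."""
    return subprocess.run(list(argv)).returncode


def run_gate(
    commands: Sequence[str] = GATE_COMMANDS,
    runner: Callable[[Sequence[str]], int] = _run,
) -> Optional[str]:
    """Run commands in order; return the first failing command, or None if all pass."""
    for command in commands:
        if runner(shlex.split(command)) != 0:
            return command  # stop early so the first failure is easy to see
    return None
```

The `runner` parameter exists so the ordering logic can be exercised without invoking `uv`; calling `run_gate()` with the defaults executes the real commands from the repository root.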
Before editing an issue, pick the closest Contributor Task Starters entry. The starter packs map provider, docs-only, demo, artifact/report, evaluation/benchmark, and CLI work to likely files, forbidden shortcuts, validation commands, evidence artifacts, docs/changelog expectations, and review checklists.
- `src/worldforge/models.py`: public domain models, validation helpers, request policies, and serialization contracts.
- `src/worldforge/framework.py`: `WorldForge`, `World`, persistence, prediction, comparison, planning, diagnostics, and evaluation entry points.
- `src/worldforge/providers/`: provider interfaces, catalog, concrete adapters, and scaffolds.
- `src/worldforge/testing/`: reusable provider contract helpers for adapter packages.
- `src/worldforge/evaluation/`: deterministic evaluation suites and report rendering.
- `src/worldforge/benchmark.py`: provider benchmark harness.
- `src/worldforge/observability.py`: provider event sinks.
- `docs/src/`: user docs, architecture, playbooks, provider docs, and API notes.
- `tests/`: unit, contract, CLI, packaging, and regression tests.
- `examples/`: runnable checkout examples and compatibility wrappers.
- `scripts/`: docs generation, provider scaffolding, package validation, and optional smokes.
- Keep public APIs typed, serializable, and explicit about failure modes.
- Fail fast on invalid inputs instead of silently coercing them.
- Use `ProviderError` for provider/runtime failures.
- Use `WorldForgeError` for invalid caller input and public model validation failures.
- Use `WorldStateError` for malformed persisted or provider-supplied world state.
- Do not advertise a provider capability that is not implemented end to end.
- Keep optional model runtimes, robot stacks, checkpoints, datasets, and credentials out of the base dependency set and out of the repository.
- Keep local JSON persistence documented as single-writer unless a dedicated persistence adapter is designed and reviewed.
- Keep docs aligned with the live package surface.
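The split between the three error types can be illustrated with a small sketch. The exception classes below are stand-ins defined locally so the example runs on its own; the real classes ship with WorldForge, and their import path and class hierarchy are not shown in this guide. The helper functions and the `objects` key are hypothetical.

```python
import json


# Stand-in exception classes so this sketch is self-contained. They mirror the
# names used in the rules above but are NOT the package's own definitions.
class WorldForgeError(Exception):
    """Invalid caller input or public model validation failure."""


class WorldStateError(Exception):
    """Malformed persisted or provider-supplied world state."""


class ProviderError(Exception):
    """Provider or runtime failure."""


def validate_steps(steps: object) -> int:
    """Hypothetical helper: reject invalid input instead of coercing it."""
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(steps, bool) or not isinstance(steps, int):
        raise WorldForgeError(f"steps must be an int, got {type(steps).__name__}")
    if steps < 1:
        raise WorldForgeError(f"steps must be >= 1, got {steps}")
    return steps


def load_world_state(raw: str) -> dict:
    """Hypothetical helper: parse persisted state and fail on malformed data."""
    try:
        state = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise WorldStateError(f"world state is not valid JSON: {exc}") from exc
    # The 'objects' key is an assumption made for this sketch.
    if not isinstance(state, dict) or "objects" not in state:
        raise WorldStateError("world state must be a JSON object with an 'objects' key")
    return state


def call_provider(predict, payload):
    """Hypothetical wrapper: surface transport failures as ProviderError."""
    try:
        return predict(payload)
    except (ConnectionError, TimeoutError) as exc:
        raise ProviderError(f"provider call failed: {exc}") from exc
```

Note that `validate_steps("3")` raises rather than converting the string to `3`, which is the fail-fast behavior the rules ask for.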
- Start from docs/src/provider-authoring-guide.md.
- Use `scripts/scaffold_provider.py` for new adapter skeletons.
- Declare only the capabilities the adapter actually supports.
- Fail clearly on missing credentials, optional dependencies, malformed inputs, malformed upstream outputs, partial outputs, expired artifacts, and unsupported flows.
- Register the provider only when auto-detection is safe.
- Add fixture-driven tests for happy paths and every documented failure mode.
- Run `worldforge.testing.assert_provider_contract()` in adapter tests for supported surfaces.
- Update provider docs, generated provider catalog tables, README or API docs when public behavior changes, `CHANGELOG.md`, and `AGENTS.md` when future contributors need new context.
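The rule about declaring only supported capabilities lends itself to a quick self-check in adapter tests. This is a rough sketch, not the behavior of `assert_provider_contract()`: it assumes, purely for illustration, that each capability maps to a callable attribute of the same name on the adapter. `unsupported_claims` is a hypothetical helper.

```python
# Capability names as used for the project's capability labels.
CAPABILITIES = ("predict", "generate", "reason", "embed", "transfer", "score", "policy")


def unsupported_claims(adapter: object, declared: set) -> set:
    """Return declared capabilities that have no matching callable on the adapter.

    Assumption for this sketch: a capability is "implemented" when the adapter
    exposes a callable attribute with the same name. The real contract helper
    may check far more than this.
    """
    unknown = set(declared) - set(CAPABILITIES)
    if unknown:
        raise ValueError(f"unknown capabilities: {sorted(unknown)}")
    return {
        name for name in declared if not callable(getattr(adapter, name, None))
    }


class ExampleAdapter:
    """Hypothetical adapter that only implements predict."""

    def predict(self, payload):
        return payload
```

With this adapter, declaring `{"predict", "score"}` would flag `score` as a claim with nothing behind it, which is the situation the list above tells authors to avoid.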
Public behavior changes need docs in the same branch. Use this routing:
- README for the front-door story and common commands.
- `docs/src/architecture.md` for new components, flows, or ownership boundaries.
- `docs/src/playbooks.md` for operator or maintainer runbooks.
- provider pages for provider-specific config, limits, examples, failure modes, and validation.
- `docs/src/api/python.md` for public API and exception behavior.
- `CHANGELOG.md` for user-visible changes.
- `AGENTS.md` for repo commands, constraints, gotchas, and agent-facing context.
Use labels to make the roadmap stream, capability surface, severity, and release scope visible before implementation starts.
Roadmap stream labels:
- `stream: provider-evidence`: provider selection, adapter runtime contracts, provider promotion, runtime manifests, provider docs, optional runtime smokes, and upstream research validation.
- `stream: evidence-integrity`: evaluation suites, benchmarks, budgets, reports, preserved run evidence, release evidence, artifact bundles, provenance, and claim-supporting docs.
- `stream: ops-authoring`: operator workflows, TheWorldHarness, adapter authoring loops, reference hosts, local persistence, runbooks, drills, and contributor experience.
Capability labels are `predict`, `generate`, `reason`, `embed`, `transfer`, `score`, and `policy`.
Use them only when the issue touches that public capability surface. Common domain labels include
`provider`, `research`, `artifacts`, `evaluation`, `benchmark`, `harness`, `operations`,
`observability`, `persistence`, `security`, `examples`, `developer-experience`, `robotics`, and
`optional-dependency`.
Severity and release-scope labels:
- `severity: blocking`: blocks release-candidate readiness, a public contract, or operator safety.
- `severity: quality`: degrades quality or evidence but does not block all development.
- `type: hardening`: reliability, validation, redaction, recovery, or safety hardening.
- `release` and `release: provider-hardening-rc`: release process or named release-candidate scope.
Triage rules:
- Provider runtime or promotion work uses the provider template, `stream: provider-evidence`, `provider`, the capability labels it claims, and any relevant `optional-dependency`, `robotics`, `security`, or `research` labels. New runtime families or unclear upstream contracts need a selection record before implementation. Promotion work must cite the provider authoring guide promotion gate, runtime manifest, fixtures, provider docs, and live-smoke or explicit blocker evidence.
- Evaluation, benchmark, claim, report, budget, or artifact work uses the eval/benchmark template, `stream: evidence-integrity`, and the relevant `evaluation`, `benchmark`, `artifacts`, or `observability` labels. Public claims and release-candidate gates need preserved run evidence or release evidence before the issue can be closed.
- Operator workflow, local persistence, run workspace, harness, drill, reference host, and adapter authoring workflow work uses `stream: ops-authoring` plus `operations`, `harness`, `developer-experience`, `persistence`, `reliability`, or `examples` as appropriate. These issues should name the command to run, expected success signal, first triage step, and recovery command.
- Architecture, persistence-boundary, provider-selection, or runtime-ownership changes need a design record or selection record before broad implementation.
- Security-sensitive reports still route through the private Security tab. Do not open public issues containing vulnerabilities, credentials, signed URLs, private endpoints, or host-local artifacts.
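The first three triage rules amount to a lookup from the kind of work to a stream label, plus any capability labels the issue claims. A minimal sketch of that mapping follows; the work-type keys (`provider`, `evidence`, `ops`) and the `suggest_labels` function are hypothetical names chosen for illustration, while the label strings are the ones listed above. Domain labels beyond `provider` still need a human decision.

```python
# Stream labels copied from the triage rules; the dictionary keys are
# hypothetical shorthand invented for this sketch.
STREAM_BY_WORK_TYPE = {
    "provider": "stream: provider-evidence",
    "evidence": "stream: evidence-integrity",
    "ops": "stream: ops-authoring",
}
CAPABILITY_LABELS = frozenset(
    {"predict", "generate", "reason", "embed", "transfer", "score", "policy"}
)


def suggest_labels(work_type: str, capabilities: tuple = ()) -> list:
    """Return the stream label plus any capability labels the issue claims."""
    if work_type not in STREAM_BY_WORK_TYPE:
        raise ValueError(f"unknown work type: {work_type!r}")
    unknown = set(capabilities) - CAPABILITY_LABELS
    if unknown:
        raise ValueError(f"unknown capability labels: {sorted(unknown)}")
    labels = [STREAM_BY_WORK_TYPE[work_type]]
    if work_type == "provider":
        labels.append("provider")  # provider work also carries the domain label
    labels.extend(sorted(set(capabilities)))
    return labels
```

For instance, `suggest_labels("provider", ("score", "predict"))` yields the stream label, `provider`, and the two capability labels in sorted order.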
- lint, docs check, tests, coverage, package check, and build pass.
- new behavior has tests, including error paths.
- provider docs and generated catalog tables are current.
- README and API docs reflect public contract changes.
- `CHANGELOG.md` records user-visible changes.
- `AGENTS.md` records new constraints, commands, or gotchas.
- no secrets, `.env` files, checkpoints, datasets, generated runtime artifacts, or optional heavy dependencies are committed accidentally.
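The last checklist item can be partly automated by scanning the paths about to be committed. The sketch below is an illustration only: `flag_risky_paths` is a hypothetical helper, and the checkpoint suffixes are assumptions, since the checklist names categories rather than file extensions. It catches neither secrets embedded in ordinary files nor datasets with unremarkable names.

```python
from pathlib import PurePosixPath

# Illustrative suffixes only; adjust to the checkpoint formats actually in use.
CHECKPOINT_SUFFIXES = {".pt", ".pth", ".ckpt", ".safetensors"}


def flag_risky_paths(paths: list) -> list:
    """Return the paths that look like .env files or model checkpoints."""
    flagged = []
    for raw in paths:
        path = PurePosixPath(raw)
        name = path.name
        if name == ".env" or name.startswith(".env."):
            flagged.append(raw)  # .env and variants such as .env.local
        elif path.suffix in CHECKPOINT_SUFFIXES:
            flagged.append(raw)  # likely a model checkpoint
    return flagged
```

One way to feed it is with the output of `git diff --cached --name-only`, split into lines, before committing.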