This file provides guidance to AI agents (including Claude Code, Cursor, and other LLM-powered tools) when working with code in this repository.
- ALL tests MUST pass for code to be considered complete and working
- Never describe code as "working as expected" if there are ANY failing tests
- Even if specific feature tests pass, failing tests elsewhere indicate broken functionality
- Changes that break existing tests must be fixed before considering implementation complete
- A successful implementation must pass linting, type checking, AND all existing tests
gp-sphinx (gp_sphinx) is a shared Sphinx documentation platform for Python projects. It consolidates duplicated docs configuration, extensions, theme settings, and workarounds from 14+ repositories into a single reusable package.
Key features:
- merge_sphinx_config() API for shared defaults with per-project overrides
- Shared extension list (autodoc, intersphinx, myst_parser, sphinx_design, etc.)
- Shared Furo theme configuration (CSS variables, fonts, sidebar, footer)
- Bundled workarounds (tabs.js removal, spa-nav.js injection)
- Shared font configuration (IBM Plex via Fontsource)
This project uses:
- Python 3.10+
- Sphinx 8.1+ (required for the typed env.domains.<name>_domain accessors)
- uv for dependency management
- ruff for linting and formatting
- mypy for type checking
- pytest for testing
- pytest-watcher for continuous testing
# Install dependencies
uv sync --all-packages
# Install with development dependencies
uv sync --all-packages --all-extras --group dev
# Run all tests
just test
# or directly with pytest
uv run pytest
# Run a single test file
uv run pytest tests/test_config.py
# Run a specific test
uv run pytest tests/test_config.py::test_merge_sphinx_config
# Run tests with test watcher
just start
# or
uv run ptw .
# Run tests with doctests
uv run ptw . --now --doctest-modules
# Run ruff for linting
just ruff
# or directly
uv run ruff check .
# Format code with ruff
just ruff-format
# or directly
uv run ruff format .
# Run ruff linting with auto-fixes
uv run ruff check . --fix --show-fixes
# Run mypy for type checking
just mypy
# or directly
uv run mypy src tests
# Watch mode for linting (using entr)
just watch-ruff
just watch-mypy
Follow this workflow for code changes:
- Format First: uv run ruff format .
- Run Tests: uv run pytest
- Run Linting: uv run ruff check . --fix --show-fixes
- Check Types: uv run mypy
- Verify Tests Again: uv run pytest
# Build documentation
just build-docs
# Start documentation server with auto-reload
just start-docs
gp-sphinx provides a shared configuration layer for Sphinx documentation:
gp_sphinx/
    __init__.py    # Package entry point
    config.py      # merge_sphinx_config() and config building logic
    defaults.py    # Default extensions, theme options, MyST config, fonts
    assets/        # Shared JS/CSS (spa-nav.js, workarounds)
    _compat.py     # Sphinx/docutils version compatibility
- Config (src/gp_sphinx/config.py)
  - merge_sphinx_config() API for building complete Sphinx config
  - Deep-merge support for theme options
  - Per-project override mechanism
- Defaults (src/gp_sphinx/defaults.py)
  - DEFAULT_EXTENSIONS list
  - DEFAULT_THEME_OPTIONS dict
  - DEFAULT_MYST_EXTENSIONS list
  - DEFAULT_FONT_FAMILIES dict
  - Shared sidebar configuration
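The deep-merge behavior listed above can be pictured with a minimal sketch. The deep_merge() helper and the option values below are hypothetical illustrations, not the actual implementation in config.py; they only show the idea that a per-project override refines a nested dict instead of replacing it.

```python
from __future__ import annotations

import copy
import typing as t


def deep_merge(
    base: t.Mapping[str, t.Any],
    override: t.Mapping[str, t.Any],
) -> dict[str, t.Any]:
    """Return a new dict in which nested dicts from ``override`` refine ``base``."""
    merged = copy.deepcopy(dict(base))
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are dicts: merge recursively instead of replacing.
            merged[key] = deep_merge(merged[key], value)
        else:
            # Scalars, lists, and type mismatches: the override wins outright.
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical values, not the real DEFAULT_THEME_OPTIONS.
shared_defaults = {
    "sidebar_hide_name": True,
    "light_css_variables": {"font-stack": "IBM Plex Sans", "color-brand": "#000"},
}
project_overrides = {"light_css_variables": {"color-brand": "#0a7"}}

theme_options = deep_merge(shared_defaults, project_overrides)
```

In this sketch the project keeps the shared font stack while changing only the brand color, and the shared defaults are left unmodified because the merge works on copies.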
A workspace package's own CSS must style every class its Python code
emits. If a directive appends SAB.X (or any gp-sphinx-* class) to a
node, the package's own CSS file carries a rule targeting SAB.X.
Cross-package reuse of a shared class (e.g., gp-sphinx-badge
styled in sphinx-ux-badges) is fine; cross-package dependence —
where your feature only renders correctly because a sibling package
happens to be loaded — is not. A downstream user installing a single
extension standalone must get the correct visual result.
All tests are plain functions (def test_*). No class TestFoo: groupings. Every test
function and every NamedTuple fixture class must be fully type-annotated; mypy runs as
part of CI.
Run continuously while developing:
$ uv run ptw .
Include doctests:
$ uv run ptw . --now --doctest-modules
Pick the lightest level that exercises the behavior. Never reach for a full Sphinx build when a docutils node test suffices: an integration build takes 2–10 s, while a node test runs in microseconds.
| Level | When to use |
|---|---|
| Pure unit | Transforming strings, dicts, dataclasses — no nodes, no Sphinx |
| Docutils tree unit | Testing transforms/visitors/renderers by constructing nodes.* directly |
| Snapshot unit | Same as docutils tree, but output is large or complex — assert via snapshot_doctree |
| Sphinx integration (@pytest.mark.integration) | Any test that constructs a Sphinx app. build_shared_sphinx_result / build_isolated_sphinx_result with any builder (including buildername="dummy") counts. If the test touches env.domains.*, walks a built doctree, or asserts on result.warnings, it is integration. |
Every test function must annotate all parameters and the return type:
def test_something(value: str, expected: int) -> None:
    assert compute(value) == expected
Every NamedTuple fixture class must annotate all fields.
Use t.NamedTuple for any parametrized test with three or more inputs. Two wiring
styles are in use — pick whichever reads more clearly for the case at hand.
Style A: unpack all fields (dominant; used in test_unit.py, lexer tests, etc.)
Each field becomes a typed parameter in the test function, which makes the signature self-documenting:
import typing as t

import pytest


class FooFixture(t.NamedTuple):
    """Test case for foo()."""

    test_id: str  # always the first field
    input: str
    expected: str


_FOO_FIXTURES: list[FooFixture] = [
    FooFixture(test_id="basic", input="a", expected="A"),
    FooFixture(test_id="empty", input="", expected=""),
]


@pytest.mark.parametrize(
    list(FooFixture._fields),
    _FOO_FIXTURES,
    ids=[f.test_id for f in _FOO_FIXTURES],
)
def test_foo(test_id: str, input: str, expected: str) -> None:
    """foo() uppercases its input."""
    assert foo(input) == expected
Style B: pass the whole struct as case (used in test_directives.py,
test_nodes.py, when the struct is reused in assertion messages or has many fields):
@pytest.mark.parametrize(
    "case",
    _FOO_FIXTURES,
    ids=lambda c: c.test_id,
)
def test_foo(case: FooFixture) -> None:
    """foo() uppercases its input."""
    assert foo(case.input) == case.expected
Naming conventions:
- test_id: str is always the first field
- Fixture list: _FOO_FIXTURES (module-private, all-caps)
- Fixture class: FooFixture or FooCase, never TestFoo
Test transforms, visitors, and renderers by constructing docutils.nodes and
sphinx.addnodes objects directly. Follow the pattern in
tests/ext/layout/test_transforms.py:
from docutils import nodes
from sphinx import addnodes


def _make_desc(
    *content_children: nodes.Node,
    domain: str = "py",
    objtype: str = "function",
) -> addnodes.desc:
    desc = addnodes.desc(domain=domain, objtype=objtype)
    desc += addnodes.desc_signature()
    content = addnodes.desc_content()
    for child in content_children:
        content += child
    desc += content
    return desc


def test_transform_wraps_content_runs() -> None:
    """_wrap_content_runs groups consecutive content nodes."""
    desc = _make_desc(nodes.paragraph("", "summary"), nodes.field_list())
    _wrap_content_runs(desc)
    assert any(isinstance(n, ContentGroup) for n in desc[1])
- Put _make_*() builder helpers at the top of the test file, near the tests that use them.
- Never import sphinx.application.Sphinx in a pure tree test.
- Use nodes.document() (with a minimal settings object from docutils.frontend.OptionParser) only when the transform requires a real document root.
Use when the expected output is too large or fragile to inline. The three fixtures
(from tests/_snapshots.py, loaded automatically via pytest_plugins) normalize their
inputs before asserting so that build-path churn and docutils version noise do not cause
spurious failures:
- snapshot_doctree(doctree, *, name=None, roots=()): normalizes a nodes.Node
- snapshot_html_fragment(fragment, *, name=None, roots=()): strips ANSI, normalizes whitespace
- snapshot_warnings(warnings, *, name=None, roots=()): strips noise lines and ANSI codes
import typing as t


def test_layout_render(
    snapshot_doctree: t.Callable[..., None],
) -> None:
    """Transform produces a stable doctree."""
    desc = _make_large_signature_desc()
    on_doctree_resolved(desc)
    snapshot_doctree(desc)
Update stored snapshots after intentional output changes:
$ uv run pytest --snapshot-update
Use the harness in tests/_sphinx_scenarios.py. The key types and helpers:
- SphinxScenario(files=(...), confoverrides={}, buildername="html"): describes the synthetic project; buildername defaults to "html", override for text builds
- ScenarioFile(relative_path, contents, substitute_srcdir=False): one source file
- build_shared_sphinx_result(cache_root, scenario, *, purge_modules=()): builds once per content-hash digest; purge_modules removes named modules from sys.modules before the initial build to prevent stale import cache (required when scenario files inject a Python module into sys.path)
- build_isolated_sphinx_result(cache_root, tmp_path, scenario, *, purge_modules=()): fresh build per test, for mutating assertions
- derive_sphinx_scenario_cache_root(tmp_path): derives a stable per-session cache root from any tmp_path by using its parent directory
- copy_scenario_tree(cache_root, scenario, destination_root): materializes source files into a directory without running a Sphinx build
- get_doctree(result, docname, post_transforms=False): deep-copied doctree from the built environment
- read_output(result, filename): reads a built output file as a string
Always use a module-scoped (or session-scoped) fixture for the build — never
function-scoped — so the expensive Sphinx build is shared across all tests in the
module. Follow the pattern in tests/ext/typehints_gp/test_integration.py:
import textwrap

import pytest

from tests._sphinx_scenarios import (
    SCENARIO_SRCDIR_TOKEN,
    ScenarioFile,
    SharedSphinxResult,
    SphinxScenario,
    build_shared_sphinx_result,
    read_output,
)

_CONF_PY = textwrap.dedent(
    """\
    import sys
    sys.path.insert(0, r"__SCENARIO_SRCDIR__")
    extensions = ["sphinx.ext.autodoc", "my_extension"]
    """
)

_INDEX_RST = textwrap.dedent(
    """\
    Demo
    ====

    .. autofunction:: my_module.my_function
    """
)


@pytest.fixture(scope="module")
def my_html_result(
    tmp_path_factory: pytest.TempPathFactory,
) -> SharedSphinxResult:
    """Build a minimal Sphinx project using my_extension."""
    cache_root = tmp_path_factory.mktemp("my-ext-html")
    scenario = SphinxScenario(
        files=(
            ScenarioFile("index.rst", _INDEX_RST),
            ScenarioFile(
                "conf.py",
                _CONF_PY.replace("__SCENARIO_SRCDIR__", SCENARIO_SRCDIR_TOKEN),
                substitute_srcdir=True,
            ),
        ),
    )
    return build_shared_sphinx_result(
        cache_root,
        scenario,
        purge_modules=("my_module", "my_extension"),
    )


@pytest.mark.integration
def test_my_feature_appears_in_html(my_html_result: SharedSphinxResult) -> None:
    """Extension renders the expected markup."""
    html = read_output(my_html_result, "index.html")
    assert "my-feature" in html
Rules:
- Always mark with @pytest.mark.integration
- Always scope="module" or scope="session" on the build fixture, never scope="function"
- Use textwrap.dedent("""...""") for inline source strings
- Use SCENARIO_SRCDIR_TOKEN + substitute_srcdir=True for sys.path injection in conf.py
See also:
- notes/test-analysis.md: profiling data, 9.5x speedup rationale, and the per-package migration history for the shared autodoc stack.
| Fixture | Source | When to use |
|---|---|---|
| tmp_path | pytest built-in | Per-test temp directory |
| tmp_path_factory | pytest built-in | Session/module fixtures that create temp dirs |
| monkeypatch | pytest built-in | Env vars, module attributes, sys.modules patching |
| caplog | pytest built-in | Log assertions; use caplog.records, not caplog.text |
| snapshot_doctree | tests/_snapshots.py | Normalized doctree snapshot assertion |
| snapshot_html_fragment | tests/_snapshots.py | Normalized HTML string snapshot assertion |
| snapshot_warnings | tests/_snapshots.py | Normalized Sphinx warning snapshot assertion |
| spf_suite_root, spf_doctree_root, spf_html_root | tests/ext/pytest_fixtures/conftest.py | Session roots for sphinx-pytest-fixture ext tests |
| simple_parser, parser_with_groups, … | tests/ext/argparse/conftest.py | ArgumentParser permutations for argparse tests |
- No class TestFoo: groupings; use descriptive function names and file organization instead
- No unittest.mock.patch; use monkeypatch
- No tempfile.mkdtemp(); use tmp_path
- No Sphinx() instantiation in a unit test; build docutils nodes directly
- No unannotated test functions; every parameter and -> None must be typed
- No # doctest: +SKIP in module doctests (see Doctests section)
- No inline tuples in parametrize when there are three or more fields; use NamedTuple
- No function-scoped Sphinx build fixtures; always module- or session-scoped
All CSS classes, custom properties, and MyST directive names added by a
workspace package live under the gp-sphinx-* namespace:
- Tier A (shared concepts): gp-sphinx-<concept> (e.g., gp-sphinx-badge, gp-sphinx-toolbar). Used by multiple packages.
- Tier B (package-owned): gp-sphinx-<pkg>__<thing>, BEM-style (e.g., gp-sphinx-fastmcp__safety-readonly, gp-sphinx-pytest-fixtures__fixture-index).
- Modifiers: axis-value pairs --<axis>-<value> (e.g., gp-sphinx-badge--size-xs, gp-sphinx-badge--type-function).
- Custom properties: mirror the class namespace as --gp-sphinx-<pkg>-<token>. Furo-owned variables (--color-api-*, --font-stack--*, etc.) stay untouched.
- Specificity: prefer chained class selectors (.gp-sphinx-badge.gp-sphinx-badge--dense); keep selectors at 0,3,0 max.
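As a rough illustration of the naming tiers above, the sketch below classifies class names with regular expressions. The patterns and the classify_css_class() helper are hypothetical, inferred only from the examples in this list; nothing here is claimed to exist in the workspace.

```python
from __future__ import annotations

import re

# Hypothetical patterns inferred from the naming rules above.
_SEGMENT = r"[a-z0-9]+(?:-[a-z0-9]+)*"  # kebab-case token, single hyphens only
_MODIFIER = rf"(?:--{_SEGMENT})?"  # optional --<axis>-<value> suffix

TIER_A = re.compile(rf"^gp-sphinx-{_SEGMENT}{_MODIFIER}$")
TIER_B = re.compile(rf"^gp-sphinx-{_SEGMENT}__{_SEGMENT}{_MODIFIER}$")


def classify_css_class(name: str) -> str:
    """Return 'tier-b', 'tier-a', or 'outside-namespace' for a class name."""
    if TIER_B.match(name):
        return "tier-b"
    if TIER_A.match(name):
        return "tier-a"
    return "outside-namespace"
```

A check of this kind could run in a pure unit test over the class names a package emits, assuming those names are collected in one place.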
Key highlights:
- Use namespace imports for standard library modules: import enum instead of from enum import Enum
  - Exception: dataclasses module may use from dataclasses import dataclass, field for cleaner decorator syntax
  - This rule applies to Python standard library only; third-party packages may use from X import Y
- For typing, use import typing as t and access via namespace: t.NamedTuple, etc.
- Use from __future__ import annotations at the top of all Python files
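A small module header following these import rules might look like the sketch below. The Tier, ClassSpec, and Registry names are made up for illustration and do not refer to anything in the codebase.

```python
from __future__ import annotations

import enum
import typing as t
from dataclasses import dataclass, field  # permitted exception to the namespace rule


class Tier(enum.Enum):
    """Hypothetical enum, accessed through the enum namespace."""

    SHARED = "shared"
    PACKAGE = "package"


class ClassSpec(t.NamedTuple):
    """Hypothetical record, using typing via the ``t`` alias."""

    name: str
    tier: Tier


@dataclass
class Registry:
    """Hypothetical container showing the dataclasses exception in use."""

    specs: list[ClassSpec] = field(default_factory=list)

    def add(self, name: str, tier: Tier) -> None:
        self.specs.append(ClassSpec(name=name, tier=tier))
```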
Prefer the typed accessors on env.domains over env.get_domain(<literal>):
- env.domains.standard_domain, not env.get_domain("std")
- env.domains.python_domain, not env.get_domain("py")
- Similarly: c_domain, cpp_domain, javascript_domain, restructuredtext_domain, changeset_domain, citation_domain, index_domain, math_domain
The typed accessors return the concrete domain subclass
(StandardDomain, PythonDomain, etc.), so mypy sees subclass-specific
attributes (progoptions, add_program_option, data["objects"], …)
without t.cast or # type: ignore. The accessors were added in Sphinx
8.1 (_DomainsContainer), which is the workspace floor.
Follow NumPy docstring style for all functions and methods:
"""Short description of the function or class.

Detailed description using reStructuredText format.

Parameters
----------
param1 : type
    Description of param1
param2 : type
    Description of param2

Returns
-------
type
    Description of return value
"""
All functions and methods MUST have working doctests. Doctests serve as both documentation and tests.
CRITICAL RULES:
- Doctests MUST actually execute - never comment out function calls or similar
- Doctests MUST NOT be converted to .. code-block:: as a workaround (code-blocks don't run)
- If you cannot create a working doctest, STOP and ask for help
Available tools for doctests:
- doctest_namespace fixtures (from conftest.py): tmp_path
- Ellipsis for variable output: # doctest: +ELLIPSIS
- Update conftest.py to add new fixtures to doctest_namespace
# doctest: +SKIP is NOT permitted - it's just another workaround that doesn't test anything.
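To make the combination of NumPy-style docstring and executable doctest concrete, here is a minimal sketch. The normalize_extensions() function is hypothetical and exists only for this illustration; the second example shows an ellipsis standing in for output that genuinely differs between runs.

```python
from __future__ import annotations


def normalize_extensions(extensions: list[str]) -> list[str]:
    """Return extension names de-duplicated in first-seen order.

    Parameters
    ----------
    extensions : list[str]
        Extension module names, possibly with repeats.

    Returns
    -------
    list[str]
        The same names with later duplicates dropped.

    Examples
    --------
    >>> normalize_extensions(["myst_parser", "sphinx.ext.autodoc", "myst_parser"])
    ['myst_parser', 'sphinx.ext.autodoc']

    Output that varies between runs is matched with an ellipsis:

    >>> object()  # doctest: +ELLIPSIS
    <object object at 0x...>
    """
    # dict preserves insertion order, so this keeps the first occurrence of each name.
    return list(dict.fromkeys(extensions))
```

Both examples execute under doctest; neither relies on +SKIP or a non-running code-block.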
When output varies, use ellipsis:
>>> result = merge_sphinx_config(project="test", version="1.0", copyright="2026")
>>> result["project"]
'test'
>>> len(result["extensions"]) > 10 # doctest: +ELLIPSIS
True
These rules guide future logging changes; existing code may not yet conform.
- Use logging.getLogger(__name__) in every module
- Add NullHandler in library __init__.py files
- Never configure handlers, levels, or formatters in library code -- that's the application's job
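A library package's __init__.py following these rules could be as small as the sketch below. The logger name "example_library" is a stand-in; a real package would pass __name__.

```python
from __future__ import annotations

import logging

# Hypothetical package name; in a real __init__.py this would be __name__.
logger = logging.getLogger("example_library")

# NullHandler swallows records unless the application installs its own handlers.
# No level, formatter, or output handler is configured here.
logger.addHandler(logging.NullHandler())
```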
Use logger.debug("msg %s", val), not f-strings. Two rationales:
- Deferred string interpolation: skipped entirely when level is filtered
- Aggregator message template grouping: "Running %s" is one signature grouped x10,000; f-strings make each line unique
When computing val itself is expensive, guard with if logger.isEnabledFor(logging.DEBUG).
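The two cases above, a cheap argument passed lazily and an expensive argument behind a level guard, can be sketched as follows. The logger name and the summarize_config() helper are hypothetical; the summary_calls list exists only to make the effect of the guard observable.

```python
from __future__ import annotations

import logging

# Hypothetical logger name; a real module would use logging.getLogger(__name__).
logger = logging.getLogger("example_library.config")

# Records each expensive computation so the effect of the guard is observable.
summary_calls: list[str] = []


def summarize_config(config: dict[str, object]) -> str:
    """Stand-in for a costly computation (hypothetical helper)."""
    summary_calls.append("called")
    return ", ".join(sorted(config))


def log_merge(config: dict[str, object]) -> None:
    """Log a merge event using lazy %-style arguments."""
    # Cheap argument: interpolation is deferred until a handler emits the record.
    logger.debug("config merged with %d keys", len(config))
    # Expensive argument: the guard skips the computation when DEBUG is filtered.
    if logger.isEnabledFor(logging.DEBUG):
        logger.debug("config keys %s", summarize_config(config))
```

With the logger at WARNING, summarize_config() is never called; at DEBUG it runs once per log_merge() call.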
| Level | Use for | Examples |
|---|---|---|
| DEBUG | Internal mechanics | Config merge steps, extension resolution |
| INFO | User-visible operations | Config loaded, extensions resolved |
| WARNING | Recoverable issues, deprecation | Unknown extension, deprecated option |
| ERROR | Failures that stop an operation | Invalid config, missing dependency |
- Lowercase, past tense for events: "config merged", "extension resolved"
- No trailing punctuation
- Keep messages short; put details in extra, not the message string
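The message rules above can be illustrated with a short sketch in which the details travel in extra and end up as attributes on the log record. The logger name, the extra keys, and the ListHandler class are hypothetical; the handler plays the role of application or test code, since library code would not install one.

```python
from __future__ import annotations

import logging


class ListHandler(logging.Handler):
    """Collects records in memory (stands in for application-side handling)."""

    def __init__(self) -> None:
        super().__init__()
        self.records: list[logging.LogRecord] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.records.append(record)


# Hypothetical logger name for this illustration.
logger = logging.getLogger("example_library.extensions")
logger.setLevel(logging.INFO)
logger.propagate = False
handler = ListHandler()
logger.addHandler(handler)

# Short, lowercase, past-tense message; specifics go in ``extra``.
logger.info(
    "extension resolved",
    extra={"extension_name": "myst_parser", "resolved_from": "defaults"},
)

record = handler.records[-1]
```

Because the message text stays constant, records group under one template, while record.extension_name and record.resolved_from remain available for filtering.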
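- Use logger.exception() only inside except blocks when you are not re-raising
- Use logger.error(..., exc_info=True) when you need the traceback outside an except block (exc_info=True only picks up an exception that is currently being handled, so pass the saved exception instance as exc_info in that case)
- Avoid logger.exception() followed by raise -- this duplicates the traceback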
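A minimal sketch of the two handling paths described above: logging with a traceback when the exception is swallowed, and staying silent when it is re-raised with added context. The function names, the registry argument, and the logger name are hypothetical.

```python
from __future__ import annotations

import logging

# Hypothetical logger name; a real module would use logging.getLogger(__name__).
logger = logging.getLogger("example_library.loader")


def load_optional(name: str, registry: dict[str, str]) -> str | None:
    """Return the registered value, or None when it is missing."""
    try:
        return registry[name]
    except KeyError:
        # Handled here and not re-raised, so logger.exception() is appropriate.
        logger.exception("extension lookup failed", extra={"extension_name": name})
        return None


def load_required(name: str, registry: dict[str, str]) -> str:
    """Return the registered value, or raise with added context."""
    try:
        return registry[name]
    except KeyError as exc:
        # Re-raising: do not log here, or the traceback would be reported twice.
        raise LookupError(f"extension {name!r} is not registered") from exc
```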
Assert on caplog.records attributes, not string matching on caplog.text:
- Scope capture: caplog.at_level(logging.DEBUG, logger="gp_sphinx.config")
- Filter records rather than index by position
- caplog.record_tuples cannot access extra fields -- always use caplog.records
- f-strings/.format() in log calls
- Catch-log-reraise without adding new context
- print() for diagnostics
- Logging secret env var values (log key names only)
Format commit messages as:
Scope(type[detail]): concise description
why: Explanation of necessity or impact.
what:
- Specific technical changes made
- Focused on a single topic
The blank line between the why: block and the what: block is
optional — useful when the why: body runs to multiple lines and the
two sections benefit from visual separation.
Common commit types:
- feat: New features or enhancements
- fix: Bug fixes
- refactor: Code restructuring without functional change
- docs: Documentation updates
- chore: Maintenance (dependencies, tooling, config)
- test: Test-related updates
- style: Code style and formatting
- py(deps): Dependencies
- py(deps[dev]): Dev Dependencies
- ai(rules[AGENTS]): AI rule updates
- ai(claude[rules]): Claude Code rules (CLAUDE.md)
- ai(claude[command]): Claude Code command changes
Example:
config(feat[merge]): Add deep-merge support for theme options
why: Enable per-project theme overrides without replacing entire dict
what:
- Add deep_merge() helper for nested dict merging
- Update merge_sphinx_config() to deep-merge theme_options
- Add tests for nested override behavior
For multi-line commits, use heredoc to preserve formatting:
git commit -m "$(cat <<'EOF'
Component(feat[method]): Add feature description
why: Explanation of the change.
what:
- First change
- Second change
EOF
)"
When writing documentation (README, CHANGES, docs/), follow these rules for code blocks:
One command per code block. This makes commands individually copyable. For sequential commands, either use separate code blocks or chain them with && or ; and \ continuations (keeping it one logical command).
Put explanations outside the code block, not as comments inside.
Good:
Run the tests:
$ uv run pytest
Run with coverage:
$ uv run pytest --cov
Bad:
# Run the tests
$ uv run pytest
# Run with coverage
$ uv run pytest --cov
These rules apply to shell commands in documentation (README, CHANGES, docs/), not to Python doctests.
Use console language tag with $ prefix. This distinguishes interactive commands from scripts and enables prompt-aware copy in many terminals.
Good:
$ uv run pytest
Bad:
uv run pytest
Split long commands with \ for readability. Each flag or flag+value pair gets its own continuation line, indented. Positional parameters go on the final line.
Good:
$ pipx install \
    --suffix=@next \
    --pip-args '\--pre' \
    --force \
    'gp-sphinx'
Bad:
$ pipx install --suffix=@next --pip-args '\--pre' --force 'gp-sphinx'
When stuck in debugging loops:
- Pause and acknowledge the loop
- Minimize to MVP: Remove all debugging cruft and experimental code
- Document the issue comprehensively for a fresh approach
- Format for portability (using quadruple backticks)