
Coding Agent Instructions (repository-wide, language-agnostic)

These instructions guide coding agents to generate changes consistent with this repository's conventions, regardless of programming language.

Glossary

  • Tunables: user-adjustable parameters that shape behavior, exposed via options or configuration files.
  • Canonical defaults: the single, authoritative definition of all tunables and their defaults.

Implementation guidance

  • Before coding:
    • Perform a comprehensive inventory of the codebase. Search for and read:
      • README.md, CONTRIBUTING.md, and all other documentation files.
      • code files related to the task.
    • Identify existing code architecture, design patterns, canonical defaults, naming patterns and coding styles.
  • When coding:
    • Follow the core principles below.
    • Follow identified design patterns, naming patterns and coding styles.
  • After coding:
    • Ensure changes pass quality gates below.
  • When adding a tunable:
    • Add to canonical defaults with safe value.
    • Ensure the options and configuration section below is respected.
    • Update documentation and serialization.
  • When implementing simulation or synthetic data generation:
    • Ensure the simulation and synthetic data generation section below is respected.
  • When implementing analytical methods:
    • Follow statistical conventions below.
  • When refactoring:
    • Keep public APIs stable; when renaming, provide aliases for the old names unless a breaking change is explicitly requested.
    • Update code, tests, and documentation atomically.
  • When documenting:
    • Follow documentation conventions below.

Core principles

  • Design patterns: prefer established patterns (e.g., factory, singleton, strategy) for code organization and extensibility.
  • Algorithmic efficiency: prefer algorithms or heuristics that solve the problem while minimizing time and space complexity.
  • DRY: avoid duplication of logic, data, and naming. Factor out commonalities.
  • Single source of truth: maintain a canonical defaults map for configuration tunables. Derive all user-facing options automatically.
  • Naming coherence: prefer semantically accurate names across code, documentation, directories, and outputs. Avoid synonyms that create ambiguity.
  • English-only: code, tests, logs, comments, and documentation must be in English.
  • Small, verifiable changes: prefer minimal diffs that keep public behavior stable unless explicitly requested.
  • Tests-first mindset: add or update minimal tests before refactoring or feature changes.
  • Documentation standards: follow the established documentation standards of each programming language (e.g., docstrings for Python, Javadoc for Java).

Options and configuration

  • Dynamic generation: derive CLI and configuration options automatically from canonical defaults. Avoid manual duplication.
  • Merge precedence: defaults < user options < explicit overrides (highest precedence). Never silently drop user-provided values.
  • Validation: enforce constraints (choices, ranges, types) at the option layer with explicit typing.
  • Help text: provide concrete examples for complex options, especially override mechanisms.
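The merge precedence rule above can be sketched as a layered dictionary merge. This is an illustrative example, not code from this repository; the names `DEFAULTS` and `resolve_config` are hypothetical.

```python
# Minimal sketch of merge precedence: defaults < user options < explicit
# overrides. Later layers win; no user-provided value is silently dropped.
DEFAULTS = {"threshold_value": 0.06, "processing_mode": "piecewise"}

def resolve_config(user_options=None, overrides=None):
    """Merge configuration layers left to right; later layers take precedence."""
    merged = dict(DEFAULTS)          # start from canonical defaults
    merged.update(user_options or {})  # user options override defaults
    merged.update(overrides or {})     # explicit overrides win over everything
    return merged

resolve_config({"threshold_value": 0.1}, {"processing_mode": "linear"})
# -> {"threshold_value": 0.1, "processing_mode": "linear"}
```

Because each layer is a plain `dict.update`, every user-provided key survives into the merged result, satisfying the "never silently drop" rule.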

Statistical conventions

  • Hypothesis testing: use a single test statistic (e.g., t-test) when possible.
  • Divergence metrics: document direction explicitly (e.g., KL(A||B) vs KL(B||A)); normalize distributions; add numerical stability measures.
  • Effect sizes: report alongside test statistics and p-values; use standard formulas; document directional interpretation.
  • Distribution comparisons: use multiple complementary metrics (parametric and non-parametric).
  • Correlations: prefer robust estimators; report the correlation estimate with a confidence interval (parametric or bootstrap) when feasible.
  • Uncertainty quantification: use confidence intervals or credible intervals when feasible.
  • Normality tests: combine visual diagnostics (e.g., QQ plots) with formal tests when assumptions matter.
  • Multiple testing: document corrections or acknowledge their absence.
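The divergence-metric conventions above (explicit direction, normalization, numerical stability) can be sketched in a few lines. This is a hedged illustration, not this repository's implementation; the function name and epsilon value are assumptions.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(P||Q): divergence of Q from reference P. Direction is documented
    explicitly because KL is asymmetric: KL(P||Q) != KL(Q||P) in general."""
    zp, zq = sum(p), sum(q)
    # Normalize inputs to proper distributions, then clip to avoid log(0).
    p = [max(x / zp, eps) for x in p]
    q = [max(x / zq, eps) for x in q]
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

kl_divergence([1, 1, 2], [1, 1, 1])  # small positive value; KL(P||P) == 0
```

The epsilon clip is the "numerical stability measure" the convention asks for: without it, a zero count in either distribution produces `log(0)` or a division by zero.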

Simulation and synthetic data generation

  • Realism: correlate synthetic outcomes with causal factors to reflect plausible behavior.
  • Reproducibility: use explicit random seeds; document them.
  • Edge cases: test empty datasets, constants, extreme outliers, boundary conditions.
  • Numerical safety: guard against zero/NaN/Inf; validate intermediate results.
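The realism and reproducibility rules above can be combined in a small generator. This is a sketch under assumed names (`simulate_outcomes`, `effect`); the linear factor-plus-noise model is only one plausible way to correlate outcomes with a causal factor.

```python
import random

def simulate_outcomes(n, effect=0.5, seed=42):
    """Generate (factor, outcome) pairs where the outcome is correlated
    with a causal factor. The seed is explicit and documented."""
    rng = random.Random(seed)  # local RNG: reproducible, no global state
    rows = []
    for _ in range(n):
        factor = rng.gauss(0.0, 1.0)
        noise = rng.gauss(0.0, 1.0)
        outcome = effect * factor + noise  # outcome tracks the causal factor
        rows.append((factor, outcome))
    return rows

assert simulate_outcomes(3) == simulate_outcomes(3)  # same seed, same data
```

Using a local `random.Random(seed)` instead of the module-level functions keeps the simulation reproducible even when other code draws from the global generator.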

Reporting conventions

  • Structure: start with run configuration, then stable section order for comparability.
  • Format: use structured formats (e.g., tables) for metrics; avoid free-form text for data.
  • Interpretation: include threshold guidelines; avoid overclaiming certainty.
  • Artifacts: timestamp outputs; include configuration metadata.

Documentation conventions

  • Clarity: plain, unambiguous language; avoid marketing jargon and speculation.
  • Concision: remove boilerplate; state facts directly without redundant phrasing.
  • Structure: use consistent section ordering; follow stable patterns for comparable content.
  • Timeliness: document current state; exclude historical evolution (except brief API breaking change notes).
  • Terminology: use correct and consistent terminology; distinguish clearly between related concepts.
  • Exhaustivity: cover all user-facing behavior and constraints; omit internal implementation details unless necessary for usage.
  • Pertinence: include information that aids understanding or usage; remove tangential content.
  • No duplication: maintain single authoritative documentation source; reference other sources rather than copying.

Documentation serves as an operational specification, not narrative prose.

Python conventions

  • Naming: Use snake_case for variables/functions/methods/modules, PascalCase for classes, SCREAMING_SNAKE_CASE for constants.
  • Type hints: Annotate all function signatures; use mypy for static type checking.
  • Enumerations: Prefer StrEnum for string-valued enumerations.
  • Error handling: Use specific exception types; avoid bare except.
  • Formatting and linting: Use ruff for formatting and linting; follow rules configured in pyproject.toml.
  • Testing: Use pytest; use plain assert and pytest.raises for error cases.

Quality gates

  • Documented build/lint/type checks pass (where applicable).
  • Documented tests pass (where applicable).
  • Documentation updated to reflect changes when necessary.
  • Logs use appropriate levels (error, warn, info, debug).
  • Pull request title and commit messages follow Conventional Commits format.

Examples

Naming coherence

Good (consistent style, clear semantics):

threshold_value = 0.06
processing_mode = "piecewise"

Bad (mixed styles, ambiguous):

thresholdValue = 0.06    # inconsistent case style
threshold_aim = 0.06     # synonym creates ambiguity

Dynamic option generation

DEFAULT_PARAMS = {
    "threshold_value": 0.06,
    "processing_mode": "piecewise",
}

def add_cli_options(parser):
    # One flag per tunable; the default's type drives argparse coercion.
    # Caveat: bool tunables need dedicated handling (bool("false") is truthy).
    for key, value in DEFAULT_PARAMS.items():
        parser.add_argument(f"--{key}", type=type(value), default=value)

Statistical reporting

| Metric      | Value | Interpretation        |
| ----------- | ----- | --------------------- |
| KL(A‖B)     | 0.023 | < 0.1: low divergence |
| Effect size | 0.12  | small to medium       |

By following these instructions, coding agents should propose changes that are consistent and maintainable across languages.