These instructions guide coding agents to generate changes consistent with this repository's conventions, regardless of programming language.
- Tunables: user-adjustable parameters that shape behavior, exposed via options or configuration files.
- Canonical defaults: the single, authoritative definition of all tunables and their defaults.
- Before coding:
  - Perform a comprehensive inventory of the codebase. Search for and read:
    - README.md, CONTRIBUTING.md, and all other documentation files.
    - code files related to the task.
  - Identify existing code architecture, design patterns, canonical defaults, naming patterns and coding styles.
- When coding:
  - Follow the core principles below.
  - Follow identified design patterns, naming patterns and coding styles.
- After coding:
  - Ensure changes pass the quality gates below.
- When adding a tunable:
  - Add it to canonical defaults with a safe value.
  - Ensure the options and configuration section below is respected.
  - Update documentation and serialization.
- When implementing simulation or synthetic data generation:
  - Ensure the simulation and synthetic data generation section below is respected.
- When implementing analytical methods:
  - Follow statistical conventions below.
- When refactoring:
  - Keep public APIs stable; when renaming, provide aliases unless explicitly requested otherwise.
  - Update code, tests, and documentation atomically.
- When documenting:
  - Follow documentation conventions below.
- Design patterns: prefer established patterns (e.g., factory, singleton, strategy) for code organization and extensibility.
- Algorithmic efficiency: prefer algorithms or heuristics that solve the problem with minimal time and space complexity.
- DRY: avoid duplication of logic, data, and naming. Factor out commonalities.
- Single source of truth: maintain a canonical defaults map for configuration tunables. Derive all user-facing options automatically.
- Naming coherence: prefer semantically accurate names across code, documentation, directories, and outputs. Avoid synonyms that create ambiguity.
- English-only: code, tests, logs, comments, and documentation must be in English.
- Small, verifiable changes: prefer minimal diffs that keep public behavior stable unless explicitly requested.
- Tests-first mindset: add or update minimal tests before refactoring or feature changes.
- Documentation standards: follow the established documentation standards of each programming language.
- Dynamic generation: derive CLI and configuration options automatically from canonical defaults. Avoid manual duplication.
- Merge precedence: defaults < user options < explicit overrides (highest precedence). Never silently drop user-provided values.
- Validation: enforce constraints (choices, ranges, types) at the option layer with explicit typing.
- Help text: provide concrete examples for complex options, especially override mechanisms.
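
The merge precedence and validation rules above can be sketched as follows. This is a minimal illustration, not the repository's implementation: `merge_params`, `validate_params`, `ALLOWED_MODES`, and the `[0, 1]` range for `threshold_value` are hypothetical names and constraints chosen for the example.

```python
from typing import Any, Mapping, Optional

# Hypothetical canonical defaults: the single source of truth for tunables.
DEFAULT_PARAMS: dict[str, Any] = {
    "threshold_value": 0.06,
    "processing_mode": "piecewise",
}

# Hypothetical constraint used only for this illustration.
ALLOWED_MODES = ("piecewise", "linear")


def validate_params(params: Mapping[str, Any]) -> None:
    """Enforce type, range, and choice constraints on merged parameters."""
    threshold = params["threshold_value"]
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(threshold, bool) or not isinstance(threshold, (int, float)):
        raise TypeError("threshold_value must be a number")
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold_value must be within [0, 1]")
    if params["processing_mode"] not in ALLOWED_MODES:
        raise ValueError(f"processing_mode must be one of {ALLOWED_MODES}")


def merge_params(
    user_options: Optional[Mapping[str, Any]] = None,
    overrides: Optional[Mapping[str, Any]] = None,
) -> dict[str, Any]:
    """Merge with precedence: defaults < user options < explicit overrides."""
    merged = dict(DEFAULT_PARAMS)
    for layer in (user_options or {}, overrides or {}):
        unknown = sorted(set(layer) - set(DEFAULT_PARAMS))
        if unknown:
            # Fail loudly rather than silently dropping a user-provided value.
            raise KeyError(f"unknown tunables: {unknown}")
        merged.update(layer)
    validate_params(merged)
    return merged


merged = merge_params(
    user_options={"threshold_value": 0.1, "processing_mode": "linear"},
    overrides={"threshold_value": 0.2},
)
print(merged)
```

Here the override wins for `threshold_value`, while the user-provided `processing_mode` is kept because no override replaces it. Unknown keys raise an error instead of being discarded.
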
- Hypothesis testing: use a single test statistic (e.g., the t-statistic) when possible.
- Divergence metrics: document direction explicitly (e.g., KL(A||B) vs KL(B||A)); normalize distributions; add numerical stability measures.
- Effect sizes: report alongside test statistics and p-values; use standard formulas; document directional interpretation.
- Distribution comparisons: use multiple complementary metrics (parametric and non-parametric).
- Correlations: prefer robust estimators; report the correlation estimate with a confidence interval (parametric or bootstrap) when feasible.
- Uncertainty quantification: report confidence intervals or credible intervals when feasible.
- Normality tests: combine visual diagnostics (e.g., QQ plots) with formal tests when assumptions matter.
- Multiple testing: document corrections or acknowledge their absence.
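
The divergence and effect-size conventions above can be illustrated with a small, standard-library-only sketch. The function names and the smoothing constant `EPSILON` are assumptions made for this example; the formulas are the standard discrete KL divergence and Cohen's d with a pooled standard deviation.

```python
import math
from statistics import mean, variance
from typing import Sequence

EPSILON = 1e-12  # assumed smoothing constant; avoids log(0) and division by zero


def normalize(weights: Sequence[float]) -> list[float]:
    """Scale non-negative weights so they sum to 1."""
    if any(w < 0 or not math.isfinite(w) for w in weights):
        raise ValueError("weights must be finite and non-negative")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return [w / total for w in weights]


def kl_divergence(a: Sequence[float], b: Sequence[float]) -> float:
    """Return KL(A||B) in nats: the expected log-ratio log(A/B) under A.

    The measure is not symmetric: KL(A||B) generally differs from KL(B||A).
    """
    if len(a) != len(b):
        raise ValueError("distributions must have the same number of bins")
    p = normalize([w + EPSILON for w in a])
    q = normalize([w + EPSILON for w in b])
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def cohens_d(x: Sequence[float], y: Sequence[float]) -> float:
    """Return Cohen's d using the pooled standard deviation.

    Direction: a positive value means mean(x) is greater than mean(y).
    """
    nx, ny = len(x), len(y)
    if nx < 2 or ny < 2:
        raise ValueError("each sample needs at least two observations")
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    if pooled_var == 0:
        raise ValueError("effect size is undefined for zero pooled variance")
    return (mean(x) - mean(y)) / math.sqrt(pooled_var)


kl_ab = kl_divergence([0.9, 0.1], [0.5, 0.5])  # KL(A||B)
kl_ba = kl_divergence([0.5, 0.5], [0.9, 0.1])  # KL(B||A), a different value
effect = cohens_d([2.0, 4.0, 6.0], [1.0, 3.0, 5.0])
print(kl_ab, kl_ba, effect)
```

Computing both directions side by side shows why the direction has to be stated in any report that quotes a KL value.
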
- Realism: correlate synthetic outcomes with causal factors to reflect plausible behavior.
- Reproducibility: use explicit random seeds; document them.
- Edge cases: test empty datasets, constants, extreme outliers, boundary conditions.
- Numerical safety: guard against zero/NaN/Inf; validate intermediate results.
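
The reproducibility, realism, and numerical-safety points above can be sketched as follows. This is an illustrative example under stated assumptions: `generate_samples`, `safe_ratio`, the linear relationship between factor and outcome, and the default parameter values are all hypothetical.

```python
import math
import random


def generate_samples(
    n: int, seed: int, effect: float = 2.0, noise_sd: float = 1.0
) -> list[tuple[float, float]]:
    """Generate (factor, outcome) pairs where the outcome depends on the factor.

    The outcome is a linear function of the causal factor plus Gaussian noise,
    so the synthetic data carries a plausible correlation.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    rng = random.Random(seed)  # explicit seed, local generator, no global state
    samples = []
    for _ in range(n):
        factor = rng.uniform(0.0, 1.0)
        outcome = effect * factor + rng.gauss(0.0, noise_sd)
        samples.append((factor, outcome))
    return samples


def safe_ratio(numerator: float, denominator: float, default: float = 0.0) -> float:
    """Return numerator / denominator, or `default` when the result is undefined."""
    if not math.isfinite(numerator) or not math.isfinite(denominator):
        return default
    if denominator == 0:
        return default
    result = numerator / denominator
    return result if math.isfinite(result) else default


first = generate_samples(5, seed=42)
second = generate_samples(5, seed=42)
print(first == second)  # the same seed reproduces the same samples
print(generate_samples(0, seed=42))  # an empty dataset is a valid edge case
print(safe_ratio(1.0, 0.0))  # falls back to the default instead of raising
```
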
- Structure: start with the run configuration, then keep a stable section order for comparability.
- Format: use structured formats (e.g., tables) for metrics; avoid free-form text for data.
- Interpretation: include threshold guidelines; avoid overclaiming certainty.
- Artifacts: timestamp outputs; include configuration metadata.
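
The structure, format, and artifact conventions above could be applied as in the sketch below. The function `render_report`, its layout, and the sample configuration values are assumptions made for illustration; the metric row reuses values from this document's own table example.

```python
import json
from datetime import datetime, timezone
from typing import Any, Mapping, Optional, Sequence


def render_report(
    config: Mapping[str, Any],
    metrics: Sequence[tuple[str, float, str]],
    generated_at: Optional[datetime] = None,
) -> str:
    """Render a report: timestamp, run configuration, then a metrics table."""
    timestamp = (generated_at or datetime.now(timezone.utc)).isoformat()
    lines = [
        f"Generated: {timestamp}",
        # Sorted keys keep the configuration line stable across runs.
        f"Run configuration: {json.dumps(dict(config), sort_keys=True)}",
        "",
        "| Metric | Value | Interpretation |",
        "| --- | --- | --- |",
    ]
    for name, value, interpretation in metrics:
        lines.append(f"| {name} | {value:.3f} | {interpretation} |")
    return "\n".join(lines)


report = render_report(
    config={"threshold_value": 0.06, "seed": 42},  # illustrative values
    metrics=[("KL(A||B)", 0.023, "< 0.1: low divergence")],
    generated_at=datetime(2024, 1, 1, tzinfo=timezone.utc),  # fixed for the example
)
print(report)
```

Passing `generated_at` explicitly keeps the example deterministic; when it is omitted, the current UTC time is used.
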
- Clarity: plain, unambiguous language; avoid marketing jargon and speculation.
- Concision: remove boilerplate; state facts directly without redundant phrasing.
- Structure: use consistent section ordering; follow stable patterns for comparable content.
- Timeliness: document current state; exclude historical evolution (except brief API breaking change notes).
- Terminology: use correct and consistent terminology; distinguish clearly between related concepts.
- Exhaustivity: cover all user-facing behavior and constraints; omit internal implementation details unless necessary for usage.
- Pertinence: include information that aids understanding or usage; remove tangential content.
- No duplication: maintain single authoritative documentation source; reference other sources rather than copying.
Documentation serves as an operational specification, not narrative prose.
- Naming: Use snake_case for variables/functions/methods/modules, PascalCase for classes, SCREAMING_SNAKE_CASE for constants.
- Type hints: Annotate all function signatures; use `mypy` for static type checking.
- Enumerations: Prefer `StrEnum` for string-valued enumerations.
- Error handling: Use specific exception types; avoid bare `except`.
- Formatting and linting: Use `ruff` for formatting and linting; follow rules configured in `pyproject.toml`.
- Testing: Use `pytest`; use plain `assert` and `pytest.raises` for error cases.
- Documented build/lint/type checks pass (where applicable).
- Documented tests pass (where applicable).
- Documentation updated to reflect changes when necessary.
- Logs use appropriate levels (error, warn, info, debug).
- Pull request title and commit messages follow Conventional Commits format.
Good (consistent style, clear semantics):

```python
threshold_value = 0.06
processing_mode = "piecewise"
```

Bad (mixed styles, ambiguous):

```python
thresholdValue = 0.06  # inconsistent case style
threshold_aim = 0.06  # synonym creates ambiguity
```

```python
DEFAULT_PARAMS = {
    "threshold_value": 0.06,
    "processing_mode": "piecewise",
}


def add_cli_options(parser):
    for key, value in DEFAULT_PARAMS.items():
        parser.add_argument(f"--{key}", type=type(value), default=value)
```

| Metric      | Value | Interpretation        |
| ----------- | ----- | --------------------- |
| KL(A‖B)     | 0.023 | < 0.1: low divergence |
| Effect size | 0.12  | small to medium       |

By following these instructions, coding agents should propose changes that are consistent and maintainable across languages.