This document contains enforceable review rules derived from the cube-harness Constitution.
Each rule maps to a specific directive in the constitution and includes severity levels and code examples for automated review.
All rules are advisory - they provide feedback without blocking merges.
| Severity | Description |
|---|---|
| WARNING | Should be addressed. Author acknowledgment expected. |
| SUGGESTION | Optional improvement. Nice to have. |
Reviews follow this structure:
- Summary - Brief assessment of changes
- Warnings - Issues that should be addressed (with file:line references)
- Suggestions - Optional improvements
- Constitution Compliance - Pass/Needs Attention for each pillar
Severity: WARNING
Breaking changes to the Core API require a Request For Comments (RFC).
Check: If a PR modifies public API signatures in core modules (agent.py, environment.py, tool.py, benchmark.py, core.py), verify there's an associated RFC document or issue reference.
Applies to: Changes that alter method signatures, remove public methods, or change return types in core abstractions.
Severity: WARNING
All imports must be at the top of the module. No imports inside functions or classes.
Bad:
def my_function():
import json # VIOLATION
return json.dumps({})Good:
import json
def my_function():
return json.dumps({})Severity: WARNING
No module-level mutable variables (lists, dicts, sets, or class instances that accumulate state). Logger instances are acceptable.
Bad:
CACHE = {} # VIOLATION - mutable global
def get_value(key):
return CACHE.get(key)Good:
class Cache:
def __init__(self):
self._data = {}
def get(self, key):
return self._data.get(key)Severity: WARNING
Prefer composition (has-a) over inheritance (is-a). Avoid deep inheritance trees where subclasses inherit many unused methods.
Bad:
class MyAgent(BaseAllKnowingAgent): # Inherits 50+ methods
passGood:
class MyAgent(Agent):
def __init__(self, planner: Planner, memory: Memory):
self.planner = planner
self.memory = memorySeverity: WARNING
Features must work in single-process mode for debugging. Agent logic must be testable on a laptop without a cluster.
Check: New features should not hard-require Ray or distributed execution. Provide local fallbacks or mocks where feasible.
Severity: WARNING
New operations should have appropriate logging and tracing. The "Trace" (logs, screenshots, tool outputs, reasoning steps) is a first-class data product.
Check: Long-running operations should be traceable. Errors should include sufficient context for debugging.
Severity: SUGGESTION
Abstractions should expose the raw underlying object when necessary for advanced use cases.
Good:
class BrowserTool:
@property
def raw(self) -> Page:
"""Access the underlying Playwright Page object."""
return self._pageSeverity: WARNING
Experiments must capture sufficient information for reproducibility:
- The exact git commit hash
- The full Configuration object (dumped as YAML/JSON)
- The Docker container ID/hash of the environment (if applicable)
Severity: WARNING
Use established standards instead of inventing new ones:
- LLM calls: Use LiteLLM abstractions (not direct OpenAI/Anthropic SDK calls)
- Tools: Support Model Context Protocol (MCP) where applicable
- Data: Use Agent Data Protocol (ADP) format for traces
Bad:
import openai
response = openai.chat.completions.create(...) # Direct SDK callGood:
from cube_harness.llm import LLM
llm = LLM(config)
response = llm.call(prompt) # Uses LiteLLM internallySeverity: WARNING
All functions must have type hints for parameters and return values, including test functions.
Bad:
def process(data): # Missing types
return dataGood:
def process(data: dict) -> dict:
return dataSeverity: WARNING
New features should have unit tests. The core test suite must run fast (< 5 mins). Slow tests go to nightly builds.
Check: PRs adding new functionality should include corresponding tests in tests/.
Severity: WARNING
AI-assisted code is welcome, but every line must be understood and curated. No blindly pasting large blocks of generated code.
Signs of vibe coding:
- Verbose, repetitive logic that could be simplified
- Unnecessary abstractions or over-engineering
- Code that doesn't match project patterns
- Unexplained "magic" constants or logic
Severity: SUGGESTION
Functions should perform one logical operation and fit on a standard screen (~50-80 lines). Prefer named helpers over inline comments explaining long blocks.
Bad:
def process_data(data):
# 100 lines of mixed responsibilities
...Good:
def process_data(data: dict) -> dict:
validated = _validate_input(data)
transformed = _apply_transformations(validated)
return _format_output(transformed)Severity: SUGGESTION
Prefer a smaller, simpler codebase. If a feature adds significant complexity but is rarely used, reject it. Refactoring to delete code is valuable.
Check: Does this PR add complexity that could be avoided? Could existing code be simplified instead of adding new abstractions?
Severity: WARNING
Auto-CUBE-produced fixes carry # auto-fix(N)↓ … # /auto-fix(N) markers and a
one-line machine-readable footnote at module bottom. N is the PR number
(L0/L1) or design-debt issue number (L2/L3). Spec:
openspec/specs/auto-fix/spec.md.
Check (Tier-2 semantic review; Tier-1 structural checks are in
scripts/auto_fix_lint.py): when a diff touches code inside an
auto-fix(N) region, crosses a marker boundary, or removes/edits a
marker or footnote, treat it as possibly rotten:
- Pull the PR or design-debt issue at
N; verify the fix's stated invariant still holds after the change. - If the change is unrelated churn → require the footnote
hash=be re-stamped (acknowledgement, not prevention — never silently leave drift). - If the fix now looks subsumed by surrounding code (the real fix landed) → recommend promoting the band-aid into permanent code and closing the design-debt issue, rather than keeping a dead marker.
- Orphaned marker (no footnote) or orphaned footnote (no marker) → flag.
Not a hard block: flag for acknowledgement. Hard-failing makes auto-fix blocks radioactive and invites marker deletion.
| Rule ID | Pillar | Severity | Summary |
|---|---|---|---|
| TC-001 | Team Contracts | WARNING | RFC for breaking API changes |
| EX-001 | Explicitness | WARNING | Module-level imports only |
| EX-002 | Explicitness | WARNING | No global mutable state |
| EX-003 | Explicitness | WARNING | Composition over inheritance |
| SR-001 | Scalability | WARNING | Local-dev compatible |
| SR-002 | Scalability | WARNING | Trace-first engineering |
| SR-003 | Scalability | SUGGESTION | Escape hatch for raw access |
| PS-001 | Protocols | WARNING | Hermetic reproducibility |
| PS-002 | Protocols | WARNING | Use LiteLLM, MCP, ADP standards |
| CC-001 | Code Craft | WARNING | Type hints required |
| CC-002 | Code Craft | WARNING | Testing pyramid |
| CC-003 | Code Craft | WARNING | No vibe coding |
| CC-004 | Code Craft | SUGGESTION | Function atomicity |
| CC-005 | Code Craft | SUGGESTION | Minimalist imperative |
| AF-001 | Code Craft | WARNING | Auto-fix marker integrity / rot detection |