- Phase 1: Foundational Documents
- 1. VISION.md (Orchestrator Initial Draft)
- 2. CONSTITUTION.md (Orchestrator Initial Draft)
- 3. ARCHITECTURE.md
- 4. CONSTRAINTS.md
- 5. IDENTITIES.md
- 6. DATA_MODEL.md
- Phase 2: Behavioral & Engine Specs
- 7. PREDICTION.md
- 8. POLICY_ENGINE.md
- 9. AGENT_CONTRACT.md
- 10. API_SPEC.md
- Phase 3: Interface & Workflow Specs
- 11. TUI_SPEC.md
- 12. WEB_UI_SPEC.md
- 13. WORKFLOWS.md
- 14. ACCEPTANCE.md
- Phase 4: Project Management
- Initialize PROGRESS.md
- Initialize PHASE_LEDGER.md
- Harden orchestration loop (loop.sh & LOOP_PROMPT.md)
- Define TEST_STRATEGY.md
Focus: Getting the process to run, manage its lifecycle, and handle signals correctly.
- M1.1: Project Skeleton
- Create directory structure (`cmd/ratelord-d`, `pkg/engine`, `pkg/store`, `pkg/api`).
- Dependency: None
- M1.2: Daemon Entrypoint & Signal Handling
- Implement main process loop.
- Handle `SIGINT`/`SIGTERM` for graceful shutdown.
- Acceptance: `D-03` (Graceful Shutdown).
- M1.3: Logging & Observability
- Setup structured logging (stdout/stderr).
- Emit `system_started` log on boot.
- M1.4: Configuration
- Implement configuration loader (env vars, defaults).
- Note: Split from M1.1 to ensure atomic commits.
Focus: The immutable SQLite ledger that serves as the source of truth.
- M2.1: SQLite Initialization
- Implement DB connection with WAL mode enabled.
- Create `events` table schema (id, type, payload, dimensions, ts).
- Acceptance: `D-01` (Clean Start).
- M2.2: Event Writer (Append-Only)
- Implement `AppendEvent` function.
- Ensure atomic writes.
- Acceptance: `D-04` (Event Immutability).
- M2.3: Event Reader & Replay
- Implement `ReadEvents` iterator (from offset).
- Implement basic replay loop to restore in-memory state on startup.
- Acceptance: `D-02` (Crash Recovery), `D-05` (State Derivation).
Focus: The HTTP/Socket interface for agents to negotiate intents.
- M3.1: HTTP Server Shell
- Bind listener to `127.0.0.1:8090`.
- Setup router and middleware (logging, panic recovery).
- M3.2: Intent Endpoint (Stub)
- Implement `POST /v1/intent` handler.
- Validate JSON schema.
- Return mock "Approved" decision to verify connectivity.
- Acceptance: `A-01` (Approve Intent), `A-02` (Latency).
- M3.3: Health & Diagnostics
- Implement `GET /v1/health`.
- Implement `GET /v1/events` (basic list).
Focus: Allow registration of the first identity to prove the write-path works.
- M4.1: Identity Registration Command
- Implement CLI `ratelord identity add`.
- Emit `identity_registered` event to storage.
- Acceptance: `D-06` (Identity Registration).
- M4.2: Basic State Projection
- Implement in-memory `IdentityProjection` built during replay.
- Serve `GET /v1/identities` to list registered actors.
- Acceptance: `T-03` (Identity List).
Focus: Implement the core logic for tracking usage against limits and making policy decisions.
- M5.1: Usage Tracking
- Create `pkg/engine/usage.go`.
- Implement `UsageProjection` to track usage by identity/scope/window.
- Hook it into the `Replay` loop.
- Acceptance: `D-07` (Usage Tracking).
- M5.2: Policy Enforcement
- Create `pkg/engine/policy.go`.
- Implement `Evaluate(intent)` which checks usage against limits.
- Update `POST /v1/intent` to use the real policy engine.
- Acceptance: `A-03` (Policy Enforcement).
Focus: Connect to external sources (or mocks) to ingest real usage/limit data.
- M6.1: Provider Interface & Registry
- Create `pkg/provider/types.go` (Provider interface).
- Implement a `ProviderRegistry` in `pkg/engine`.
- Dependency: None.
- M6.2: Mock Provider
- Create `pkg/provider/mock/mock.go`.
- Implement a provider that emits synthetic usage/limit events.
- Acceptance: `T-02` (Mock Data Ingestion).
- M6.3: Polling Orchestrator
- Create `pkg/engine/poller.go`.
- Implement the loop that ticks and calls `Provider.Poll()`.
- Ingest results into Event Log (`provider_poll_observed`).
- Acceptance: `D-08` (Continuous Polling).
Focus: Translate raw usage history into time-to-exhaustion predictions.
- M7.1: Forecast Model Interface
- Create `pkg/engine/forecast/types.go`.
- Define `Model` interface (Inputs -> Distribution).
- M7.2: Linear Burn Model
- Implement simple linear regression model.
- Calculate P99 time-to-exhaustion based on recent history.
- M7.3: Forecast Loop
- Trigger forecasts after `usage_observed` events.
- Emit `forecast_computed` events.
- Acceptance: `D-09` (Forecast Emission).
Focus: Visualize the state of the system for the operator.
- M8.1: TUI Foundation
- Initialize Bubbletea model.
- Connect to `GET /v1/events` and `GET /v1/identities`.
- M8.2: Dashboard View
- Render Usage Bars per pool.
- Render recent Event Log.
- Acceptance: `T-04` (Dashboard).
Focus: Improving the robustness of the existing components and enhancing the TUI.
- M9.1: TUI Drill-Down Views
- View detailed Event payloads.
- View active Policy rules and current Usage stats in detail.
- Acceptance: `T-01` (Real-time Stream detailed view).
- M9.2: Error Handling & Reconnection
- Implement reconnection logic in TUI if Daemon restarts.
- Handle missing configuration or DB errors gracefully.
- Acceptance: Robustness during `D-02` (Crash Recovery).
- M9.3: Configurable Policy Loading
- Load `policy.yaml` from disk on startup.
- Support `SIGHUP` to reload policy.
- Acceptance: `Pol-03` (Policy Hot Reload).
Focus: Proving the system works as a cohesive whole using the strategies in TEST_STRATEGY.md.
- M10.1: End-to-End Simulation Script
- Create a script/tool to generate realistic mock workloads.
- Simulate multiple agents with different consumption patterns.
- M10.2: Verification of Drift Detection
- Manually inject usage into Mock Provider.
- Verify Daemon detects drift and adjusts.
- Acceptance: `P-03` (Drift Detection).
- M10.3: Verification of Policy Enforcement
- Drive usage to limit.
- Verify Intents are denied.
- Acceptance: `Pol-01` (Hard Limit), `Pol-02` (Load Shedding).
- M10.4: Final Acceptance Run
- Execute full suite of Acceptance Tests.
- Result: Partial Pass (See `ACCEPTANCE_REPORT.md`).
Focus: Ensure policies are loaded, hot-reloaded, and correctly evaluated to enable denial/throttling.
- M11.1: Debug Policy Loading
- Investigate why `policy.yaml` rules are not applying.
- Fix `LoadPolicyConfig` and `Evaluate`.
- M11.2: Implement Wait/Modify Actions
- Ensure `approve_with_modifications` works.
- Implement shape/defer logic.
- M11.3: Verify Hot Reload
- Ensure SIGHUP updates rules without restart.
Focus: Ensure drift detection and provider state survive restarts.
- M12.1: Persist Provider State
- Ensure provider offsets/drift are saved to SQLite.
- M12.2: TUI Verification
- Manually verify TUI dashboard connects and displays data.
- M13.1: Release Tagging & Notes
- Tag v1.0.0.
- Write Release Notes.
- M13.2: Deployment Guide
- Write DEPLOYMENT.md (Systemd, Docker, K8s).
Focus: Implement the first real provider to track GitHub API rate limits (Core, Search, GraphQL, Integration Manifests).
- M14.1: GitHub Configuration
- Define config structure (PAT, Enterprise URL).
- Update `pkg/engine/config.go`.
- M14.2: GitHub Poller
- Implement `pkg/provider/github/github.go`.
- Fetch limits via `GET /rate_limit`.
- Map `core`, `search`, `graphql` to pools.
- M14.3: GitHub Integration Test
- Verify against public GitHub API (using a safe/dummy token or recorded mock).
Focus: Track OpenAI usage limits (RPM, TPM) via header inspection or Tier API.
- M15.1: OpenAI Configuration
- Define config structure (API Key, Org ID).
- M15.2: OpenAI Poller
- Implement `pkg/provider/openai/openai.go`.
- Note: OpenAI limits are often response-header based, necessitating a "Probe" or "Proxy" approach, or just polling the `dashboard/billing` hidden APIs if available (unlikely stable).
- Decision: Start with a "Probe" mode or just manual quota setting + local counting if API is unavailable.
- Refinement: OpenAI's headers `x-ratelimit-limit-requests` etc. are returned on requests. We might need a "Passive" provider that ingests data from a sidecar/proxy, or we proactively poll a lightweight endpoint to check headers.
Focus: Internal usage to validate stability using real GitHub tokens.
- M16.1: Dogfood Environment Setup
- Create `deploy/dogfood` directory.
- Create `deploy/dogfood/policy.json` (or `policy.yaml`) monitoring GitHub rate limits for the current user/token.
- Create `deploy/dogfood/run.sh` to boot the daemon with this local configuration.
- M16.2: Operational Run
- Execute `run.sh` locally.
- Generate usage (via `gh` CLI or `ratelord` identity) to populate the event log.
- Verify: Ensure `provider_poll_observed` and `usage_observed` events are recorded.
- M16.3: Analysis & Tuning
- Analyze the resulting event log to compare `forecast_computed` vs actual usage trends.
- Determine if the Linear Burn Model needs tuning for bursty traffic.
Focus: Language-specific bindings for the Agent Contract (Intent Negotiation).
- M17.1: SDK Specification
- Draft `CLIENT_SDK_SPEC.md`.
- Define interfaces for `Ask`, `Propose`, `Feedback`.
- M17.2: Go SDK
- Implement `pkg/client`.
- Provide `NewClient(httpEndpoint)`.
- Implement `Ask(ctx, intent)`.
- M17.3: Python SDK
- Implement `ratelord` package (PyPI structure).
- Implement `Client` class with `ask()` method.
- Ensure fail-closed behavior and auto-wait.
- Dependency: M17.1.
Focus: Modern, graphical interface for observing system state.
- M18.1: Spec Refinement
- Refine `WEB_UI_SPEC.md` with implementation details.
- M18.2: Project Scaffold
- Initialize `web/` with React + Vite + Tailwind.
- Setup proxy to daemon API.
- M18.3: Dashboard Implementation
- Implement `AppShell` and `Dashboard` view.
- Connect to `GET /v1/events` and `GET /v1/identities`.
- M18.4: Build Integration
- Create `Makefile` rules for web build.
- Use `//go:embed` to serve UI from `ratelord-d`.
- Acceptance: `ratelord-d --web` serves the UI.
- M18.5: History View
- Implement `/history` route with `TimeRangePicker`, `EventTimeline`, and `EventList`.
- Support server-side filtering via URL params.
- M18.6: Identity Explorer
- Implement `/identities` route.
- Visualize hierarchy of Agents, Scopes, and Pools.
Focus: Enable the largest ecosystem of agents (JS/TS) to use Ratelord.
- M19.1: SDK Specification
- Define TypeScript interfaces for Intent, Decision, and Client options.
- Create `sdk/js/SPEC.md` or update `CLIENT_SDK_SPEC.md`.
- M19.2: Project Scaffold
- Initialize `sdk/js` with `package.json`, `tsconfig.json`.
- Setup Jest/Vitest for testing.
- M19.3: Core Implementation
- Implement `RatelordClient` class.
- Implement `ask(intent)` with retries and fail-closed logic.
- Acceptance: Unit tests pass.
- M19.4: Integration Verification
- Create a sample script `sdk/js/examples/basic.ts`.
- Verify against running `ratelord-d`.
- M19.5: Release Prep
- Configure `package.json` exports/files.
- Create `sdk/js/.npmignore`.
- Document publish process.
Focus: Export internal metrics to standard observability tools.
- M20.1: Prometheus Exporter
- Expose `/metrics` endpoint.
- Export `ratelord_usage`, `ratelord_limit`, `ratelord_forecast_seconds` gauges.
- Export `ratelord_intent_total` counters.
- M20.2: Logging Correlation
- Ensure `trace_id`/`intent_id` is threaded through all logs for a request.
- M20.3: Grafana Dashboard
- Create `deploy/grafana/dashboard.json`.
- Visualize `ratelord_usage` and `ratelord_limit` per pool.
Focus: Production-grade configuration management.
- M21.1: Robust Config Loader
- Support `RATELORD_DB_PATH`, `RATELORD_POLICY_PATH`, `RATELORD_PORT`.
- Support CLI flags to override env vars.
- Resolve M1.4 debt.
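The precedence M21.1 describes (CLI flag over environment variable over default) could look roughly like the sketch below. The flag names (`-db-path`, `-policy-path`, `-port`) and the default file names are assumptions made for illustration; only the three environment variable names and port 8090 come from this plan.

```go
package main

import (
	"flag"
	"fmt"
	"io"
	"os"
	"strconv"
)

// Config holds the three settings named in M21.1.
type Config struct {
	DBPath     string
	PolicyPath string
	Port       int
}

// Load resolves settings with precedence: CLI flag > environment variable > default.
// getenv is injected so the loader can be tested without touching the real environment.
func Load(args []string, getenv func(string) string) (Config, error) {
	// Defaults (file names are assumed, the port matches the M3.1 listener).
	cfg := Config{DBPath: "ratelord.db", PolicyPath: "policy.yaml", Port: 8090}

	// Environment variables override defaults.
	if v := getenv("RATELORD_DB_PATH"); v != "" {
		cfg.DBPath = v
	}
	if v := getenv("RATELORD_POLICY_PATH"); v != "" {
		cfg.PolicyPath = v
	}
	if v := getenv("RATELORD_PORT"); v != "" {
		p, err := strconv.Atoi(v)
		if err != nil {
			return Config{}, fmt.Errorf("RATELORD_PORT: %w", err)
		}
		cfg.Port = p
	}

	// Flags override the environment: each flag's default is the value resolved so far.
	fs := flag.NewFlagSet("ratelord-d", flag.ContinueOnError)
	fs.SetOutput(io.Discard)
	fs.StringVar(&cfg.DBPath, "db-path", cfg.DBPath, "path to the SQLite event log")
	fs.StringVar(&cfg.PolicyPath, "policy-path", cfg.PolicyPath, "path to the policy file")
	fs.IntVar(&cfg.Port, "port", cfg.Port, "HTTP listen port")
	if err := fs.Parse(args); err != nil {
		return Config{}, err
	}
	return cfg, nil
}

func main() {
	// Demo inputs: the environment sets the port and the flag overrides it.
	env := map[string]string{"RATELORD_PORT": "9000"}
	cfg, err := Load([]string{"-port", "7000"}, func(k string) string { return env[k] })
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Printf("%+v\n", cfg) // prints: {DBPath:ratelord.db PolicyPath:policy.yaml Port:7000}
}
```

In the daemon itself the call would presumably be `Load(os.Args[1:], os.Getenv)`.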
Focus: More expressive governance rules.
- M22.1: Soft Limits & Shaping
- M22.1.1: Policy Action Types: Add `warn` and `delay` to Policy Action definition.
- M22.1.2: Evaluator Updates: Update `Evaluate` to handle soft limits (return `Approve` with warning, or `ApproveWithModifications` with wait).
- M22.1.3: API Response Update: Ensure `v1/intent` response captures warnings and wait instructions.
- M22.2: Temporal Rules
- M22.2.1: TimeWindow Matcher: Add `time_window` (start_time, end_time, days_of_week) to Policy Rule.
- M22.2.2: Evaluator Time Check: Implement time checking in `Evaluate`.
Focus: Secure the daemon for production usage beyond localhost.
- M23.1: TLS Termination
- Support `RATELORD_TLS_CERT` and `RATELORD_TLS_KEY` env vars.
- Serve HTTPS if configured.
- M23.2: API Authentication
- M23.2.1: Auth Token Management: Extend `identity add` to generate/accept an API token (store hashed).
- M23.2.2: Auth Middleware: Validate `Authorization: Bearer <token>` against registered identities.
- M23.3: Secure Headers
- Add HSTS, CSP, and other security headers to HTTP responses.
Focus: Move beyond static limits to dynamic flow control.
- M24.1: Dynamic Delay Controller
- Implement a PID or AIMD controller to calculate wait times.
- Inputs: Current burn rate, remaining budget, time to reset.
- Outputs: Suggested wait time (duration).
- M24.2: Feedback Loop Integration
- Feed "actual consumption" back into the delay calculator.
- Adjust aggression based on "drift" (forecast vs actual).
- M24.3: Configuration & Tuning
- Allow configuration of controller parameters (Kp, Ki, Kd) via policy.
Focus: Efficient querying for historical data.
- M25.1: Aggregation Schema
- Update `pkg/store/sqlite.go` `migrate()` to include `usage_hourly`, `usage_daily`, and `system_state`.
- Add `GetSystemState(key)` and `SetSystemState(key, val)` methods to `Store`.
- Add `UpsertUsageStats(batch)` method to `Store`.
- M25.2: Rollup Worker Core
- Create `pkg/engine/rollup.go`.
- Implement `RollupWorker` struct with `Run(ctx)` loop.
- Implement aggregation logic (bucketing and delta calculation).
- Integrate worker into `cmd/ratelord-d/main.go` startup.
- M25.3: Trend API
- Add `GetTrends` method to `Store` (query with filters).
- Implement `GET /v1/trends` handler in `pkg/api`.
- Add query param parsing and validation.
- M25.4: Integration Test
- Generate synthetic usage events.
- Force a rollup cycle.
- Verify `GET /v1/trends` returns expected aggregates.
Focus: Push alerts to external systems.
- M26.1: Webhook Registry
- Create `webhook_configs` table.
- Implement `POST /v1/webhooks` to register URLs.
- M26.2: Dispatcher
- Async worker to send HTTP POST payloads to registered webhooks.
- Handle retries and backoff.
- M26.3: Security (HMAC)
- Sign webhook payloads with a shared secret.
- Include `X-Ratelord-Signature` header.
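Payload signing for M26.3 can be sketched with HMAC-SHA256 from the standard library. The `sha256=<hex>` header format is an assumption borrowed from common webhook practice, and `Sign` and `Verify` are hypothetical helper names. The plan only fixes the header name `X-Ratelord-Signature` and the use of a shared secret.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Sign returns the value for the X-Ratelord-Signature header:
// the hex-encoded HMAC-SHA256 of the raw request body (format assumed).
func Sign(secret, body []byte) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return "sha256=" + hex.EncodeToString(mac.Sum(nil))
}

// Verify recomputes the signature on the receiving side and compares it
// in constant time to avoid leaking how many leading bytes matched.
func Verify(secret, body []byte, header string) bool {
	return hmac.Equal([]byte(Sign(secret, body)), []byte(header))
}

func main() {
	secret := []byte("shared-secret")
	body := []byte(`{"type":"limit_warning","pool":"core"}`)

	sig := Sign(secret, body)
	fmt.Println("X-Ratelord-Signature:", sig)
	fmt.Println("valid:", Verify(secret, body, sig))
	fmt.Println("tampered:", Verify(secret, []byte(`{"type":"other"}`), sig))
}
```

The signature must cover the exact bytes sent on the wire, so the dispatcher should sign after serialization, not before.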
Focus: Improve startup time and manage disk usage.
- M27.1: Snapshot Schema
- Create `snapshots` table (snapshot_id, timestamp, payload blob).
- M27.2: Snapshot Worker
- Implement a worker that periodically serializes the `Projection` state (Usage, Limits, etc.) to a snapshot.
- M27.3: Startup Optimization
- Update `Loader` to load the latest snapshot first.
- Replay events only after the snapshot timestamp.
- Acceptance: Startup time is O(1) + O(recent_events) instead of O(all_events).
- M27.4: Event Pruning
- Implement a command or worker to delete events older than retention policy (if they are snapshotted).
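The snapshot-then-replay startup in M27.3 can be illustrated in memory. This sketch cuts over on an event sequence number instead of the snapshot timestamp the plan mentions, because a sequence avoids ambiguity between events sharing a timestamp. That substitution, and the `Event`, `Snapshot`, and `Restore` names, are assumptions.

```go
package main

import "fmt"

// Event is a simplified usage event (assumed shape).
type Event struct {
	Seq      int64
	Identity string
	Units    int64
}

// Snapshot is serialized projection state plus the last event it includes.
type Snapshot struct {
	LastSeq int64
	Usage   map[string]int64
}

// TakeSnapshot copies the projection so later updates cannot alter the snapshot.
func TakeSnapshot(usage map[string]int64, lastSeq int64) Snapshot {
	copied := make(map[string]int64, len(usage))
	for k, v := range usage {
		copied[k] = v
	}
	return Snapshot{LastSeq: lastSeq, Usage: copied}
}

// Restore rebuilds the projection: start from the snapshot (if any), then apply
// only events newer than it. The log must be ordered by Seq. A real store would
// query "seq > LastSeq" rather than scan and skip as this slice-based sketch does.
func Restore(snap *Snapshot, log []Event) (map[string]int64, int64) {
	usage := map[string]int64{}
	var last int64
	if snap != nil {
		for k, v := range snap.Usage {
			usage[k] = v
		}
		last = snap.LastSeq
	}
	for _, e := range log {
		if e.Seq <= last {
			continue // already reflected in the snapshot
		}
		usage[e.Identity] += e.Units
		last = e.Seq
	}
	return usage, last
}

func main() {
	log := []Event{
		{Seq: 1, Identity: "agent-a", Units: 5},
		{Seq: 2, Identity: "agent-b", Units: 3},
		{Seq: 3, Identity: "agent-a", Units: 2},
	}
	// Snapshot after the first two events, then restart and replay the tail.
	early, seq := Restore(nil, log[:2])
	snap := TakeSnapshot(early, seq)
	usage, last := Restore(&snap, log)
	fmt.Println(usage["agent-a"], usage["agent-b"], last) // prints: 7 3 3
}
```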
Focus: Validate complex scenarios and stress test the system (as per ADVANCED_SIMULATION.md).
- M28.1: Simulation Engine Upgrade
- Refactor `ratelord-sim` to support configurable scenarios (JSON/YAML).
- Implement deterministic seeding for RNG.
- Implement `AgentBehavior` interface (Greedy, Poisson, Periodic).
- M28.2: Scenario Definitions
- S-01: Implement "Thundering Herd" scenario config.
- S-02: Implement "Drift & Correction" scenario (requires Drift Saboteur).
- S-03: Implement "Priority Inversion Defense" scenario.
- S-04: Implement "Noisy Neighbor" (Shared vs Isolated) scenario.
- S-05: Implement "Cascading Failure Recovery" scenario.
- M28.3: Reporting & Verification
- Output structured JSON results (latency histograms, approval rates).
- Add assertions to verify scenario success criteria.
Focus: Elevating "Cost" to a first-class constraint alongside Rate Limits (FINANCIAL_GOVERNANCE.md).
- M29.1: Currency Types & Usage Extension
- Create `pkg/engine/currency` package.
- Add `MicroUSD` type (int64).
- Update `Usage` struct to include `Cost MicroUSD`.
- M29.2: Pricing Registry
- Update `pkg/engine/config.go` to include `Pricing` map.
- Implement lookup logic: `GetCost(provider, model, units)`.
- Update `UsageProjection` to calculate cost on ingestion.
- M29.3: Cost Policy
- Add `budget_cap` rule type to `pkg/policy`.
- Implement `Evaluate` logic for cost-based rejections.
- Add `cost_efficiency` rule type for provider selection suggestions.
- M29.4: Forecast Cost
- Update `ForecastModel` to predict `Cost` exhaustion.
- Emit `forecast_cost_computed` events.
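The `MicroUSD` type and `GetCost(provider, model, units)` lookup from M29.1 and M29.2 might be shaped like the sketch below. Integer micro-dollars avoid floating-point drift when many small charges are summed. The per-1000-units pricing granularity, the round-up rule, the `PriceKey` type, and the provider, model, and price values are all illustrative assumptions.

```go
package main

import "fmt"

// MicroUSD is an amount of money in millionths of a US dollar.
type MicroUSD int64

// USD is one dollar expressed in MicroUSD.
const USD MicroUSD = 1_000_000

// String renders the amount as dollars with six decimal places.
func (m MicroUSD) String() string {
	sign := ""
	if m < 0 {
		sign, m = "-", -m
	}
	return fmt.Sprintf("%s$%d.%06d", sign, int64(m/USD), int64(m%USD))
}

// PriceKey identifies one priced resource (hypothetical type).
type PriceKey struct {
	Provider string
	Model    string
}

// Pricing maps a resource to its price per 1000 units (assumed granularity).
type Pricing map[PriceKey]MicroUSD

// GetCost returns the cost of consuming units, rounded up to a whole micro-dollar
// so fractional charges are never silently dropped. ok is false for unknown prices.
func (p Pricing) GetCost(provider, model string, units int64) (cost MicroUSD, ok bool) {
	perThousand, ok := p[PriceKey{Provider: provider, Model: model}]
	if !ok {
		return 0, false
	}
	return MicroUSD((units*int64(perThousand) + 999) / 1000), true
}

func main() {
	// Placeholder price: 500 micro-dollars per 1000 units.
	pricing := Pricing{PriceKey{Provider: "example-provider", Model: "example-model"}: 500}
	cost, ok := pricing.GetCost("example-provider", "example-model", 2500)
	fmt.Println(cost, ok) // prints: $0.001250 true
}
```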
Focus: Expanding from single-node daemon to distributed fleet governance (CLUSTER_FEDERATION.md).
- M30.1: Grant Protocol Definition
- Define protocol in `API_SPEC.md` and `CLUSTER_FEDERATION.md`.
- Define `GrantRequest` and `GrantResponse` structs (Implementation).
- Implement `POST /v1/federation/grant` endpoint on Leader.
- M30.2: Follower Mode
- Add `--mode=follower` flag to `ratelord-d`.
- Implement `RemoteProvider` that requests grants from Leader instead of direct token bucket.
- Implement local cache for granted tokens.
- M30.3: Leader State Store
- Abstract `TokenBucket` storage (See M32.1).
- Implement Leader Election (See M33.1).
Focus: Zero-touch versioning and artifact publication (RELEASING.md).
- M31.1: CI Workflows
- Create `.github/workflows/test.yaml` (Go test, lint).
- Create `.github/workflows/release.yaml` (Trigger on tag).
- M31.2: Release Script / Goreleaser
- Configure `.goreleaser.yaml`.
- Ensure cross-compilation (Darwin/Linux, AMD64/ARM64).
- Configure Docker build and push.
- M31.3: Documentation & Changelog
- Configure changelog generation from Conventional Commits.
- Auto-update `RELEASE_NOTES.md` or GitHub Release body.
Focus: Allow the Leader to persist state in shared storage (Redis/Etcd) for stateless deployments.
- M32.1: Usage Store Interface
- Refactor `UsageProjection` to use `UsageStore` interface.
- Implement `MemoryUsageStore` (default).
- M32.2: Redis Implementation
- Implement `RedisUsageStore` using `go-redis`.
- Add `RATELORD_REDIS_URL` config.
- M32.3: Atomic Operations
- Refactor `PoolState` storage to Redis Hash (HSET) to support partial updates.
- Implement `IncrementUsed` with Lua scripts for atomicity.
Focus: Automatic Leader Election for failover.
- M33.1: Leader Election
- Define `LeaseStore` interface.
- Define `Lease` struct (HolderID, Expiry).
- Implement `RedisLeaseStore`.
- Implement `SqliteLeaseStore` (as fallback).
- Implement `ElectionManager` with Acquire/Renew loop.
- M33.2: Standby Mode
- Implement `ElectionManager` struct.
- Implement `StandbyLoop` (Polls lease, if free -> Acquire).
- Handle `OnPromote` (Load state, start Policy Engine).
- Handle `OnDemote` (Stop Policy Engine, flush state).
- M33.3: Client Routing
- Implement `HTTP Middleware` to check Leader status.
- Proxy requests from Followers to Leader (or return 307 Redirect).
- M33.4: Split-Brain Protection
- Add `Epoch` to `Lease` and `ElectionManager`.
- Include `Epoch` in `Event` metadata.
- Validate `Epoch` on critical state transitions.
Focus: Visualize the entire cluster.
- M34.1: Cluster View
- M34.1.1: API: Implement `GET /v1/cluster/nodes` and `ClusterTopology` projection.
- M34.1.2: UI: Add "Cluster" tab (Node Table) in Web UI.
- M34.2: Node Diagnostics
- Visualize Replication Lag & Election Status (Implemented via Metadata & UI Update).
Focus: Formalizing the constraint graph taxonomy as defined in ARCHITECTURE.md.
- M35.1: Graph Schema Definition
- M35.1.1: Node Types: Define `Agent`, `Identity`, `Workload`, `Resource`, `Pool`, `Constraint` structs in `pkg/graph`.
- M35.1.2: Edge Types: Define `Owns`, `Triggers`, `Limits`, `Depletes`, `AppliesTo`, `Bounds` edge definitions.
- M35.1.3: Graph Interface: Define the `Graph` interface for adding nodes/edges and traversing.
- M35.2: In-Memory Graph Projection
- M35.2.1: Projection Struct: Implement `GraphProjection` that holds the graph state.
- M35.2.2: Event Handlers: Implement handlers for `IdentityRegistered` (PolicyUpdated pending).
- M35.2.3: Replay Integration: Hook `GraphProjection` into the main `Loader` replay loop.
- M35.2.4: Policy Graph Population: Handle `PolicyUpdated` events (or load from config) to create `Constraint` and `Pool` nodes and `AppliesTo`/`Limits` edges.
- M35.3: Policy Matcher on Graph
- M35.3.1: Traversal Logic: Implement `GetConstraintsForIdentity(id)` (Implemented `FindConstraintsForScope`).
- M35.3.2: Engine Integration: Wire `GraphProjection` into `PolicyEngine` to replace linear search with graph traversal.
- M35.4: Graph Visualization
- M35.4.1: API: Add `GET /v1/graph` endpoint (JSON/Dot format).
- M35.4.2: UI: Visualize in Web UI (Force-directed graph).
Focus: Managing long-term storage and compliance.
- M36.1: Retention Policy Engine
- Allow configuring TTL per Event Type.
- Implement `PruneWorker` (Refinement of M27.4).
- M36.2: Cold Storage Offload
- Implement S3/GCS adapter for archiving old events/snapshots (Implemented `LocalBlobStore` as first adapter).
- Implement "Hydrate from Archive" for historical analysis (Partial - ArchiveWorker implemented).
- M36.3: Compliance & Deletion
- Implement `DeleteIdentity` (GDPR "Right to be Forgotten").
- Prune all events associated with a specific Identity ID.
Focus: Answering "Why?" for every decision.
- M37.1: Decision Explainability
- M37.1.1: Trace Structs: Define `RuleTrace` (RuleID, Input, Result) in `pkg/engine`.
- M37.1.2: Evaluator Trace: Update `Evaluate` to capture trace of all checked rules.
- M37.1.3: Event Enrichment: Add `Trace` to `Decision` event payload (Available in Result, Event pending if needed).
- M37.1.4: API Exposure: Return trace in `POST /v1/intent` response (debug mode).
- M37.2: Compliance Reports
- M37.2.1: Report Engine: Create `pkg/reports` with interface `Generator`.
- M37.2.2: CSV Generator: Implement `CSVGenerator` for flat tabular data.
- M37.2.3: Access Log Report: Implement `AccessLogReport` (Date, Identity, Intent, Decision, RuleTrace).
- M37.2.4: Usage Report: Implement `UsageReport` (Date, Pool, Usage, Limit, Cost).
- M37.2.5: API Endpoint: Implement `GET /v1/reports` with `type` and `format` params.
- M37.3: Policy Debugging
- Implement "Trace Mode" for Policy Engine (logs every rule result).
- Web UI: Visualize Policy Evaluation Tree.
Focus: Unifying subsystems and paying down technical debt.
- M38.1: Constraint Graph Integration
- Refactor Policy Engine to use Graph (Done in M35.3).
- Ensure Federation Grant logic respects Graph constraints.
- Ensure Usage Projection handles Grant consumption.
- M38.2: Unified Store Audit
- Verify Redis/SQLite parity for Usage Store.
- Consolidate "TokenBucket" vs "UsageStore" abstractions if any diverge.
Focus: Allow LLMs (Claude, Gemini, etc.) to natively discover and query Ratelord constraints.
- M39.1: MCP Server Core
- M39.1.1: Dependency: Run `go get github.com/mark3labs/mcp-go`.
- M39.1.2: Package Structure: Create `pkg/mcp` and implementation stub.
- M39.1.3: CLI Integration: Update `cmd/ratelord/main.go` to add `mcp` subcommand (supports `--url` and `--token` flags).
- M39.1.4: Client Wrapper: Create a simple internal HTTP client helper in `pkg/mcp/client.go` to standardise API calls.
- M39.2: Resource Exporter
- M39.2.1: Events Resource: Implement `ratelord://events` fetching from `GET /v1/events` (limit 50).
- M39.2.2: Usage Resource: Implement `ratelord://usage` fetching from `GET /v1/trends` (or `GET /v1/graph` for structure).
- M39.2.3: Config Resource: Implement `ratelord://config` to expose current policy rules (read-only).
- M39.3: Tool Exporter
- M39.3.1: Ask Intent Tool: Implement `ask_intent` tool wrapping `POST /v1/intent`.
- M39.3.2: Check Usage Tool: Implement `check_usage` tool that allows querying specific pools/identities.
- M39.4: Prompts
- M39.4.1: System Prompt: Implement `ratelord-aware` MCP prompt.
Focus: Standardize retry/backoff logic across SDKs to prevent thundering herds.
- M40.1: Go SDK Resilience
- Add Backoff & Jitter.
- M40.2: JS SDK Resilience
- Add `bottleneck` or custom backoff.
- M40.3: Python SDK Resilience
- Add `tenacity` integration.
- [x] **M41.1: API Package Coverage**
- Target: `pkg/api` > 80% coverage.
- Implement tests for handlers, middleware, and validation.
- [x] **M41.2: Store Package Coverage**
- [x] Target: `pkg/store` > 80% coverage.
- [x] Implement tests for SQLite store, event reading/writing.
- [x] **M41.3: Engine Package Coverage**
- [x] Target: `pkg/engine` > 80% coverage.
- [x] Improve tests for policy evaluation and state management.
Focus: Ensure the system runs correctly in containerized environments.
- M48.1: Docker Composition
- Create `Dockerfile` for multi-stage build (Web + Go).
- Create `docker-compose.yml` for local stack (Daemon + Redis).
- M48.2: End-to-End Testing
- Create `tests/e2e` suite.
- Verify full flow: Identity -> Policy -> Intent -> Decision.
- Verify Web UI availability.
Focus: Create user-facing documentation to explain concepts and usage.
- M42.1: Concept Guides
- Draft `docs/concepts/architecture.md` (Simplified "How it works").
- Draft `docs/concepts/core-model.md` (Identity, Scope, Pool, Constraint).
- M42.2: API Reference
- Draft `docs/reference/api.md` (Endpoints and Client Behavior).
- M42.3: User Guides
- Update `docs/guides/cli.md` (MCP, Identity, Mode).
- Update `docs/guides/web-ui.md` (Graph, Cluster, Reports).
- Create `docs/guides/mcp.md` (MCP Integration).
Focus: Address technical debt, stubbed features, and missing tests identified during final assessment.
- M43.1: Complete Reporting Engine
- Implement actual CSV generation logic in `pkg/reports/csv.go`.
- Add unit tests for `pkg/reports`.
- M43.2: Complete Graph Projection
- Handle `PolicyUpdated` events in `pkg/graph/projection.go`.
- Optimize graph traversal (Index for O(1) lookup).
- M43.3: Hardening & Configuration
- Remove hardcoded `resetAt` in `pkg/engine/forecast/service.go`.
- Fix pool ID usage in `pkg/api/server.go`.
- Add unit tests for `pkg/mcp` (Server and Handlers).
- Add unit tests for `pkg/blob` (Local Store).
- M43.6: Inject Provider Version
- Inject `ProviderVersion` at build time (ldflags) or derive from package.
- M43.4: Codebase Cleanup (Assessment Findings)
- `pkg/graph/projection.go`: Implement `ProviderObserved` handler.
- `pkg/provider/federated/provider.go`: Fix TODOs (error reporting, pre-seeding note).
- `pkg/engine/poller.go`: Verified configurable units implementation.
- `pkg/api/federation.go`: Verified `RemainingGlobal` lookup.
Focus: Implement the sophisticated shaping behaviors from the original vision.
- M44.1: Adaptive Actions
- Implement
routeaction (load balancing across identities). - Implement
switch_protocolaction (REST <-> GraphQL).
- Implement
- M44.2: Client Negotiation
- Update `Intent` response to include detailed modification instructions beyond `wait_seconds`.
- Update SDKs to handle complex negotiation.
Focus: Bring the simulation capabilities into the visual domain.
- M45.1: Web UI Simulation Lab
- Create "Simulation" tab in Web UI.
- Implement frontend for configuring scenarios (Agents, Policies, Bursts).
- Run `ratelord-sim` (wasm or server-side) and visualize results.
- M45.2: Simulation Integration
- Connect UI to `ratelord-sim` backend (or wasm).
- Visualize real-time results.
Focus: Hardening the distributed system guarantees.
- [ ] **M46.1: Global State Aggregation**
- Implement CRDT or gossip protocol for accurate global limit tracking in Federation.
- Move beyond "Leader Local = Global" simplification.
- [ ] **M46.2: Graph Concurrency**
- Refactor `GraphProjection` for Copy-On-Write or safe concurrent access.
Focus: Smarter provider integration.
- [ ] **M47.1: OpenAI Smart Probing**
- Implement "Probe" mode that hits a cheap endpoint (e.g. chat completion with max_tokens=1) if `/models` doesn't return relevant rate limit headers.
- Handle model-specific rate limits (gpt-4 vs gpt-3.5).