How FormicOS extracts, trusts, retrieves, evolves, and maintains knowledge. Read this end-to-end before operating the knowledge system.
When a colony completes successfully, the system automatically extracts knowledge entries:
- LLM extraction — An archivist pass distills the colony's output into typed entries: skills (reusable techniques) and experiences (observed outcomes with polarity).
- Transcript harvest (hook position 4.5) — Before structured extraction,
the system scans the raw colony transcript for bug root causes,
conventions, and tool configurations. Deduplicates against existing
entries at a 0.82 cosine threshold. Entries use a
:harvestsuffix for replay safety. - Inline dedup — Before emitting each entry, a cosine check (threshold 0.92) detects near-duplicates. If a match exists, the system reinforces the existing entry's confidence instead of creating a duplicate.
- Security scan — Each entry passes a 5-axis scan (prompt injection,
data exfiltration, credential leakage, code safety, credential detection
via detect-secrets). Credential scanning uses dual-config: prose-mode
plugins exclude entropy-based detectors to reduce false positives.
Findings with
scan_status="high"result instatus="rejected". Credential values in tool outputs are redacted as[REDACTED:type]. - Event emission — A
MemoryEntryCreatedevent is emitted per entry. The projection store indexes it; the memory store (Qdrant) persists the embedding for vector search.
Extraction is automatic. No operator action is required.
The Forager adds a second knowledge input channel: bounded web acquisition. When retrieval exposes a gap (low-confidence results, coverage gaps, stale clusters), the system can search the web, fetch pages, extract content, and admit the results through the same lifecycle as colony-produced knowledge.
Key invariants:
- Forager-admitted entries reuse
MemoryEntryCreatedatcandidatestatus. There is no separate "proposed knowledge" event type. - Web-sourced entries start with conservative priors (low-to-moderate confidence) and never outrank colony-earned knowledge by default.
- All fetched content passes through the same 5-axis security scan and admission scoring pipeline as colony-extracted knowledge.
- Entries carry auditable forager provenance:
source_url,fetch_timestamp,fetch_level,forager_trigger,forager_query, andquality_score. - The existing lifecycle (candidate → active → verified, decay, Thompson Sampling retrieval, co-occurrence reinforcement, operator overlays) applies unchanged.
The Forager does not get a privileged admission path. It translates fetched content into ordinary candidate entries, and usage decides whether those entries earn trust.
Each entry has a scan_status, an admission score, and a status.
Wave 38 added a real admission gate so knowledge is not accepted purely on
semantic relevance or source success.
New entries are evaluated across seven signals:
- confidence / posterior quality
- scanner findings
- provenance richness
- federation trust
- observation mass
- content type prior
- recency
Hard rejects occur on critical/high scanner findings or extremely weak composite scores. Borderline entries are usually demoted rather than deleted, so operators can still inspect what the system chose not to trust.
Each entry also has a status:
| scan_status | Resulting status | Meaning |
|---|---|---|
high or critical |
rejected |
Security concern — entry is excluded from retrieval |
safe, low, or medium |
candidate |
Awaiting validation |
| (source colony succeeded) | verified |
Promoted automatically when the source colony completed successfully |
Only verified, active, and candidate entries participate in retrieval.
Rejected and stale entries are excluded. Weak federated entries are further
discounted at retrieval time so local verified knowledge remains dominant
under ordinary conditions.
Every entry carries a thread_id linking it to the workflow thread where it
was created.
- Thread-scoped entries receive a thread_bonus of 1.0 (weighted at 0.07 in the composite score) when the searching colony belongs to the same thread.
- Workspace-wide entries have no thread_id — they are available to all colonies in the workspace.
To promote a thread-scoped entry to workspace-wide, use the Knowledge
browser's "Promote" button or emit a MemoryEntryScopeChanged event. This
clears the entry's thread_id, making it globally available.
When a colony or agent searches knowledge, results are ranked by a 6-signal composite score (ADR-044) that balances exploitation of proven knowledge with exploration of uncertain entries.
score = 0.38 * semantic
+ 0.25 * thompson
+ 0.15 * freshness
+ 0.10 * status_bonus
+ 0.07 * thread_bonus
+ 0.05 * cooccurrence
| Signal | Range | Source |
|---|---|---|
semantic |
[0, 1] | Cosine similarity from Qdrant vector search |
thompson |
[0, 1] | Random sample from Beta(alpha, beta) — the entry's confidence posterior |
freshness |
[0, 1] | Exponential decay with 90-day half-life: 2^(-age_days/90) |
status_bonus |
[0, 1] | verified=1.0, active=0.8, candidate=0.5, stale=0.0, unknown/missing=0.0 |
thread_bonus |
{0.0, 1.0} | 1.0 if entry's thread matches the searching colony's thread, 0.0 otherwise |
cooccurrence |
[0, 1] | Sigmoid-normalized max co-occurrence weight with other results: 1 - e^{-0.6w} |
Results are returned at one of four detail tiers to minimize token usage:
| Tier | ~Tokens/result | Fields |
|---|---|---|
summary |
~15 | id, title, summary (100 chars), confidence_tier |
standard |
~75 | + content_preview (200 chars), domains, decay_class |
full |
~200+ | + full content, conf_alpha/beta, merged_from, co-occurrence cluster |
auto |
varies | Starts at summary; escalates if coverage is thin |
Auto-escalation logic: if ≥2 unique source colonies and top score >0.5 → summary. If ≥1 source and score >0.35 → standard. Otherwise → full.
Colony context is assembled with per-scope token budgets:
| Scope | Budget % |
|---|---|
| task_knowledge | 35% |
| observations | 20% |
| structured_facts | 15% |
| round_history | 15% |
| scratch_memory | 15% |
-
High-confidence entries (high alpha, low beta) produce Thompson samples near their posterior mean — they are exploited reliably.
-
Uncertain entries (alpha ≈ beta, low total) produce high-variance Thompson samples — they are explored occasionally, getting a chance to prove themselves.
-
Low-confidence entries (low alpha, high beta) produce low Thompson samples — they fade from retrieval naturally.
-
Co-accessed entries that frequently appear together in successful colonies get a small but meaningful co-occurrence boost (~5%).
-
Pinned entries receive a local retrieval boost on the current instance.
-
Muted or invalidated entries are filtered out locally without mutating shared confidence truth.
Note: All signals are normalized to [0, 1]. Weights sum to 1.0.
Each entry carries a Beta distribution posterior: Beta(alpha, beta).
All entries start with Beta(5.0, 5.0). This prior was deliberately chosen
(ADR-039) to match the legacy DEFAULT_PRIOR_STRENGTH = 10.0. It requires
approximately 10 observations before data dominates the prior.
- Colony completes.
- The system matches knowledge access traces (which entries were retrieved and used) to the colony outcome.
- For each accessed entry, a
MemoryConfidenceUpdatedevent is emitted:- Colony succeeded →
delta_alpha = clip(0.5 + quality_score, 0.5, 1.5) - Colony failed → bounded
delta_betabased on failure penalty
- Colony succeeded →
- The projection store updates the entry's alpha/beta.
- Future Thompson samples reflect the updated posterior.
Current repo state: the old flat alpha += 1 / beta += 1 rule has been
replaced by quality-weighted deltas. Successful colonies apply a clipped
alpha gain derived from quality_score, while failure paths apply a bounded
beta penalty.
Pin, mute, invalidate, and annotation actions are replayable operator
overlays. They do not silently change conf_alpha or conf_beta, and they
do not emit MemoryConfidenceUpdated on their own. Shared epistemic truth
and local editorial authority are intentionally separate.
If entries plateau at ~0.5 confidence after 20+ observations, the prior is dominating. Consider:
- Using the confidence reset handler (see §8) to reset stuck entries.
- In a future wave, reducing the prior to
Beta(2, 2).
Three deterministic maintenance services run on a daily schedule and can be triggered manually.
- Auto-merge (cosine similarity ≥ 0.98): Emits a
MemoryEntryMergedevent. Content strategy iskeep_longerif the source content exceeds 1.2x the target length, otherwisekeep_target. Domains are unioned. Themerged_fromprovenance chain accumulates all absorbed entry IDs. - LLM-confirmed (similarity in [0.82, 0.98)): An LLM judges whether the
pair is truly duplicate. Confirmed →
MemoryEntryMergedwithcontent_strategy="llm_selected". Denied → pair is marked dismissed (durable, skipped on future runs). - Dismissed pairs are tracked via
last_status_reasonand excluded from future dedup runs.
Wave 33 change: Auto-merge and LLM-confirmed merges now emit
MemoryEntryMergedinstead ofMemoryEntryStatusChanged(new_status="rejected"). This preserves full provenance and content selection history.
Entries not accessed in 90 days (and not already rejected/stale) are
transitioned to stale status. Additionally, entries with
prediction_error_count >= 5 and access_count < 3 are swept as stale.
When a search returns a semantically-weak top result (cosine < 0.38), the
entry's prediction_error_count is incremented and the query is recorded.
This is projection-only (lossy on replay). High prediction error counts
feed the stale sweep trigger above.
Entries with:
- Opposite polarity (positive vs negative)
- Overlapping domains (Jaccard similarity > 0.3)
are flagged as contradictions. The Knowledge browser shows these for operator resolution.
Current repo state: archival is a lifecycle transition, not a separate hidden confidence mutation path. Long-term forgetting comes from query-time gamma decay, stale sweep logic, and operator/editorial overlays. Promoted knowledge remains workspace-visible after thread archival.
When a thread is archived, unpromoted thread-scoped entries undergo confidence decay:
alpha *= 0.8
beta *= 1.2
With hard floors: alpha >= 1.0 and beta >= 1.0.
This formula is asymmetric — it reduces the success count while increasing the failure count, actively biasing the posterior mean downward rather than just widening uncertainty.
When gamma-decay ships in Wave 32, this formula must be redesigned. Three options under consideration:
- Symmetric decay:
alpha *= 0.9, beta *= 0.9— widens uncertainty without directional bias. - Lower-gamma variant for archived entries:
gamma_archived=0.85vsgamma_active=0.98— faster forgetting for archived context. - Subsumption into gamma-decay with hard floor: Archived entries use the
same gamma-decay mechanism but with a lower gamma and enforce
alpha >= alpha_0, beta >= beta_0after any decay.
Use the Queen's query_service tool:
query_service(service_type="service:consolidation:dedup")
query_service(service_type="service:consolidation:stale_sweep")
query_service(service_type="service:consolidation:contradiction")
query_service(service_type="service:consolidation:confidence_reset")
The confidence reset handler resets entries that are stuck at mediocre
confidence: entries with 50+ observations beyond the prior and a posterior
mean between 0.35 and 0.65 are reset to Beta(5.0, 5.0). This is
manual-only — it does not run on the daily schedule.
The Knowledge browser also exposes Dedup and Stale Sweep buttons directly.
confidence = alpha / (alpha + beta)
A value of 0.72 means "72% of observations led to successful outcomes."
The sum alpha + beta indicates how much data backs the estimate. Subtract
the prior (10.0) to get the effective observation count.
| alpha + beta | Effective observations | Interpretation |
|---|---|---|
| 10 | 0 | Prior only — no data yet |
| 15 | 5 | Early — still uncertain |
| 30 | 20 | Moderate — data starting to dominate |
| 60 | 50 | Strong — posterior is tight |
The Knowledge browser displays:
- Confidence percentage — the posterior mean as a percentage.
- Bar width — proportional to
min((alpha + beta) / 50, 1.0), showing certainty. A narrow bar means little data; a full-width bar means high certainty.
Thread-scoped entries are visible only to colonies in the same thread. To make an entry available workspace-wide:
- Knowledge browser: Click the "Promote" button next to any thread-scoped entry.
- Programmatically: The system emits a
MemoryEntryScopeChangedevent that clears the entry'sthread_id.
After promotion, the entry loses its thread bonus (1.0 at weight 0.07) but becomes available to all colonies in the workspace.
- The knowledge is generally useful beyond the current thread's scope.
- The thread is about to be archived and you want to preserve high-value entries.
- Thread-specific context that would be misleading outside the original workflow.
- Low-confidence entries that haven't proven themselves yet.
Each knowledge entry carries a decay_class that controls how fast its
confidence fades when not observed:
| Decay class | Gamma (γ) | Half-life | Use case |
|---|---|---|---|
ephemeral |
0.98 | ~34 days | Transient context, tool configs |
stable |
0.995 | ~139 days | Core skills, patterns |
permanent |
1.0 | ∞ | Architectural decisions, invariants |
Gamma-decay is applied at query time via the ObservationCRDT, not stored
in the entry. The formula: effective_alpha = prior + Σ(γ^elapsed_days × count).
A hard cap of MAX_ELAPSED_DAYS = 180 prevents gamma from collapsing
observations beyond recovery. For ephemeral entries at 180 days,
0.98^180 ≈ 0.027 — observations retain ~2.7% weight, not zero.
Default decay class for new entries is ephemeral.
When a colony accesses multiple knowledge entries and succeeds, the system reinforces co-occurrence weights between each pair of accessed entries.
- Result-result reinforcement: 1.05x weight multiplier on query co-access.
- Decay: Co-occurrence weights decay at γ=0.995 per day. Weights below 0.1 are pruned.
- Weight cap: 10.0 (prevents runaway reinforcement).
- Scoring: Co-occurrence is a scoring signal (weight 0.05, ADR-044).
Sigmoid normalization:
1 - e^{-0.6w}maps raw weight to [0, 1].
Data lives in ProjectionStore.cooccurrence_weights as CooccurrenceEntry
dataclass instances keyed by canonical (id_a, id_b) tuples.
Dense co-occurrence clusters (≥5 entries, avg weight >3.0) are identified
during the co-occurrence decay maintenance pass and stored in
ProjectionStore.distillation_candidates. Wave 35 will use these to
trigger archivist colonies that synthesize cluster knowledge.
Agents can provide explicit quality feedback on retrieved entries via the
knowledge_feedback tool (available to coder, reviewer, researcher castes):
helpful=true→ emitsMemoryConfidenceUpdatedwithreason="agent_feedback_positive"(alpha increases).helpful=false→ incrementsprediction_error_countand emitsMemoryConfidenceUpdatedwithreason="agent_feedback_negative"(beta increases).
This closes the agent-to-knowledge feedback loop, allowing agents to directly improve the knowledge system's accuracy.
The system generates deterministic intelligence briefings (no LLM calls) from projection signals. The current rule set spans 14 actionable insights.
Knowledge rules (7):
| Rule | Category | Severity | Trigger | suggested_colony? |
|---|---|---|---|---|
| 1. Confidence decline | confidence |
attention | Alpha dropped >20% from peak in 7 days | No |
| 2. Contradiction | contradiction |
action_required | Two verified entries with opposite polarity and >30% domain overlap | Yes (researcher) |
| 3. Federation trust drop | federation |
attention | Peer trust score <0.5 | No |
| 4. Coverage gap | coverage |
attention/info | 3+ entries in a domain with 3+ prediction errors | Yes (researcher) |
| 5. Stale cluster | staleness |
attention | Co-occurrence cluster where all entries have >3 prediction errors | Yes (researcher) |
| 6. Merge opportunity | merge |
info | Two entries with >50% domain overlap and >50% title similarity | No |
| 7. Federation inbound | inbound |
info | Foreign entries in a domain with no local coverage | No |
Performance rules (4):
| Rule | Category | Severity | Trigger |
|---|---|---|---|
| 8. Strategy efficiency | performance |
info | Strategy averages >15% lower quality than best (≥3 colonies each) |
| 9. Diminishing rounds | performance |
attention | ≥2 colonies ran 10+ rounds with <40% quality |
| 10. Cost outlier | performance |
info | Colony cost >2.5x workspace median (≥5 colonies) |
| 11. Knowledge ROI | performance |
attention | >30% of spend on successful colonies that extracted zero entries |
Performance rules analyze ColonyOutcome projections. All deterministic,
no LLM. The Queen references these as recommendations, not automatic tuning.
Current repo state: the briefing layer also includes adaptive evaporation, branching-factor stagnation, and earned-autonomy recommendations on top of the original 7 knowledge rules and 4 performance rules.
Rules 2, 4, 5 include suggested_colony — a structured colony configuration
for auto-dispatch. Briefings are injected into the Queen's system prompt
and exposed via formicos://briefing/{workspace_id} MCP resource and
/api/v1/briefing/{workspace_id} REST endpoint.
Entries carry a granular sub_type within their category:
| Entry type | Sub-types |
|---|---|
skill |
technique, pattern, anti_pattern |
experience |
decision, convention, learning, bug |
Sub-types are classified during LLM extraction and transcript harvest.
The Knowledge browser shows sub-type badges. API and MCP resources
support sub_type filtering.
FormicOS instances can exchange knowledge via push/pull replication.
Each knowledge entry is backed by an ObservationCRDT (core/crdt.py):
- G-Counters for success/failure observation counts (per-instance, grow-only)
- LWW Registers for content, entry_type, and decay_class
- G-Sets for domain tags and archival markers
All CRDT components merge independently: counters use pairwise-max, LWW registers use timestamp (with node_id tie-breaking), sets use union. Gamma-decay is applied at query time, preserving monotonic CRDT invariants.
Current repo state: peer failures are asymmetric (they add 2.0 to beta),
foreign knowledge is discounted by trust * 0.7^hop with a 0.5 cap, and
retrieval applies an additional federated status penalty so weak foreign
candidate entries do not outrank strong local verified knowledge.
Trust between peers uses PeerTrust (surface/trust.py) — the 10th
percentile of a Beta posterior, not the mean. This penalizes uncertainty:
a new peer with little history scores low even if its mean is high.
PeerTrust(α, β).score= 10th percentile ofBeta(α, β)PeerTrust(α, β).mean=α / (α + β)(always ≥ score)- Foreign knowledge is discounted by hop:
trust * 0.7^hopwith a0.5cap
When contradictory entries arrive from different instances, resolution uses three phases:
- Pareto dominance — If one entry dominates on 2+ criteria (evidence, provenance, recency), it wins immediately.
- Adaptive threshold — Composite score comparison with a threshold that widens when evidence is low (encouraging exploration).
- Competing — If neither entry clearly wins, both are kept as competing
hypotheses for the operator. Competing pairs are tracked in projection
state (
ProjectionStore.competing_pairs) and lazily rebuilt when memory entries change. At standard and full retrieval tiers, results are annotated with acompeting_withfield listing competitor IDs, titles, and confidence means.
Federation uses CouchDB-style push/pull replication (surface/federation.py):
- Push: Local CRDT events are sent to peers, filtered by replication filter (domain allowlist, min confidence, entry types, thread exclusions).
- Pull: Remote events are received, cycle-prevented (skip own instance events), and applied to projections.
- Feedback: Validation results (success/failure) update peer trust.
Five events support federation (ADR-042):
CRDTCounterIncremented— G-Counter incrementCRDTTimestampUpdated— LWW observation timestampCRDTSetElementAdded— G-Set element additionCRDTRegisterAssigned— LWW register assignmentMemoryEntryMerged— Entry merge with provenance
Defining workflow steps after colonies have already completed will not retroactively bind completed colonies to steps. This is expected — steps are Queen scaffolding for future work, not a retrospective classification tool.
Current repo state: generate_briefing() now covers knowledge health,
performance, evaporation, branching, and earned-autonomy rules. The
"7 rules" framing below only describes the original knowledge-health subset.
The self-maintenance loop connects proactive intelligence insights to automatic colony dispatch:
- Insight generation —
generate_briefing()identifies issues via 7 rules - SuggestedColony — 3 rules (contradiction, coverage gap, stale cluster) include ready-to-dispatch colony configurations
- Policy check —
MaintenanceDispatcherevaluates the workspace'sMaintenancePolicy:suggest: no auto-dispatch (default)auto_notify: dispatch only categories listed inauto_actionsautonomous: dispatch all eligible categories
- Budget + cap —
daily_maintenance_budgetandmax_maintenance_coloniesprevent runaway spending - Dispatch — Colony spawns with
[maintenance:{category}]task prefix - Outcome — Colony outcome feeds back into confidence updates
Dense co-occurrence clusters are synthesized into higher-order entries:
- Cluster detection — During co-occurrence decay maintenance, BFS finds connected components with edge weight > 2.0
- Density check — Clusters with ≥5 entries and avg weight > 3.0 qualify
- Archivist synthesis — When maintenance policy allows (
distillationinauto_actions), an archivist colony synthesizes the cluster - KnowledgeDistilled event — Records
distilled_entry_id,source_entry_ids, andcluster_avg_weight - Elevated entry — Distilled entry gets
decay_class="stable"and elevated alpha (capped at 30)
When a knowledge entry's confidence has decayed significantly but was historically strong:
- Trigger:
current_alpha < peak_alpha * 0.5ANDdecay_classisstableorpermanent - Bonus:
gap * 0.2wheregap = peak_alpha - current_alpha - Effect: Successful re-observations restore confidence faster than cold-start entries, reflecting that re-learned knowledge is more reliable
- Tracking:
peak_alphastored in projections viaMemoryConfidenceUpdatedhandler
Composite retrieval weights can be tuned per workspace:
- Tool:
configure_scoringMCP tool sets custom weights - Storage: Persisted via
WorkspaceConfigChangedevent - Fallback: Missing workspaces use global defaults (ADR-044)
- Invariants: All 6 signals must be present; weights must sum to 1.0
- Impact: Setting
cooccurrence=0.0disables co-occurrence boost; highersemanticweight prioritizes embedding similarity over exploration
Current repo state: outcome history now also feeds escalation reporting and configuration recommendation surfaces, not just the four original performance rules.
Colony outcomes close the loop between execution and knowledge intelligence:
-
Outcome derivation —
ColonyOutcomeis computed silently from existing events during replay (no new event types, ADR-047):ColonySpawned→ workspace, thread, strategy, caste_compositionRoundCompleted→ total_rounds, total_costColonyCompleted/ColonyFailed/ColonyKilled→ succeeded, durationMemoryEntryCreated→ entries_extractedKnowledgeAccessRecorded→ entries_accessed
-
Performance rules — Four deterministic rules analyze accumulated outcomes per workspace (no LLM):
- Strategy efficiency (compare quality across strategies)
- Diminishing rounds (flag long-running low-quality colonies)
- Cost outlier (flag colonies costing >2.5x median)
- Knowledge ROI (flag spend without knowledge extraction)
-
Queen briefing — Performance insights appear in the proactive briefing alongside knowledge insights. The Queen references these as recommendations, not automatic overrides.
-
REST surface —
GET /api/v1/workspaces/{id}/outcomes?period=7dreturns aggregate outcome metrics.
Outcomes are replay-derived — the event log is the sole source of truth.
ColonyOutcome is rebuilt during replay and stored in projection memory.
No outcome editing, no separate persistence.
The Forager added exactly 4 event types to the closed union in Wave 44 (58 → 62):
| Event | Purpose | Replay role |
|---|---|---|
ForageRequested |
System decided to forage | When/why a cycle started |
ForageCycleCompleted |
Summary of cycle results | What the cycle accomplished |
DomainStrategyUpdated |
Learned fetch-level preference | Durable domain strategy |
ForagerDomainOverride |
Operator domain trust action | Operator co-authorship |
Individual search requests, fetch attempts, and content rejections are audit details. They are structured-logged but do not become first-class replay events. This keeps the event surface minimal and justified.
- Domain strategies — per-workspace, per-domain preferred fetch level, success/failure counts. Survives replay.
- Forage cycle summaries — per-workspace list of completed cycles with mode, reason, counts, and duration. Links back to the originating request.
- Domain overrides — per-workspace, per-domain operator trust/distrust
actions.
resetremoves the override entirely.
| Mode | Trigger | Description | Status |
|---|---|---|---|
reactive |
Low-confidence live-task retrieval | Highest-value; gap detected during colony work | Operational |
proactive |
Briefing rule (stale cluster, coverage gap, etc.) | Background; lower priority than reactive | Operational |
operator |
Manual operator request | Direct operator control | Operational |
Current state: Proactive foraging runs through the scheduled maintenance loop.
proactive_intelligenceemits boundedforage_signalmetadata,MaintenanceDispatcherevaluates workspace briefings, and eligible signals are handed toForagerServicefor background execution. Reactive, proactive, and operator-triggered foraging are operational.