README.md: 15 additions & 15 deletions
@@ -11,25 +11,25 @@
**Agents that remember, forge their own tools, and survive long-running sessions.** Persistent cognitive memory, optional HEXACO personality, multi-agent orchestration, and one dispatch interface across 11 LLM providers. Apache-2.0.
AgentOS is an open-source TypeScript framework for AI agents that **remember, adapt, and write their own tools**.
-
- **85.6% on [LongMemEval-S](https://github.com/framersai/agentos-bench/blob/master/results/LEADERBOARD.md)** at $0.0090 per correct answer (gpt-4o reader): +1.4 points over Mastra OM gpt-4o (84.23%), 0.4 behind Emergence.ai's 86% closed-source SOTA.
+
- **85.6% on [LongMemEval-S](https://github.com/framerslab/agentos-bench/blob/master/results/LEADERBOARD.md)** at $0.0090 per correct answer (gpt-4o reader): +1.4 points over Mastra OM gpt-4o (84.23%), 0.4 behind Emergence.ai's 86% closed-source SOTA.
- **70.2% on LongMemEval-M** (1.5M-token haystacks, 500 sessions per question): the only open-source library on the public record above 65% on M with publicly reproducible methodology.
- **Runtime tool forging.** An agent writes a TypeScript function with a Zod schema, an LLM judge approves it, and it runs in a hardened `node:vm` sandbox before joining the catalog for the rest of the session. Multi-agent teams spawn judge-reviewed specialists the same way.
- **Persistent [cognitive memory](https://docs.agentos.sh/features/cognitive-memory)** with 8 neuroscience-backed mechanisms: Ebbinghaus decay, retrieval-induced forgetting, reconsolidation, and source-confidence decay, grounded in published cognitive-science literature.
@@ -47,7 +47,7 @@ AgentOS is an open-source TypeScript framework for AI agents that **remember, ad
width="900" />
</picture>
-
<sub>Runtime tool forging + multi-agent collaboration. Reproduce with <code>node <a href="https://github.com/framersai/agentos/blob/master/examples/emergent-hierarchical-spawning.mjs">examples/emergent-hierarchical-spawning.mjs</a></code>.</sub>
+
<sub>Runtime tool forging + multi-agent collaboration. Reproduce with <code>node <a href="https://github.com/framerslab/agentos/blob/master/examples/emergent-hierarchical-spawning.mjs">examples/emergent-hierarchical-spawning.mjs</a></code>.</sub>
</div>
@@ -185,7 +185,7 @@ M's haystacks exceed every production context window; most vendors only publish
At matched Top-5 retrieval, +4.5 above the round-level paper baseline (65.7%) and 1.2 below the session-level (71.4%); the paper's overall strongest GPT-4o result is 72.0% at Top-10. Of open-source libraries with publicly reproducible runs, AgentOS is the only one above 65% on M.
Methodology stack: bootstrap 95% CIs at 10k Mulberry32 resamples (seed 42), per-benchmark judge-FPR probes (S 1%, M 2%, LOCOMO 0%), per-case run JSONs, single-CLI reproduction. The [transparency audit](https://agentos.sh/en/blog/memory-benchmark-transparency-audit/) covers what the headline numbers don't: LOCOMO's ~6.4% answer-key error rate, the LongMemEval-S context-window confound, and the Mem0-vs-Zep comparison gaming case study, alongside which vendors disclose which methodology dimensions.
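The bootstrap interval named in the methodology stack can be sketched as follows. This is a minimal percentile-bootstrap sketch, not the agentos-bench implementation: `bootstrapCI` and its parameters are hypothetical names, and the percentile method is an assumption, since the text only states 95% CIs at 10k Mulberry32 resamples with seed 42. `mulberry32` here is the widely circulated 32-bit seeded PRNG of that name.

```typescript
// Mulberry32: small seeded PRNG returning floats in [0, 1).
function mulberry32(seed: number): () => number {
  let a = seed | 0;
  return () => {
    a = (a + 0x6d2b79f5) | 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Hypothetical helper: percentile bootstrap over per-case outcomes
// (1 = judged correct, 0 = judged incorrect).
function bootstrapCI(
  outcomes: number[],
  resamples: number = 10000,
  seed: number = 42
): { mean: number; lo: number; hi: number } {
  if (outcomes.length === 0) throw new Error("need at least one outcome");
  const rand = mulberry32(seed);
  const n = outcomes.length;
  const means = new Float64Array(resamples);
  for (let r = 0; r < resamples; r++) {
    let sum = 0;
    // Resample n cases with replacement.
    for (let i = 0; i < n; i++) sum += outcomes[Math.floor(rand() * n)];
    means[r] = sum / n;
  }
  means.sort(); // typed arrays sort numerically by default
  const at = (q: number) => means[Math.min(resamples - 1, Math.floor(q * resamples))];
  const mean = outcomes.reduce((acc, x) => acc + x, 0) / n;
  return { mean, lo: at(0.025), hi: at(0.975) };
}
```

Because the generator is seeded, two runs over the same per-case outcomes give identical bounds, which is the property that makes fixed-seed run JSONs reproducible.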
|[`@framers/agentos-extensions-registry`](https://www.npmjs.com/package/@framers/agentos-extensions-registry)| Discovery + auto-loader layer for the extensions catalog. Hosts pull the index without pulling every implementation; the runtime resolves and registers packs at startup. |
|[`@framers/agentos-skills`](https://www.npmjs.com/package/@framers/agentos-skills)| 88 curated `SKILL.md` skills covering common tasks. |
|[`@framers/agentos-skills-registry`](https://www.npmjs.com/package/@framers/agentos-skills-registry)| Discovery + auto-loader layer for the skills catalog. Also the surface where promoted forged tools land after `SkillExporter`. |
-
|[`@framers/agentos-bench`](https://github.com/framersai/agentos-bench)| Open benchmark harness. Bootstrap 95% CIs at 10k resamples, judge false-positive-rate probes, per-case run JSONs at fixed seed. MIT (the rest of AgentOS is Apache 2.0). |
+
|[`@framers/agentos-bench`](https://github.com/framerslab/agentos-bench)| Open benchmark harness. Bootstrap 95% CIs at 10k resamples, judge false-positive-rate probes, per-case run JSONs at fixed seed. MIT (the rest of AgentOS is Apache 2.0). |
-
|**Reproducibility**| Per-case run JSONs at `--seed 42`, single-CLI reproduction, Apache-2.0 bench at [github.com/framersai/agentos-bench](https://github.com/framersai/agentos-bench)|
+
|**Reproducibility**| Per-case run JSONs at `--seed 42`, single-CLI reproduction, Apache-2.0 bench at [github.com/framerslab/agentos-bench](https://github.com/framerslab/agentos-bench)|
-
**[Join Discord for the announcement ->](https://wilds.ai/discord)** * **[Read the benchmarks now ->](https://github.com/framersai/agentos-bench/blob/master/results/LEADERBOARD.md)**
+
**[Join Discord for the announcement ->](https://wilds.ai/discord)** * **[Read the benchmarks now ->](https://github.com/framerslab/agentos-bench/blob/master/results/LEADERBOARD.md)**
---
@@ -369,7 +369,7 @@ Define any scenario as JSON. Run it with AI commanders that have different HEXAC
-
[Contributing Guide](https://github.com/framersai/agentos/blob/master/CONTRIBUTING.md) * We use [Conventional Commits](https://www.conventionalcommits.org/).
+
[Contributing Guide](https://github.com/framerslab/agentos/blob/master/CONTRIBUTING.md) * We use [Conventional Commits](https://www.conventionalcommits.org/).
docs/ADAPTIVE_MEMORY_ROUTER.md: 10 additions & 10 deletions
@@ -4,9 +4,9 @@ Self-calibrating extension of [Memory Router](./MEMORY_ROUTER.md). Derives the r
## When to use this
-
Use [`MemoryRouter`](https://github.com/framersai/agentos/blob/master/src/orchestration/pipeline/memory/MemoryRouter.ts) directly with a shipping preset (`minimize-cost`, `balanced`, `maximize-accuracy`) when your workload is similar to LongMemEval-S — conversational memory with the six standard categories.
+
Use [`MemoryRouter`](https://github.com/framerslab/agentos/blob/master/src/orchestration/pipeline/memory/MemoryRouter.ts) directly with a shipping preset (`minimize-cost`, `balanced`, `maximize-accuracy`) when your workload is similar to LongMemEval-S — conversational memory with the six standard categories.
-
Use [`AdaptiveMemoryRouter`](https://github.com/framersai/agentos/blob/master/src/orchestration/pipeline/memory/adaptive.ts) when:
+
Use [`AdaptiveMemoryRouter`](https://github.com/framerslab/agentos/blob/master/src/orchestration/pipeline/memory/adaptive.ts) when:
1. Your category distribution diverges from LongMemEval-S (e.g., heavy on temporal, light on multi-session).
2. Your reader / judge / cost profile differs (different LLM, different judge rubric, different per-call cost).
@@ -30,7 +30,7 @@ Three steps:
1. **Aggregate**: roll samples up by `(category, backend)` into mean cost + mean accuracy + sample count.
2. **Per-category select**: apply a preset rule per category to pick a backend.
-
3. **Build table**: assemble the per-category picks into a frozen [`RoutingTable`](https://github.com/framersai/agentos/blob/master/src/orchestration/pipeline/memory/routing-tables.ts). Categories with insufficient calibration fall back to the static preset's default.
+
3. **Build table**: assemble the per-category picks into a frozen [`RoutingTable`](https://github.com/framerslab/agentos/blob/master/src/orchestration/pipeline/memory/routing-tables.ts). Categories with insufficient calibration fall back to the static preset's default.
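The aggregation step in the list above can be sketched as below. This is an illustration under stated assumptions rather than the library's `aggregateCalibration`: the `Sample` and `CellStats` shapes and the `aggregateSamples` name are hypothetical, and only the roll-up by `(category, backend)` into mean cost, mean accuracy, and sample count is taken from the text.

```typescript
// Hypothetical sample shape; the real CalibrationSample type may differ.
interface Sample {
  category: string;
  backend: string;
  costUsd: number;
  correct: boolean;
}

// One aggregated cell per (category, backend) pair.
interface CellStats {
  category: string;
  backend: string;
  meanCostUsd: number;
  meanAccuracy: number;
  sampleCount: number;
}

function aggregateSamples(samples: Sample[]): CellStats[] {
  const cells = new Map<
    string,
    { category: string; backend: string; cost: number; hits: number; n: number }
  >();
  for (const s of samples) {
    // JSON-encode the pair so the key cannot collide on separator characters.
    const key = JSON.stringify([s.category, s.backend]);
    let cell = cells.get(key);
    if (!cell) {
      cell = { category: s.category, backend: s.backend, cost: 0, hits: 0, n: 0 };
      cells.set(key, cell);
    }
    cell.cost += s.costUsd;
    cell.hits += s.correct ? 1 : 0;
    cell.n += 1;
  }
  // Map iteration preserves first-seen order of the pairs.
  return Array.from(cells.values()).map((c) => ({
    category: c.category,
    backend: c.backend,
    meanCostUsd: c.cost / c.n,
    meanAccuracy: c.hits / c.n,
    sampleCount: c.n,
  }));
}
```

Keeping the sample count on each cell is what lets a later step detect categories with insufficient calibration and fall back to the static preset's default.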
Three preset rules:
@@ -99,7 +99,7 @@ For each candidate backend, run your workload through it on a Phase A subset:
1. Sample N queries from your workload (typically N ≈ 100-300 per category — stratified if some categories are rare).
2. Dispatch each query to each candidate backend.
3. Score each outcome with your judge.
-
4. Emit one [`CalibrationSample`](https://github.com/framersai/agentos/blob/master/src/orchestration/pipeline/memory/adaptive.ts) per (query × backend × outcome) tuple.
+
4. Emit one [`CalibrationSample`](https://github.com/framerslab/agentos/blob/master/src/orchestration/pipeline/memory/adaptive.ts) per (query × backend × outcome) tuple.
Total spend: roughly N × 3 × per-backend-cost-per-query. For LongMemEval-S Phase A this was ~$30 per backend.
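The four calibration steps above amount to a nested loop over queries and candidate backends. The sketch below is hypothetical throughout: `Query`, `CandidateBackend`, `Judge`, `Sample`, and `collectSamples` are illustrative names, not the AgentOS API, and backends are modelled as synchronous functions for brevity where real ones would almost certainly be async.

```typescript
// All shapes here are assumptions made for illustration.
interface Query {
  text: string;
  category: string;
  expected: string;
}

interface BackendResult {
  answer: string;
  costUsd: number;
}

interface CandidateBackend {
  name: string;
  run: (query: Query) => BackendResult;
}

interface Sample {
  category: string;
  backend: string;
  costUsd: number;
  correct: boolean;
}

type Judge = (query: Query, answer: string) => boolean;

// Dispatch every sampled query to every candidate backend, score the
// outcome with the judge, and emit one sample per (query, backend) pair.
function collectSamples(
  queries: Query[],
  backends: CandidateBackend[],
  judge: Judge
): Sample[] {
  const samples: Sample[] = [];
  for (const query of queries) {
    for (const backend of backends) {
      const result = backend.run(query);
      samples.push({
        category: query.category,
        backend: backend.name,
        costUsd: result.costUsd,
        correct: judge(query, result.answer),
      });
    }
  }
  return samples;
}
```

The output length is always queries × backends, which is why calibration spend grows linearly in both the per-category sample size and the number of candidates.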
-
- [`aggregateCalibration`](https://github.com/framersai/agentos/blob/master/src/orchestration/pipeline/memory/adaptive.ts): pure aggregator
-
- [`selectByPreset`](https://github.com/framersai/agentos/blob/master/src/orchestration/pipeline/memory/adaptive.ts): pure per-category selector
-
- [`buildAdaptiveRoutingTable`](https://github.com/framersai/agentos/blob/master/src/orchestration/pipeline/memory/adaptive.ts): pure full-table constructor
-
- [`AdaptiveMemoryRouter`](https://github.com/framersai/agentos/blob/master/src/orchestration/pipeline/memory/adaptive.ts): class extending [`MemoryRouter`](https://github.com/framersai/agentos/blob/master/src/orchestration/pipeline/memory/MemoryRouter.ts) with calibration-derived table
+
- [`aggregateCalibration`](https://github.com/framerslab/agentos/blob/master/src/orchestration/pipeline/memory/adaptive.ts): pure aggregator
+
- [`selectByPreset`](https://github.com/framerslab/agentos/blob/master/src/orchestration/pipeline/memory/adaptive.ts): pure per-category selector
+
- [`buildAdaptiveRoutingTable`](https://github.com/framerslab/agentos/blob/master/src/orchestration/pipeline/memory/adaptive.ts): pure full-table constructor
+
- [`AdaptiveMemoryRouter`](https://github.com/framerslab/agentos/blob/master/src/orchestration/pipeline/memory/adaptive.ts): class extending [`MemoryRouter`](https://github.com/framerslab/agentos/blob/master/src/orchestration/pipeline/memory/MemoryRouter.ts) with calibration-derived table