Commit f0a35a5

Shaw and claude committed
chore: capture concurrent worktree edits (capability-router, benchmarks, release workflow)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 966ba15 commit f0a35a5

15 files changed

Lines changed: 1651 additions & 72 deletions

packages/agent/docs/capability-router-remote-plugins.md

Lines changed: 43 additions & 12 deletions
@@ -994,7 +994,11 @@ metadata, duplicate endpoint ids, duplicate provider reports, malformed endpoint
 ids, non-lowercase provider names, invalid Cloud API base URLs, Cloud API base
 URLs with query or fragment components, non-2xx route results, non-JavaScript
 view asset paths/content types, missing, malformed, or empty-content view asset
-SHA-256 digests, and credential-shaped field names or string values such as tokens,
+SHA-256 digests, missing model results, failed lifecycle calls, unhandled event
+calls, asset integrity values that do not match the recorded asset digest,
+empty action/provider/evaluator/response-handler outputs, missing
+service/app-bridge results, and credential-shaped field names or string values
+such as tokens,
 authorization headers, API keys, passwords, secrets, bearer/basic auth values,
 and URLs with embedded credentials anywhere in the artifact. Every exercised RPC
 target must also start with one of the module ids observed in the live manifest,
@@ -1004,6 +1008,16 @@ target recorded in `conformance.moduleExercises`. The conformance harness keeps
 the required surface summary in `conformance.exercised`, then performs
 additional cheap RPC calls for untouched modules so multi-module endpoints still
 produce per-module exercise evidence without overwriting the summary target.
+The harness fails at observation time when action, provider, evaluator,
+response-handler evaluator, response-handler field evaluator, service, or app
+bridge calls return empty success-shaped payloads, and when lifecycle or event
+calls do not report success.
+When a view asset includes subresource integrity metadata, the harness verifies
+that value against the fetched bundle bytes before recording the observation.
+The live report writer rejects unknown report kinds before writing, only accepts
+lowercase hyphenated report names, enforces `cloud.json` for Cloud and
+`<provider>.json` for provider reports, and writes with exclusive create so a
+second artifact cannot overwrite the first observation.
 `sync.registered` and `sync.registeredModules` must not contain duplicate
 materialized plugin/module identities, and every registered module must have a
 unique trusted `sync.trustDecisions` entry, so full-surface evidence is tied
@@ -1132,11 +1146,13 @@ packages/agent/src/services/remote-capability-endpoint-conformance.test.ts
   building a temporary remote view bundle, starting the reference endpoint,
   running CLI conformance against it with bearer auth, importing the returned
   bundle as JavaScript, and tearing it down.
-- `bun run --cwd packages/agent test:remote-capabilities` passed with 158
-  tests passing and 1 skipped after adding registered-remote component
-  ownership checks, cross-module/local model collision checks, and stale
-  contribution cleanup coverage for disappearing remote modules, plus runtime
-  app route-module collision protection for remote app bridges.
+- `bun run --cwd packages/agent test:remote-capabilities` passed with 188
+  tests passing and 3 skipped. The canonical suite covers registered-remote
+  component ownership checks, cross-module/local model collision checks, stale
+  contribution cleanup coverage for disappearing remote modules, runtime app
+  route-module collision protection for remote app bridges, and live report
+  writer safety for report names, identity, duplicate artifacts, and weak
+  conformance result rejection.
 - `bun run --cwd packages/agent test:remote-capabilities:source-build` passed
   with 2 focused tests passing and 35 adapter tests skipped by name filter.
 - `bun run --cwd packages/agent test:remote-capabilities:provider-live` found
@@ -1207,10 +1223,13 @@ packages/agent/src/services/remote-capability-cloud-sandbox.cloud-smoke.test.ts
   strict scheduled/manual observation, and that the final `test-status` gate
   treats scheduled runs as strict, with required provider endpoints, strict
   live report validation, required artifact upload, and matching live report
-  directories between smoke producers, validators, and uploaded artifacts.
+  directories between smoke producers, validators, and uploaded artifacts. It
+  also audits the package-level `test:remote-capabilities` script so live report
+  writer safety remains in the canonical remote-capability suite.
 - `bun run test:remote-capabilities:live-ci-audit:self-test` mutates those
-  report-directory env vars, artifact upload paths, final `test-status` live
-  job gating, scheduled/manual live observation gates, Cloud
+  report-directory env vars, artifact upload paths, package-level remote
+  capability suite membership, final `test-status` live job gating,
+  scheduled/manual live observation gates, Cloud
   freshness/identity validation flags, provider primary endpoint secret
   enforcement, provider allowed/required lists, and provider GitHub-env
   matching, and proves the live-CI audit fails when smoke output no longer
@@ -1223,18 +1242,30 @@ packages/agent/src/services/remote-capability-cloud-sandbox.cloud-smoke.test.ts
   configured transport URL. The fingerprint helper also strips query/fragment
   components and rejects embedded URL credentials before hashing, matching the
   URL-backed endpoint provider's accepted base URL shape.
+- Live report writers only accept lowercase report names with numbers or
+  hyphens, require Cloud reports to be named `cloud`, require provider reports
+  to be named after their provider, and create report files with exclusive
+  writes, so a duplicate Cloud or provider report cannot silently overwrite an
+  earlier artifact before validation/upload.
 - Conformance reports include an `rpcCalls` ledger that records every canonical
   protocol method used for each exercised surface and module. The live report
   validator requires this ledger to cover every `moduleExercises` entry, every
   summarized required surface, and every evaluator phase (`shouldRun`,
   `prepare`, `prompt`, `process`, response-handler `evaluate`, and field
   `parse`/`handle`), so live evidence proves the endpoint was exercised through
   the standard RPC-like protocol, not only materialized in a manifest.
+- Model, lifecycle, event, service, and app-bridge conformance results must
+  carry their required protocol success fields: `modelResult.result`,
+  `lifecycleResult.ok: true`, `eventResult.handled: true`,
+  `serviceResult.result`, and `appBridgeResult.result`.
 - View-asset conformance now preserves manifest-declared asset metadata and
   rejects fetched bundles whose content type or integrity value contradicts the
-  manifest. The live report validator also rejects artifacts whose recorded
-  manifest asset metadata disagrees with the fetched asset metadata or whose
-  fetched JavaScript bundle digest is the empty SHA-256 digest.
+  manifest, whose integrity value does not include a SHA-256 token, or whose
+  integrity value does not match the fetched bytes. The live report validator
+  also rejects artifacts whose recorded manifest asset metadata disagrees with
+  the fetched asset metadata, whose integrity value lacks or does not match the
+  recorded SHA-256 digest, or whose fetched JavaScript bundle digest is the empty
+  SHA-256 digest.
 - Runtime live summaries include `runtime.remotePlugins`, keyed by plugin name,
   endpoint id, and module id. The validator requires this runtime identity list
   to match `sync.registeredModules` exactly, so count totals cannot stand in for

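The report-writer rules described in the documentation diff above (known report kinds only, lowercase hyphenated names, `cloud.json` for Cloud, `<provider>.json` for providers, exclusive create) can be illustrated with a minimal sketch. The writer's source is not shown in this commit view, so `reportFileName`, `writeReportOnce`, and the exact name pattern are hypothetical; only the Node `fs` flag `"wx"` (fail if the file already exists) is standard behavior.

```typescript
import { mkdtempSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Assumed pattern: lowercase letters and digits, optionally joined by hyphens.
const REPORT_NAME = /^[a-z0-9]+(?:-[a-z0-9]+)*$/;

// Hypothetical helper: validates kind and name, then returns the file name.
function reportFileName(kind: string, name: string, provider?: string): string {
  if (kind !== "cloud" && kind !== "provider") {
    throw new Error(`Unknown report kind: ${kind}`);
  }
  if (!REPORT_NAME.test(name)) {
    throw new Error(`Invalid report name: ${name}`);
  }
  if (kind === "cloud" && name !== "cloud") {
    throw new Error("Cloud reports must be named cloud.");
  }
  if (kind === "provider" && name !== provider) {
    throw new Error("Provider reports must be named after their provider.");
  }
  return `${name}.json`;
}

// Hypothetical helper: the "wx" flag makes the write fail with EEXIST when the
// file already exists, so a second report cannot replace the first.
function writeReportOnce(
  dir: string,
  kind: string,
  name: string,
  report: unknown,
  provider?: string,
): string {
  const file = join(dir, reportFileName(kind, name, provider));
  writeFileSync(file, JSON.stringify(report, null, 2), { flag: "wx" });
  return file;
}

// Usage: the first write succeeds, the duplicate is rejected.
const reportDir = mkdtempSync(join(tmpdir(), "live-report-"));
writeReportOnce(reportDir, "cloud", "cloud", { ok: true });
let duplicateRejected = false;
try {
  writeReportOnce(reportDir, "cloud", "cloud", { ok: true });
} catch {
  duplicateRejected = true;
}
```

Validating the name before touching the file system keeps a malformed or mismatched report from ever producing an artifact, and the exclusive flag covers the duplicate case without a separate existence check.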
packages/agent/package.json

Lines changed: 1 addition & 1 deletion
@@ -46,7 +46,7 @@
     "format:fix": "bunx @biomejs/biome format --write src",
     "pack:dry-run": "cd dist && npm pack --dry-run",
     "test": "vitest run --config vitest.config.ts",
-    "test:remote-capabilities": "cd ../.. && bunx vitest run packages/agent/src/services/remote-plugin-adapter.test.ts packages/agent/src/services/remote-capability-router.test.ts packages/agent/src/services/remote-capability-endpoint-provider.test.ts packages/agent/src/services/remote-capability-endpoint-conformance.test.ts packages/agent/src/services/remote-capability-cloud-sandbox.test.ts packages/agent/src/api/remote-capability-routes.test.ts packages/agent/src/__tests__/views-registry-integration.test.ts packages/core/src/capabilities/index.test.ts --coverage.enabled=false",
+    "test:remote-capabilities": "cd ../.. && bunx vitest run packages/agent/src/services/remote-plugin-adapter.test.ts packages/agent/src/services/remote-capability-router.test.ts packages/agent/src/services/remote-capability-endpoint-provider.test.ts packages/agent/src/services/remote-capability-endpoint-conformance.test.ts packages/agent/src/services/remote-capability-cloud-sandbox.test.ts packages/agent/src/services/remote-capability-live-report.test.ts packages/agent/src/api/remote-capability-routes.test.ts packages/agent/src/__tests__/views-registry-integration.test.ts packages/core/src/capabilities/index.test.ts --coverage.enabled=false",
     "test:remote-capabilities:source-build": "cd ../.. && bunx vitest run packages/agent/src/services/remote-plugin-adapter.test.ts -t \"builds a remote plugin from source|loads a built remote plugin from a separate capability server process\" --coverage.enabled=false",
     "test:remote-capabilities:docker": "cd ../.. && ELIZA_REMOTE_CAPABILITY_DOCKER_SMOKE=1 bunx vitest run packages/agent/src/services/remote-plugin-adapter.test.ts -t \"Docker container capability server\" --coverage.enabled=false",
     "test:remote-capabilities:cloud-live": "cd ../.. && ELIZA_REMOTE_CAPABILITY_CLOUD_LIVE=1 bunx vitest run packages/agent/src/services/remote-capability-cloud-sandbox.cloud-smoke.test.ts --coverage.enabled=false",

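The only change to this script is the added `remote-capability-live-report.test.ts` path. The documentation in this commit says the live-CI audit checks that the package-level script keeps that test in the canonical suite; the audit itself is not part of this view, so the following is only a rough sketch of such a membership check, with `missingSuiteFiles` as a hypothetical helper and the scripts shortened for readability.

```typescript
// Hypothetical helper: returns the required test files that the script string
// does not list as a whitespace-separated argument.
function missingSuiteFiles(
  script: string,
  required: readonly string[],
): string[] {
  const args = new Set(script.split(/\s+/));
  return required.filter((file) => !args.has(file));
}

const requiredFiles = [
  "packages/agent/src/services/remote-capability-live-report.test.ts",
];

// Shortened stand-ins for the old and new "test:remote-capabilities" scripts.
const oldScript =
  "cd ../.. && bunx vitest run packages/agent/src/services/remote-capability-cloud-sandbox.test.ts --coverage.enabled=false";
const newScript =
  "cd ../.. && bunx vitest run packages/agent/src/services/remote-capability-cloud-sandbox.test.ts packages/agent/src/services/remote-capability-live-report.test.ts --coverage.enabled=false";

const missingBefore = missingSuiteFiles(oldScript, requiredFiles);
const missingAfter = missingSuiteFiles(newScript, requiredFiles);
```

Matching whole arguments rather than substrings avoids accepting a path that merely contains the required file name as a prefix of something else.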
packages/agent/src/services/remote-capability-endpoint-conformance.test.ts

Lines changed: 218 additions & 2 deletions
@@ -489,6 +489,108 @@ describe("remote capability endpoint conformance", () => {
     );
   });
 
+  it.each([
+    [
+      "plugin.action.invoke",
+      "action",
+      {},
+      'Capability endpoint "remote-endpoint" returned an empty action result.',
+    ],
+    [
+      "plugin.provider.get",
+      "provider",
+      {},
+      'Capability endpoint "remote-endpoint" returned an empty provider result.',
+    ],
+    [
+      "plugin.model.invoke",
+      "model",
+      {},
+      "result is required.",
+    ],
+    [
+      "plugin.lifecycle.call",
+      "lifecycle",
+      { ok: false },
+      'Capability endpoint "remote-endpoint" returned a failed lifecycle result.',
+    ],
+    [
+      "plugin.event.handle",
+      "event",
+      { handled: false },
+      'Capability endpoint "remote-endpoint" returned an unhandled event result.',
+    ],
+    [
+      "plugin.service.call",
+      "service",
+      {},
+      'Capability endpoint "remote-endpoint" returned an empty service result.',
+    ],
+    [
+      "plugin.appBridge.call",
+      "appBridge",
+      {},
+      'Capability endpoint "remote-endpoint" returned an empty app bridge result.',
+    ],
+  ] as const)(
+    "fails when %s returns weak conformance evidence",
+    async (method, surface, result, message) => {
+      installMinimalFixtureFetch({ [method]: result });
+
+      await expect(
+        assertRemoteCapabilityEndpointConformance({
+          endpoint: {
+            id: "remote-endpoint",
+            baseUrl: "https://remote.example.test",
+          },
+          requiredSurfaces: [surface],
+        }),
+      ).rejects.toThrow(message);
+    },
+  );
+
+  it.each([
+    [
+      "plugin.evaluator.process",
+      "evaluator",
+      {},
+      'Capability endpoint "remote-endpoint" returned an empty evaluator process result.',
+    ],
+    [
+      "plugin.responseHandlerEvaluator.evaluate",
+      "responseHandlerEvaluator",
+      {},
+      'Capability endpoint "remote-endpoint" returned an empty response-handler evaluator result.',
+    ],
+    [
+      "plugin.responseHandlerFieldEvaluator.parse",
+      "responseHandlerFieldEvaluator",
+      {},
+      'Capability endpoint "remote-endpoint" returned an empty response-handler field evaluator parse result.',
+    ],
+    [
+      "plugin.responseHandlerFieldEvaluator.handle",
+      "responseHandlerFieldEvaluator",
+      {},
+      'Capability endpoint "remote-endpoint" returned an empty response-handler field evaluator handle result.',
+    ],
+  ] as const)(
+    "fails when %s returns weak staged conformance evidence",
+    async (method, surface, result, message) => {
+      installMinimalFixtureFetch({ [method]: result });
+
+      await expect(
+        assertRemoteCapabilityEndpointConformance({
+          endpoint: {
+            id: "remote-endpoint",
+            baseUrl: "https://remote.example.test",
+          },
+          requiredSurfaces: [surface],
+        }),
+      ).rejects.toThrow(message);
+    },
+  );
+
   it("fails when remote view conformance returns a non-JavaScript asset", async () => {
     installMinimalFixtureFetch({
       "plugin.asset.get": {
@@ -567,6 +669,83 @@ describe("remote capability endpoint conformance", () => {
       'Capability endpoint "remote-endpoint" returned a view asset integrity value that does not match its manifest.',
     );
   });
+
+  it("fails when remote view asset integrity does not match the returned bytes", async () => {
+    installMinimalFixtureFetch(
+      {
+        "plugin.asset.get": {
+          ...CAPABILITY_ROUTER_PROTOCOL_FIXTURE.results.asset,
+          integrity: "sha256-deadbeef",
+        },
+      },
+      {
+        modules: [
+          {
+            ...CAPABILITY_ROUTER_PROTOCOL_FIXTURE.module,
+            views: [
+              {
+                ...CAPABILITY_ROUTER_PROTOCOL_FIXTURE.module.views[0],
+                integrity: "sha256-deadbeef",
+              },
+            ],
+          },
+        ],
+      },
+    );
+
+    await expect(
+      assertRemoteCapabilityEndpointConformance({
+        endpoint: {
+          id: "remote-endpoint",
+          baseUrl: "https://remote.example.test",
+        },
+        requiredSurfaces: ["viewAsset"],
+      }),
+    ).rejects.toThrow(
+      'Capability endpoint "remote-endpoint" returned a view asset integrity value that does not match its bytes.',
+    );
+  });
+
+  it("fails when remote view asset integrity lacks a sha256 token", async () => {
+    const assetBytes = Buffer.from(
+      CAPABILITY_ROUTER_PROTOCOL_FIXTURE.results.asset.bodyBase64,
+      "base64",
+    );
+    const integrity = `sha384-${createHash("sha384").update(assetBytes).digest("base64")}`;
+    installMinimalFixtureFetch(
+      {
+        "plugin.asset.get": {
+          ...CAPABILITY_ROUTER_PROTOCOL_FIXTURE.results.asset,
+          integrity,
+        },
+      },
+      {
+        modules: [
+          {
+            ...CAPABILITY_ROUTER_PROTOCOL_FIXTURE.module,
+            views: [
+              {
+                ...CAPABILITY_ROUTER_PROTOCOL_FIXTURE.module.views[0],
+                integrity,
+              },
+            ],
+          },
+        ],
+      },
+    );
+
+    await expect(
+      assertRemoteCapabilityEndpointConformance({
+        endpoint: {
+          id: "remote-endpoint",
+          baseUrl: "https://remote.example.test",
+        },
+        requiredSurfaces: ["viewAsset"],
+      }),
+    ).rejects.toThrow(
+      'Capability endpoint "remote-endpoint" returned a view asset integrity value without a sha256 digest.',
+    );
+  });
 });
 
 function installMinimalFixtureFetch(
@@ -575,6 +754,43 @@ function installMinimalFixtureFetch(
     modules?: unknown[];
   } = {},
 ): void {
+  const results: Record<string, unknown> = {
+    "plugin.action.invoke": CAPABILITY_ROUTER_PROTOCOL_FIXTURE.results.action,
+    "plugin.provider.get": CAPABILITY_ROUTER_PROTOCOL_FIXTURE.results.provider,
+    "plugin.route.call": CAPABILITY_ROUTER_PROTOCOL_FIXTURE.results.route,
+    "plugin.asset.get": CAPABILITY_ROUTER_PROTOCOL_FIXTURE.results.asset,
+    "plugin.model.invoke": CAPABILITY_ROUTER_PROTOCOL_FIXTURE.results.model,
+    "plugin.lifecycle.call":
+      CAPABILITY_ROUTER_PROTOCOL_FIXTURE.results.lifecycle,
+    "plugin.event.handle": CAPABILITY_ROUTER_PROTOCOL_FIXTURE.results.event,
+    "plugin.service.call": CAPABILITY_ROUTER_PROTOCOL_FIXTURE.results.service,
+    "plugin.appBridge.call":
+      CAPABILITY_ROUTER_PROTOCOL_FIXTURE.results.appBridge,
+    "plugin.evaluator.shouldRun":
+      CAPABILITY_ROUTER_PROTOCOL_FIXTURE.results.evaluatorShouldRun,
+    "plugin.evaluator.prepare":
+      CAPABILITY_ROUTER_PROTOCOL_FIXTURE.results.evaluatorPrepare,
+    "plugin.evaluator.prompt":
+      CAPABILITY_ROUTER_PROTOCOL_FIXTURE.results.evaluatorPrompt,
+    "plugin.evaluator.process":
+      CAPABILITY_ROUTER_PROTOCOL_FIXTURE.results.evaluatorProcess,
+    "plugin.responseHandlerEvaluator.shouldRun":
+      CAPABILITY_ROUTER_PROTOCOL_FIXTURE.results
+        .responseHandlerEvaluatorShouldRun,
+    "plugin.responseHandlerEvaluator.evaluate":
+      CAPABILITY_ROUTER_PROTOCOL_FIXTURE.results
+        .responseHandlerEvaluatorEvaluate,
+    "plugin.responseHandlerFieldEvaluator.shouldRun":
+      CAPABILITY_ROUTER_PROTOCOL_FIXTURE.results
+        .responseHandlerFieldEvaluatorShouldRun,
+    "plugin.responseHandlerFieldEvaluator.parse":
+      CAPABILITY_ROUTER_PROTOCOL_FIXTURE.results
+        .responseHandlerFieldEvaluatorParse,
+    "plugin.responseHandlerFieldEvaluator.handle":
+      CAPABILITY_ROUTER_PROTOCOL_FIXTURE.results
+        .responseHandlerFieldEvaluatorHandle,
+    ...resultsByMethod,
+  };
   globalThis.fetch = vi.fn(async (url: string | URL, init?: RequestInit) => {
     const body = init?.body
       ? (JSON.parse(String(init.body)) as { method?: string })
@@ -600,8 +816,8 @@ function installMinimalFixtureFetch(
         },
       });
     }
-    if (body?.method && body.method in resultsByMethod) {
-      return jsonResponse({ ok: true, result: resultsByMethod[body.method] });
+    if (body?.method && body.method in results) {
+      return jsonResponse({ ok: true, result: results[body.method] });
     }
     return jsonResponse({ ok: false, error: { message: "unexpected" } }, 404);
   }) as unknown as typeof fetch;
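
The `it.each` cases in this file feed the harness success-shaped but empty or negative payloads and expect specific error messages. The harness implementation is not shown in this commit view, so the following is only a sketch of the kind of check those cases imply; `assertStrongEvidence` and the "empty object" rule are assumptions, and only the error strings are taken from the tests.

```typescript
type Surface =
  | "action"
  | "provider"
  | "model"
  | "lifecycle"
  | "event"
  | "service"
  | "appBridge";

// Human-readable labels as they appear in the expected error messages.
const LABELS: Record<Surface, string> = {
  action: "action",
  provider: "provider",
  model: "model",
  lifecycle: "lifecycle",
  event: "event",
  service: "service",
  appBridge: "app bridge",
};

function isEmptyObject(value: unknown): boolean {
  return (
    typeof value === "object" &&
    value !== null &&
    !Array.isArray(value) &&
    Object.keys(value).length === 0
  );
}

// Hypothetical check: rejects payloads that look successful but carry no evidence.
function assertStrongEvidence(
  endpointId: string,
  surface: Surface,
  result: unknown,
): void {
  const record = (result ?? {}) as Record<string, unknown>;
  const prefix = `Capability endpoint "${endpointId}" returned`;
  if (surface === "lifecycle") {
    if (record.ok !== true) throw new Error(`${prefix} a failed lifecycle result.`);
    return;
  }
  if (surface === "event") {
    if (record.handled !== true) throw new Error(`${prefix} an unhandled event result.`);
    return;
  }
  if (surface === "model") {
    if (record.result === undefined) throw new Error("result is required.");
    return;
  }
  if (result == null || isEmptyObject(result)) {
    throw new Error(`${prefix} an empty ${LABELS[surface]} result.`);
  }
}
```

For example, `assertStrongEvidence("remote-endpoint", "lifecycle", { ok: false })` throws the failed-lifecycle message, while `{ ok: true }` passes.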

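The two view-asset tests added in this file distinguish an integrity value with no SHA-256 token from one whose SHA-256 token does not match the fetched bytes. A minimal sketch of that distinction, assuming the integrity string is a whitespace-separated list of `<algorithm>-<base64 digest>` tokens in the Subresource Integrity style; `checkSha256Integrity` is a hypothetical name, not the harness's API, and option suffixes on tokens are not handled.

```typescript
import { createHash } from "node:crypto";

type IntegrityCheck = "ok" | "missing-sha256" | "mismatch";

// Hypothetical helper: compares any sha256 token in the integrity string
// against the SHA-256 digest of the bytes that were actually fetched.
function checkSha256Integrity(
  integrity: string,
  bytes: Uint8Array,
): IntegrityCheck {
  const sha256Tokens = integrity
    .trim()
    .split(/\s+/)
    .filter((token) => token.startsWith("sha256-"));
  if (sha256Tokens.length === 0) return "missing-sha256";
  const digest = createHash("sha256").update(bytes).digest("base64");
  return sha256Tokens.includes(`sha256-${digest}`) ? "ok" : "mismatch";
}

// Usage with an illustrative bundle body.
const bundleBytes = new TextEncoder().encode("export default 1;");
const sha256Value = `sha256-${createHash("sha256").update(bundleBytes).digest("base64")}`;
const sha384Value = `sha384-${createHash("sha384").update(bundleBytes).digest("base64")}`;

const matching = checkSha256Integrity(sha256Value, bundleBytes);
const wrongDigest = checkSha256Integrity("sha256-deadbeef", bundleBytes);
const noSha256 = checkSha256Integrity(sha384Value, bundleBytes);
```

Returning a tagged result rather than a boolean lets a caller map each case to its own error message, as the two tests expect different messages for the missing-token and mismatch cases.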