test(devnet): all-oxigraph-server 4c/2e fleet + devnet-test-everything.sh (total e2e gate) #1029

Open
branarakic wants to merge 3 commits into main from test/devnet-total-e2e

@branarakic

Contributor

Makes the devnet a uniform, robust, production-parity fleet and adds a single total end-to-end gate that exercises everything from both core and edge nodes.

1. Uniform oxigraph-server fleet (devnet.sh)

Every node now runs the daemon-managed oxigraph-server (rc.12+ production default, per-node port, no Docker), replacing the old mixed fleet (embedded oxigraph on cores 3-4, oxigraph-worker on edges 5-6). Those single-threaded embedded stores wedge under heavy SWM-sync / big-promote load and silently drop a core below the StorageACK quorum — reproduced live: 2 embedded cores timed out, leaving an edge publish unable to dial 3 core ACKs (QuorumUnmetError). A homogeneous oxigraph-server fleet is robust under load.
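The quorum arithmetic behind that failure can be sketched as follows. This is a minimal illustration, not code from the PR: `quorum_met` and `REQUIRED_ACKS` are hypothetical names, and the threshold of 3 is taken from the "3 core ACKs" figure above.

```shell
#!/bin/sh
# Hypothetical threshold, mirroring the "3 core ACKs" mentioned in the description.
REQUIRED_ACKS=3

# quorum_met TOTAL_CORES UNRESPONSIVE_CORES
# Succeeds when the cores that can still answer reach the required ACK count.
quorum_met() {
  available=$(( $1 - $2 ))
  [ "$available" -ge "$REQUIRED_ACKS" ]
}

# 4 cores with 2 wedged embedded stores leaves only 2 possible ACKs.
if quorum_met 4 2; then echo "quorum met"; else echo "quorum unmet"; fi

# 4 responsive cores can supply the 3 ACKs.
if quorum_met 4 0; then echo "quorum met"; else echo "quorum unmet"; fi
```

With four cores there is room for exactly one unresponsive core, which is why two timed-out embedded stores were enough to block an edge publish.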

2. 4-core / 2-edge e2e default (playwright.config.ts)

Bumped the Playwright devnet default from 4 to 6 nodes (4 core + 2 edge) so the suite exercises the real mixed core/edge topology, with edge nodes publishing and sharing through cores, instead of an all-core devnet. (devnet.sh start already defaulted to 6 nodes, 4 of them cores.)
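For orientation, the 4-core / 2-edge layout can be sketched in shell. The assumption here, inferred from the node numbering used in this PR (cores up to node 4, edges 5-6), is that the first four nodes are cores and the rest are edges; `node_role` and `NUM_CORES` are illustrative names, not identifiers from devnet.sh.

```shell
#!/bin/sh
# Node count: same override variable as the Playwright config, default 6.
NUM_NODES="${PLAYWRIGHT_DEVNET_NUM_NODES:-6}"
# Assumption: the first four nodes are cores, everything after is an edge.
NUM_CORES=4

# node_role NODE_NUMBER -> prints "core" or "edge"
node_role() {
  if [ "$1" -le "$NUM_CORES" ]; then
    echo core
  else
    echo edge
  fi
}

# Print the role of every node in the fleet.
n=1
while [ "$n" -le "$NUM_NODES" ]; do
  echo "node$n: $(node_role "$n")"
  n=$(( n + 1 ))
done
```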

3. scripts/devnet-test-everything.sh — the total gate

One script, asserted green on the devnet:

  • Topology preflight + mesh warm-up (absorbs the cold-boot first-publish SWM-hosting race).
  • Every valid CG config variant × node role: public-open (core + edge), private-curated-eoa, private-open, public-curated (core), local-only-open + local-only-curated (edge, WM-only/unregistered) — each create → import → promote → publish (kaId>0) or WM-write.
  • Big file (~400 entities / 225KB) full lifecycle on oxigraph-server. Note: "big" is bounded by SWM-gossip+ACK publish time, not the 10MB gossip limit — a ~0.9MB publish exceeds 300s (a recorded finding).
  • Cross-node sharing (a core consumes an edge-published CG), random sampling, staking (cores staked / edges identity-less), UI smoke.

Robustness baked in: publish retry for new-CG cold-gossip + policy-not-confirmed transients; raw-RDF-literal COUNT parsing; curator-only allowlists for curated (encrypted) CGs.
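The publish-retry idea can be sketched as a generic wrapper. This is a minimal sketch, not the script's actual publish_with_retry: `retry_with_budget` and `flaky_publish` are hypothetical names, and the wrapper retries on any non-zero exit rather than only on the specific cold-gossip and policy-not-confirmed transients.

```shell
#!/bin/sh
# retry_with_budget MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or the attempt budget is spent.
retry_with_budget() {
  max_attempts=$1
  delay=$2
  shift 2
  attempt=1
  while :; do
    if "$@"; then
      return 0
    fi
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    attempt=$(( attempt + 1 ))
    sleep "$delay"
  done
}

# Stand-in for a publish that hits two transient failures, then succeeds.
CALLS=0
flaky_publish() {
  CALLS=$(( CALLS + 1 ))
  [ "$CALLS" -ge 3 ]
}

retry_with_budget 5 0 flaky_publish && echo "published after $CALLS attempts"
```

A real gate would likely inspect the error text and retry only the known transients, so that a genuine publish failure still fails fast.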

🤖 Generated with Claude Code

Branimir Rakic and others added 2 commits June 5, 2026 22:32
- devnet.sh: EVERY node now runs the daemon-managed oxigraph-server (rc.12+
  production default) on a per-node port (7900+N, no Docker), replacing the old
  mixed fleet (embedded `oxigraph` on cores 3-4, `oxigraph-worker` on edges 5-6).
  Those single-threaded embedded stores wedge under heavy SWM-sync / big-promote
  load and silently drop a core below the StorageACK quorum — observed: 2
  embedded cores timed out, leaving an edge publish unable to dial 3 core ACKs
  (QuorumUnmetError). A homogeneous oxigraph-server fleet is robust under load and
  production-parity. Dropped the now-unused blazegraph / external-Oxigraph Docker
  probes + fixed the status backend labels.

- playwright.config.ts: bump the e2e devnet default 4 → 6 nodes (4 core + 2 edge)
  so the suite exercises the real mixed-fleet shape — edge nodes publishing/
  sharing through cores — instead of an all-core devnet.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
One script that exercises the full DKG V10 knowledge lifecycle across the
heterogeneous 4-core/2-edge all-oxigraph-server fleet, from BOTH core and edge
nodes, over every valid context-graph config variant, with small + big files,
plus random sampling, staking, and the node-UI.

Scenarios (all asserted, green on the devnet):
- Topology preflight (4 core / 2 edge).
- Mesh warm-up (absorbs the cold-boot first-publish SWM-hosting race).
- CG variant matrix × node role: public-open (core + EDGE), private-curated-eoa,
  private-open, public-curated (core), local-only-open + local-only-curated
  (edge, WM-only/unregistered). Each: create → import → promote → publish (or
  WM-write for unregistered), asserting a confirmed on-chain kaId>0.
- BIG file (~400 entities / 225KB): full create → promote → publish lifecycle on
  oxigraph-server. "Big" is bounded by SWM-gossip+ACK publish time (not the 10MB
  gossip limit) — a ~0.9MB publish exceeds 300s, a finding worth recording.
- Cross-node sharing: a core consumes an EDGE-published CG (subscribe + sync).
- Random sampling prover active; cores staked / edges identity-less; UI smoke.

Robustness: publish_with_retry tolerates new-CG cold-gossip + policy-not-confirmed
transients (configurable budget/timeout); /api/query COUNT bindings parsed from
raw RDF literals; curated CGs use a curator-only allowlist (encryption setup).
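
A sketch of the COUNT parsing, assuming the binding arrives as a typed RDF literal such as `"42"^^<http://www.w3.org/2001/XMLSchema#integer>` (the exact shape returned by /api/query is an assumption; `parse_count_literal` is a hypothetical name):

```shell
#!/bin/sh
# parse_count_literal LITERAL
# Prints the integer inside a raw RDF literal; fails if no plain integer remains.
parse_count_literal() {
  value=$1
  value=${value%%^^*}   # drop a trailing ^^<datatype> suffix, if present
  value=${value#\"}     # drop the opening quote
  value=${value%\"}     # drop the closing quote
  case $value in
    ''|*[!0-9]*) return 1 ;;          # empty or non-numeric: reject
    *) printf '%s\n' "$value" ;;
  esac
}

parse_count_literal '"42"^^<http://www.w3.org/2001/XMLSchema#integer>'
parse_count_literal '7'
```

Rejecting non-numeric input keeps a malformed binding from being silently compared as zero in a later assertion.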

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
// mixed-fleet shape — edge nodes publishing/sharing THROUGH cores — not an
// all-core devnet. node1 sees 5 peers (satisfies peer-connectivity.spec.ts
// ">1 peer"). Override the count with PLAYWRIGHT_DEVNET_NUM_NODES.
const NUM_NODES = process.env.PLAYWRIGHT_DEVNET_NUM_NODES || '6';

🔴 Bug: This default is now out of sync with e2e/helpers/devnet.ts and e2e/helpers/real-node.ts, which still fall back to 4 when PLAYWRIGHT_DEVNET_NUM_NODES is not present in the runner process. webServer.command only exports the env var to the bootstrap/Vite subprocess, so the specs still assert a 4-node mesh (MIN_PEERS=3) and treat node5/6 as outside the topology even though bootstrap started 6 nodes. Thread the chosen node count into the test runner or update the shared fallback in the same PR.
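
The same idea expressed in shell terms, as an illustration only (the real fix belongs in playwright.config.ts and the e2e helpers): resolve the count once, export it so every child process inherits it, and derive the peer floor from it. `expected_min_peers` is a hypothetical helper; the N-1 rule is inferred from the figures quoted here (4 nodes give MIN_PEERS=3, and node1 sees 5 peers in a 6-node devnet).

```shell
#!/bin/sh
# Resolve the node count once and export it, so the bootstrap subprocess
# and the test runner both read the same value.
export PLAYWRIGHT_DEVNET_NUM_NODES="${PLAYWRIGHT_DEVNET_NUM_NODES:-6}"

# expected_min_peers NUM_NODES
# Assumption: in a fully connected mesh each node sees every other node.
expected_min_peers() {
  echo $(( $1 - 1 ))
}

echo "nodes=$PLAYWRIGHT_DEVNET_NUM_NODES min_peers=$(expected_min_peers "$PLAYWRIGHT_DEVNET_NUM_NODES")"
```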


# ── Scenario 4: private-open, core lifecycle ────────────────────────────────
sec "SCENARIO 4 — private-open CG (curated members, any member publishes)"
CG_PRIVOPEN=$(create_cg 2 "ev-privopen-$TS" 1 1 true "[\"$(agent_addr 2)\"]")

🟡 Issue: This private-open scenario is supposed to prove that any allowed member can publish, but the allowlist only contains node2 and the lifecycle also runs from node2. That means the script never exercises the non-creator member-publish path and can go green while the actual private-open membership flow is broken. Add at least one second allowed agent and publish from that node.

CG_LOPEN=$(create_cg 6 "ev-localopen-$TS" 0 1 false)
if [ -n "$CG_LOPEN" ]; then
ok "edge node 6 created local-only-open CG: $CG_LOPEN"
run_lifecycle 6 "$CG_LOPEN" "localopen-edge" 4 wm

🟡 Issue: run_lifecycle ... wm sets finalize=false and returns before /promote, so this 'WM→SWM' local-only scenario never touches shared memory at all. If local-only CGs are meant to cover host-mode SWM, switch this to a real SWM write/promote path; if WM-only is intentional, rename the scenario so the script does not claim coverage it does not provide.

Comment thread scripts/devnet.sh Outdated
# silently dropping a core below the StorageACK quorum (an edge publish then
# can't dial 3 core ACKs). The Docker-gated blazegraph / external-Oxigraph
# matrix backends are no longer wired into any node.
log "Store backend: ALL ${NUM_NODES} nodes → daemon-managed oxigraph-server (per-node port 7900+N, no Docker)"

🟡 Issue: Hard-wiring every devnet node to oxigraph-server removes the only end-to-end devnet coverage for the other supported backends (blazegraph and generic sparql-http) while the repo still documents and ships those paths. Without a replacement matrix lane, backend-specific regressions will now slip through a green devnet run. Keep an opt-in mixed-backend mode or add a dedicated CI job before dropping this coverage.

…ty + opt-in mixed backend

- 🔴 playwright/helpers node-count out of sync: playwright.config.ts bumped the
  devnet to 6 (4 core + 2 edge) but e2e/helpers/{real-node,devnet}.ts fall back to
  4 in the RUNNER process (the env var only reached the bootstrap/Vite subprocess),
  so specs skipped node5/6 (edges) as "outside topology" and under-counted
  EXPECTED_MIN_PEERS. Fix: playwright.config.ts now threads
  PLAYWRIGHT_DEVNET_NUM_NODES into the runner process; the helper fallbacks are
  bumped 4→6 to match.
- 🟡 private-open scenario only published from the creator: clarified — a
  non-curator private-CG member needs sender-key provisioning via the INVITE flow
  (a bare allowlist member even breaks the curator's own publish), so this scenario
  is the curator lifecycle and points to devnet-test-invite-flow.sh for
  member-publish rather than falsely claiming that coverage.
- 🟡 local-only "WM→SWM" never promoted: renamed S6/S7 to "WM write only" —
  unregistered CGs cannot promote/publish, and the name now matches what's asserted.
- 🟡 all-oxigraph-server dropped other-backend coverage: added opt-in
  DEVNET_MIXED_BACKEND=1 (per-node rotation across oxigraph-server / oxigraph /
  oxigraph-worker / blazegraph / sparql-http); default stays the robust uniform
  oxigraph-server fleet.
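
A sketch of how a per-node rotation gated on DEVNET_MIXED_BACKEND could look. The rotation order and the function names `backend_for_node` and `oxigraph_port` are assumptions for illustration; only the backend names, the opt-in variable, and the 7900+N port rule come from this PR.

```shell
#!/bin/sh
# Backends named in the commit message; the order is an assumption.
BACKENDS="oxigraph-server oxigraph oxigraph-worker blazegraph sparql-http"

# backend_for_node NODE_NUMBER
# Uniform oxigraph-server by default; round-robin when DEVNET_MIXED_BACKEND=1.
backend_for_node() {
  node=$1
  if [ "${DEVNET_MIXED_BACKEND:-0}" != "1" ]; then
    echo "oxigraph-server"
    return 0
  fi
  count=0
  for b in $BACKENDS; do count=$(( count + 1 )); done
  target=$(( (node - 1) % count ))
  i=0
  for b in $BACKENDS; do
    if [ "$i" -eq "$target" ]; then
      echo "$b"
      return 0
    fi
    i=$(( i + 1 ))
  done
}

# oxigraph_port NODE_NUMBER: the per-node port rule 7900+N.
oxigraph_port() {
  echo $(( 7900 + $1 ))
}

for n in 1 2 3 4 5 6; do
  echo "node$n: default=$(backend_for_node "$n") mixed=$(DEVNET_MIXED_BACKEND=1 backend_for_node "$n")"
done
echo "node3 oxigraph-server port: $(oxigraph_port 3)"
```

Keeping the uniform fleet as the default means a plain devnet run is unaffected, while a CI lane can set DEVNET_MIXED_BACKEND=1 to restore coverage of the other backends.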

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

@github-actions github-actions Bot left a comment

Codex review skipped: filtered diff is 73320 lines (cap: 5,000). Please consider splitting this into smaller PRs for reviewability.

@branarakic

Contributor Author

Addressed all review feedback in 180a319:

  • 🔴 playwright/helpers node-count out of sync → playwright.config.ts now threads PLAYWRIGHT_DEVNET_NUM_NODES into the RUNNER process (it previously reached only the bootstrap/Vite subprocess), and the real-node.ts / devnet.ts fallbacks are bumped 4→6 so node5/6 (edges) are in-topology and EXPECTED_MIN_PEERS is correct.
  • 🟡 private-open only published from the creator → clarified: a non-curator private-CG member needs sender-key provisioning via the INVITE flow (a bare allowlist member breaks even the curator's publish), so this is the curator lifecycle and points to devnet-test-invite-flow.sh for member-publish instead of falsely claiming it.
  • 🟡 local-only "WM→SWM" never promoted → renamed S6/S7 to "WM write only" (unregistered CGs can't promote/publish; the name now matches what's asserted).
  • 🟡 all-oxigraph-server dropped other-backend coverage → added opt-in DEVNET_MIXED_BACKEND=1 (rotates oxigraph-server / oxigraph / oxigraph-worker / blazegraph / sparql-http); default stays the robust uniform fleet.
