Date: 2026-06-04
Scope: packages/api/src/{channel,conversation,conversation-subgroup,semantic-relationship,topic,conversation/util}.ts
Goal: identify which raw perspective.querySparql<T>() call sites can be
expressed via Ad4mModel, which can't, and what we'd need to add to
Ad4mModel to close the gap. Compared against AD4M PRs #837, #842, and #846.
28 raw SPARQL call sites in production code (test mocks excluded), grouped by file. The shape, intent, and inputs/outputs are summarised here for reference; cross-references back to source lines preserved.
| # | Method | Lines | Shape | Intent |
|---|---|---|---|---|
| 1 | allItems() | 101–118 | ?channel ad4m:has_child ?id + reifier metadata + type filter + OPTIONAL property bag | Channel content timeline (Message / Post / Task), with author/timestamp from reifier, content body from OPTIONAL property triples |
| 2 | unprocessedItems() query 1 | 158–185 | ?channel ad4m:has_child ?id + type filter | All item IDs in channel (preparation for set-difference) |
| 3 | unprocessedItems() query 2 | 174–186 | ?sg flux:has_item ?id + type filter | All item IDs that are in any conversation subgroup (set of processed) |
| 4 | unprocessedItems() query 3 | 198–216 | VALUES ?id { ... } + reifier metadata + OPTIONAL property bag | Full data for unprocessed IDs only |
| 5 | totalItemCount() | 263–271 | COUNT(DISTINCT ?id) aggregate | Cardinality of channel items |
| 6 | recentConversations() (static) | 294–306 | Channel + is_conversation + OPTIONAL conversation child | List conversation channels (no reifier joins by design; was 60 s in earlier impl) |
| 7 | pinnedConversations() (static) | 357–370 | Channel + is_pinned = true + OPTIONAL conversation child | Pinned conversation channels |
| 8 | (covered in #2) | | | |

| # | Method | Lines | Shape | Intent |
|---|---|---|---|---|
| 9 | stats() (subgroups) | 60–74 | ?conv ad4m:has_child ?sg + flag | Total subgroup count |
| 10 | stats() (participants) | 67–75 | ?conv flux:participant ?did | Participant DIDs |
| 11 | topics() | 91–106 | SemanticRelationship → Topic, with UNION on ?expr = ?conv OR ?conv ad4m:has_child ?expr | Topics for this conversation OR any of its subgroups |
| 12 | subgroupsData() first | 144–157 | ?conv ad4m:has_child ?id + reifier timestamp + OPTIONAL property bag | Subgroup names/summaries/timestamps |
| 13 | subgroupsData() batch | 179–192 | VALUES ?sg { ... } + reifier-traversal to channel ancestor + OPTIONAL transcript start | Per-subgroup item timestamps for sorting |

| # | Method | Lines | Shape | Intent |
|---|---|---|---|---|
| 14 | stats() (items) | 53–69 | ?sg flux:has_item ?item + FILTER IN on type | Total item count |
| 15 | stats() (participants) | 62–70 | ?sg flux:participant ?did | Participant DIDs |
| 16 | topics() | 86–96 | SemanticRelationship → Topic | Topic list for this subgroup |
| 17 | itemsData() | 126–148 | ?sg flux:has_item ?id + reifier timestamp + reifier author + OPTIONAL property bag + OPTIONAL channel ancestor reifier | Subgroup item timeline with author/timestamp/body/title |
| 18 | topicsWithRelevance() | 232–243 | SemanticRelationship → Topic + has_relevance | Topic list with per-SR relevance score |

| # | Method | Lines | Shape | Intent |
|---|---|---|---|---|
| 19 | itemEmbedding(itemId) | 27–38 | ?sr has_expression itemId + ?sr has_tag ?embed + ?embed flux:embedding ?vec + LIMIT 1 | Resolve embedding URL for an item |
| 20 | allConversationEmbeddings() | 51–65 | All ?conv = Conversation + Channel-via-has_child ancestor + SR + Embedding (4-way join) | Synergy embedding corpus for conversations |
| 21 | allSubgroupEmbeddings() | 87–103 | All ?sg = Subgroup + Conversation parent + Channel grandparent + SR + Embedding (5-way join) | Synergy embedding corpus for subgroups |
| 22 | allItemEmbeddings() | 125–140 | All Message/Post/Task + Channel ancestor + SR + Embedding (4-way join) | Synergy embedding corpus across all item types |
| 23 | allItemEmbeddingsByType(type) | 176–190 | Same as 22 but single specific type | Per-type variant |

| # | Method | Lines | Shape | Intent |
|---|---|---|---|---|
| 24 | linkedConversations() | 21–35 | Topic → reverse SR → Subgroup → reverse has_child → Conversation → reverse has_child → Channel | All Conversations linked to this Topic, with channel context |
| 25 | linkedSubgroups() | 60–74 | Same, but returns Subgroup not Conversation | Same as 24 at one less hop |

| # | Method | Lines | Shape | Intent |
|---|---|---|---|---|
| 26 | findEmbeddingSRId(itemId) | 12–21 | SR by has_expression = itemId AND has_tag.entry_type = has_embedding + LIMIT 1 | Find SR ID for an item's embedding (cleanup path) |

(28 total = 26 unique sites + 2 collapsed under #2/#3 in the table; the
unprocessedItems() chain reuses query 1 inside query 2's filtering.)
Verified from the model classes in packages/api/src/ and the AD4M
@coasys/ad4m SDK:
- `@Flag({ through, value })`: equality test on a "tag" predicate (e.g. `entry_type = flux://has_channel`). Discriminates entity types.
- `@Property({ through })`: scalar property via a predicate. Stored as a literal-encoded link target.
- `@HasMany(() => Class)` / `@HasMany({ through })`: relation that resolves to an array of related instances or raw IRIs.
- `findAll(perspective, { where, include, limit, offset, order })`: query for instances matching `where` clauses, optionally eager-loading relations via `include`, with pagination + ordering.
- `where: { property: value }` / `where: { property: [v1, v2] }`: String equality, StringArray (IN), Number, Bool, Ops (gt/lt/between/contains/not), NumberArray.
- `include: { relation: true }`: eager-load named relations.
- `include: { relation: { properties, where, include, limit, order } }`: deep-include with per-relation filters and projections.
- `projections: { $key: { from, where, count, limit, target_class_name } }`: `$key`-prefixed lightweight relation aggregations.
- `parent: { model, id }` / `parent: { id, predicate }`: scope query results to children of a specific parent instance.
- `save(batchId)` / `delete()`: CRUD with batch coordination.
- Reverse relations on `HasMany`/`HasOne` at decoration time, i.e. declaring "find my parent Channel via `ad4m:has_child` direction=reverse". Partial: `direction: 'reverse'` exists for `@HasMany` but it requires the parent class to be expressible up-front. There is no `@BelongsTo()`-style parent decorator.
- Cross-class WHERE conditions joining two unrelated models, e.g. "find all `Embedding` instances whose ID is the tag of some SemanticRelationship whose expression is this Conversation". This is what the Synergy queries do via multi-hop SPARQL; Ad4mModel currently has no pattern for that other than two separate `findAll` calls glued in JS.
- Multi-level `include` chains: `include: { rel1: { include: { rel2: true } } }` is supported, but the relation graph has to be declared on both ends and the predicates have to match the storage model exactly. Where Flux's data storage uses one-off predicates with implicit hops, the decorator path doesn't capture it.
- `UNION` of two query shapes against the same target type, e.g. Conversation's `topics()` does `{ ?expr = ?conv } UNION { ?conv has_child ?expr }` to grab topics belonging directly to the conversation OR to any of its subgroups in one query.
- Reifier metadata as queryable fields: `author`/`timestamp` of a specific link (not the entity). Ad4mModel exposes `createdAt`, `updatedAt`, and `author` synthesised across an instance's reifiers during hydration, but you cannot ask "give me the timestamp of THIS specific link" through Ad4mModel's where clause.
- Set-difference / NOT EXISTS: the `unprocessedItems` pattern.
Sites where the query is a single-class lookup with a simple where clause.
These map directly onto findAll. Note: the existing raw SPARQL still wins on
performance because the model-query builder adds conformance joins these
queries don't need. Whether to convert is a maintainability/perf trade-off.
| # | Method | Convert to |
|---|---|---|
| 5 | Channel.totalItemCount() | findAll(Message) + findAll(Post) + findAll(Task) w/ parent: { model: Channel, id }, sum lengths. Or findAll(Channel, { id: this.id, include: { messages: { count: true }, posts: { count: true }, tasks: { count: true } } }). |
| 6 | Channel.recentConversations() | findAll(Channel, { where: { isConversation: true }, include: { conversations: true } }) + Rust hydration synthesises updatedAt. |
| 7 | Channel.pinnedConversations() | findAll(Channel, { where: { isPinned: true }, include: { conversations: true } }). |
| 10 | Conversation.stats() participants | findAll(Conversation, { id, properties: ['participants'] }) then instance.participants. |
| 15 | Subgroup.stats() participants | Same shape as 10. |
Sites that join 2–4 model classes via existing forward relations. Convertible
if the model classes get a @HasMany/@HasOne declaring the reverse direction.
| # | Method | What's needed |
|---|---|---|
| 9 | Conversation.stats() subgroups | findAll(Subgroup, { parent: { model: Conversation, id } }). Today: already supported via subgroups() method on the model. The SPARQL is redundant. |
| 14 | Subgroup.stats() items | findAll([Message, Post, Task], { parent: { model: Subgroup, id, predicate: 'flux://has_item' } }). Needs multi-class polymorphic findAll. |
| 16 | Subgroup.topics() | findAll(SemanticRelationship, { where: { expression: this.id }, include: { tag: true } }) → filter where tag.entry_type = has_topic. Needs **tag decorated as `@HasOne(() => Topic |
| 18 | Subgroup.topicsWithRelevance() | Same as 16 with relevance property already on SR. |
| 11 | Conversation.topics() | Same as 16 but with the UNION; can be expressed as two findAll calls in JS, dedup. Or convert to a single SPARQL query (no good Ad4mModel shape today). |
| 19 | SemanticRelationship.itemEmbedding(id) | findAll(SR, { where: { expression: id }, include: { tag: true }, limit: 1 }). Needs tag as @HasOne(Embedding). |
| 20 | SR.allConversationEmbeddings() | findAll(Conversation, { include: { /* parent channel */, /* incoming SR */ : { include: { tag: { properties: ['embedding'] } } } } }). Needs incoming-relation declarations (reverse has_expression). |
| 21 | SR.allSubgroupEmbeddings() | Same shape with one more parent hop. |
| 22 | SR.allItemEmbeddings() | Same as 20 but across three item classes. Needs multi-class polymorphic findAll or three separate calls. |
| 23 | SR.allItemEmbeddingsByType(type) | Single-class variant of 22. Convertible with current Ad4mModel + the tag decoration upgrade. |
Sites where the SPARQL is doing reifier-metadata reads (?_reifier
ad4m:ontology/timestamp / author). Ad4mModel already synthesises
createdAt/updatedAt/author per instance during hydration — but only
once per instance, not per individual link.
| # | Method | Why raw SPARQL is correct |
|---|---|---|
| 1 | Channel.allItems() | Wants timestamp of the has_child link, not of the message entity. The link timestamp is when the message was added to the channel, which differs from when the message entity was created (e.g. message edited after add). Ad4mModel's hydrated createdAt refers to the entity, not the link. |
| 4 | Channel.unprocessedItems() data fetch | Same as 1. |
| 12 | Conversation.subgroupsData() first | Same: wants ?conv has_child ?sg link timestamp. |
| 17 | Subgroup.itemsData() | Joins reifier-on-has_item AND reifier-on-entry_type to extract author at type-tag time. Even more complex link-level semantics. |
Potential Ad4mModel feature: include: { rel: { meta: ['timestamp', 'author'] } }
— eager-load per-link reifier metadata as a sidecar on each related instance.
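To make the link-versus-entity distinction concrete, here is a minimal sketch of what a per-link sidecar could look like on the consumer side. The `linkMeta` field and both interfaces are hypothetical illustrations of the proposed `meta:` option, not an existing Ad4mModel shape.

```typescript
// Hypothetical result shape: `linkMeta` is NOT an existing Ad4mModel field.
interface LinkMeta {
  timestamp: string; // ISO-8601 time at which the link itself was written
  author: string; // DID of the link author
}

interface TimelineItem {
  id: string;
  createdAt: string; // entity-level timestamp, as hydrated today
  linkMeta: LinkMeta; // per-link reifier metadata (the proposed sidecar)
}

// Order a channel timeline by when each item was linked into the channel,
// rather than by when the entity was created.
function sortByLinkTimestamp(items: TimelineItem[]): TimelineItem[] {
  return items
    .slice()
    .sort((a, b) => Date.parse(a.linkMeta.timestamp) - Date.parse(b.linkMeta.timestamp));
}
```

With such a sidecar, a timeline sorted by `linkMeta.timestamp` can differ from one sorted by `createdAt`, which is the behaviour the Category C sites depend on.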
| # | Method | Convertibility |
|---|---|---|
| 2 + 3 | Channel.unprocessedItems() set-difference | Best left as SPARQL. FILTER NOT EXISTS in Oxigraph was 60 s, so the code already migrated to the set-difference workaround. Ad4mModel doesn't support either pattern natively. Adding where: { NOT: { … } } could work but the underlying SPARQL would have the same planner cliff (until named graphs from #812 land). |
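The JS half of the set-difference workaround can be sketched as follows. This is an illustration only: `unprocessedIds` and `valuesClause` are hypothetical helper names (not taken from the channel source), the inputs stand in for the results of queries 1 and 2, and wrapping each ID in angle brackets is an assumption about how IDs are serialised into the `VALUES` clause of query 3.

```typescript
// Hypothetical helpers, not copied from packages/api/src/channel.
// channelItemIds: result of query 1 (all item IDs in the channel).
// processedItemIds: result of query 2 (item IDs already in some subgroup).
function unprocessedIds(channelItemIds: string[], processedItemIds: string[]): string[] {
  const processed = new Set(processedItemIds);
  const seen = new Set<string>();
  const result: string[] = [];
  for (const id of channelItemIds) {
    // Keep channel order; drop processed IDs and duplicates.
    if (!processed.has(id) && !seen.has(id)) {
      seen.add(id);
      result.push(id);
    }
  }
  return result;
}

// Builds the VALUES clause text that scopes query 3 to the surviving IDs.
// Assumption: IDs are IRIs and are written as <iri>.
function valuesClause(variable: string, iris: string[]): string {
  return `VALUES ?${variable} { ${iris.map((iri) => `<${iri}>`).join(" ")} }`;
}
```

The difference is computed in memory with hash-set lookups, so the planner never sees a `FILTER NOT EXISTS`; the cost is the extra round-trips for the two ID queries.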
Sites that join entities via a predicate that's not declared as a relation on the model class.
| # | Method | Why |
|---|---|---|
| 13 | Conversation.subgroupsData() batch (Subgroup → channel ancestor) | Joins subgroup → its grandparent Channel via two ad4m:has_child hops. Ad4mModel models this as ascending parents, which the current decorator API can't express. |
| 24 | Topic.linkedConversations() | Topic → reverse SR → reverse has_child chain to Conversation and Channel. Bidirectional traversal through multiple relations not declared on Topic. |
| 25 | Topic.linkedSubgroups() | Same. |
| 26 | findEmbeddingSRId(itemId) | SR.tag must dereference to an Embedding instance + filter on its entry_type. Today returns SR ID only; the tag-as-relation upgrade would let findAll(SR, { where: { expression: id, tag: { type: 'flux://has_embedding' } } }). Nested-where on a relation is a missing capability. |
Inferred from the gaps above, ordered by how many call sites each unlocks:
Currently SemanticRelationship.tag is @Property(string) storing a raw IRI.
Upgrading to @HasOne(() => Embedding | Topic, { through: 'flux://has_tag' })
with type discrimination on entry_type would let every embedding/topic
traversal in semantic-relationship/, topic/, conversation/, and
conversation-subgroup/ flow through include: { tag: true }.
This is mechanical and small. Highest-leverage Ad4mModel addition.
Today's @HasMany({ direction: 'reverse' }) works but requires the parent
class to be expressed in the decorator. A cleaner story would be:

    @BelongsTo(() => Conversation, { through: 'ad4m://has_child' })
    parentConversation: Conversation;

Then queries like Subgroup.findAll({ include: { parentConversation: { include: { parentChannel: true } } } }) become natural.

    findAll([Message, Post, Task], { parent: { model: Subgroup, ... } })

Today you have to enumerate three calls and union the results client-side. The Ad4mModel runtime knows enough about SHACL shapes to dispatch this in one SPARQL execution.
Right now Channel.allItems() and related sites want the link author and
timestamp, not the entity author and timestamp. Adding a meta: projection
on the include relation would replace the reifier-walking SPARQL.

    findAll(SR, { where: { expression: id, tag: { type: 'flux://has_embedding' } } })

Probably not worth a first-class API. The Conversation.topics() UNION pattern
can be rewritten as two findAlls + JS dedup at the cost of one extra RTT.
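A minimal sketch of the "two findAlls + JS dedup" rewrite, under stated assumptions: `TopicLike` is a stand-in for whatever the two queries return, and the two query functions are passed in as parameters rather than written against the real `findAll` signature.

```typescript
// Minimal shape assumed for illustration; real Topic instances carry more fields.
interface TopicLike {
  id: string;
}

// Merge several result lists, keeping the first occurrence of each id.
function dedupeById<T extends TopicLike>(...lists: T[][]): T[] {
  const byId = new Map<string, T>();
  for (const list of lists) {
    for (const item of list) {
      if (!byId.has(item.id)) byId.set(item.id, item);
    }
  }
  return Array.from(byId.values());
}

// directTopics: topics attached to the conversation itself.
// subgroupTopics: topics attached to any of its subgroups.
// Both are hypothetical callbacks standing in for the two findAll calls.
async function topicsViaTwoQueries<T extends TopicLike>(
  directTopics: () => Promise<T[]>,
  subgroupTopics: () => Promise<T[]>,
): Promise<T[]> {
  const [direct, viaSubgroups] = await Promise.all([directTopics(), subgroupTopics()]);
  return dedupeById(direct, viaSubgroups);
}
```

Issuing both lookups through `Promise.all` lets them overlap, so the extra query may cost less than a full sequential round-trip; whether the executor actually runs them concurrently is not something this sketch establishes.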
Recommended permanent exemptions:
- `Channel.unprocessedItems()` set-difference (sites 2+3): the `FILTER NOT EXISTS` planner cliff is documented in the `ac57680b9` warning. The current set-difference workaround (3 SPARQL queries + JS set) is the right shape for this. Ad4mModel `where: { NOT: { … } }` would degrade to the same `FILTER NOT EXISTS` plan.
- `Conversation.subgroupsData()` batch timestamp lookup (site 13): the two-hop ascendant walk to find a subgroup's channel is genuinely model-shape-bending. Until the SHACL DSL gets bidirectional path support, leaving this as a single targeted SPARQL is simpler than the equivalent Ad4mModel composition.
- The reifier-timestamp queries (sites 1, 4, 12, 17), if the per-link `meta:` sidecar is not added.
The investigation deliberately stopped short of running a benchmark suite on dev. Expected behaviour based on what we know about the planner + hydration paths:
- Trivially-convertible sites (Category A) likely come out slightly worse in Ad4mModel because the model-query builder pays for SHACL shape resolution + conformance joins that the targeted SPARQL skips. The trade-off is type safety and one fewer place to maintain. Recommendation: micro-bench any conversion before committing.
- Multi-hop convertible sites (Category B) likely come out better in Ad4mModel because the batched-`include` path is one round-trip with the hydration done in Rust, whereas the current pattern is "raw SPARQL + per-row `getExpression()` calls in JS" (visible in `semantic-relationship/index.ts` lines 41 / 69 / 107 / 150 / 194). That's a textbook N+1 already, and the deep include eliminates it.
- Reifier-metadata sites (Category C) are status-quo SPARQL. Without the `meta:` projection feature, no conversion is worth attempting.
- Set-difference (Category D) stays SPARQL.
Concrete bench plan for follow-up:
- Build a Node.js harness that runs each query both ways against a seeded executor (e.g. Channel with N=10/100/1000 messages, M=2/20/200 conversations).
- Measure wall-clock + round-trip count for each pair.
- Tabulate.
- Recommend per-site keep-as-SPARQL vs convert-to-Ad4mModel based on the data.
This benchmarking is out of scope for the inventory phase and is the natural next deliverable on this branch.
- Stage 1 (this PR): inventory + analysis (this document). No code changes.
- Stage 2: add the `tag`-as-relation upgrade to `SemanticRelationship` (touches `packages/api/src/semantic-relationship/index.ts` and any callers that read `.tag` as a string). One PR, mechanical, no Ad4mModel-side changes required (uses existing `@HasOne`).
- Stage 3: Ad4mModel-side: `@BelongsTo()` decorator for clean reverse relations. AD4M PR.
- Stage 4: convert Category B sites that benefit from `include: { tag: true }`.
- Stage 5: benchmark suite against `dev` to validate each conversion.
- Stage 6: decide on per-link reifier `meta:` sidecar based on whether Category C sites are visibly slow in real Flux usage.
While starting Stage 2, I verified that @coasys/ad4m's core/src/model/decorators.ts already exports HasOne, BelongsToOne, and BelongsToMany, with where + filter options on every relation. This significantly re-scopes the recommendation table: three of the six items are feasible flux-side, including two that I had marked as needing AD4M SDK work:
| # | Recommendation | Original assumption | Re-checked status |
|---|---|---|---|
| 1 | tag as typed @HasOne(Embedding \| Topic) | flux-only | ✅ flux-only, confirmed |
| 2 | @BelongsTo() / first-class reverse relations | needs AD4M SDK PR | ✅ already in AD4M as @BelongsToOne / @BelongsToMany (flux-only) |
| 3 | Multi-class polymorphic findAll | needs AD4M SDK PR | ❌ needs AD4M (target is () => Ad4mModelLike, a single class) |
| 4 | Per-link reifier metadata sidecar | needs AD4M SDK PR | ❌ needs AD4M (no meta: projection on include) |
| 5 | Nested where on relations | needs AD4M SDK PR | ✅ already in AD4M: RelationOptions.where is wired into @HasOne/@HasMany/@BelongsTo* |
| 6 | UNION across query shapes | maybe AD4M | ❌ workaround via two findAlls + JS dedup |
Implemented: SemanticRelationship.tag upgrade with two same-predicate @HasOne relations:

    @HasOne(() => Embedding, { through: 'flux://has_tag' })
    embeddingTag?: Embedding;
    @HasOne(() => Topic, { through: 'flux://has_tag' })
    topicTag?: Topic;

The conformance filter on each target class's @Flag discriminates at hydration time: only Embedding instances bind to embeddingTag, only Topic instances bind to topicTag. The pre-existing tag: string @Property is kept for back-compat (callers that want the raw IRI).
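The binding rule for the two same-predicate relations can be modelled in a few lines. This is a toy illustration of the rule only: the real discrimination happens in the executor's conformance filter, `bindTag` and its types are hypothetical, and `flux://has_topic` is assumed by analogy with `flux://has_embedding` (only the latter appears verbatim in this document).

```typescript
// Toy model of the binding rule; not the executor's implementation.
interface TagTarget {
  id: string;
  entryType: string; // value of the target's entry_type flag
}

interface HydratedTags {
  embeddingTag?: TagTarget;
  topicTag?: TagTarget;
}

const EMBEDDING_FLAG = "flux://has_embedding";
const TOPIC_FLAG = "flux://has_topic"; // assumed IRI, see lead-in

// One has_tag target reaches both relations; each relation accepts it only
// if the target conforms to that relation's class flag.
function bindTag(target: TagTarget): HydratedTags {
  const out: HydratedTags = {};
  if (target.entryType === EMBEDDING_FLAG) out.embeddingTag = target;
  if (target.entryType === TOPIC_FLAG) out.topicTag = target;
  return out;
}
```

A target whose flag matches neither class binds to neither field, which is why the raw `tag` string is still useful as a fallback.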
Demonstrator conversion: SemanticRelationship.itemEmbeddingViaModel(itemId) shows the converted shape side-by-side with the original raw-SPARQL itemEmbedding(itemId). Behavioural parity caveat is documented in the method's TSDoc: the model variant returns the embedding-vector URL the same way the SPARQL variant does, then both call perspective.getExpression() for the actual vector — the model-query layer does not yet inline-resolve resolveLanguage properties on @HasOne-loaded instances.
scripts/bench-sparql-vs-ad4m.ts checked in as a documented skeleton: connection helper + timeIt(label, fn, runs) + the bench-case enumeration. Seed + connection are stubs — implementing them requires (a) a multi-user-mode executor running locally, (b) a JWT for that executor, (c) seed code that creates ~10 model classes' worth of related instances at scale. Estimated 200 LOC of additional work to make runnable. Tracked as Stage 5.
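For orientation, a timing helper with the `timeIt(label, fn, runs)` signature mentioned above might look like the sketch below. It is not the checked-in skeleton: the unrecorded warm-up run, the `TimingSummary` shape, and the injectable `now` clock are all assumptions made for illustration.

```typescript
interface TimingSummary {
  label: string;
  runs: number;
  avgMs: number;
  minMs: number;
  maxMs: number;
}

// Reduce raw per-run samples to the figures a results table needs.
function summarize(label: string, samplesMs: number[]): TimingSummary {
  if (samplesMs.length === 0) throw new Error("no samples");
  const total = samplesMs.reduce((sum, ms) => sum + ms, 0);
  return {
    label,
    runs: samplesMs.length,
    avgMs: total / samplesMs.length,
    minMs: Math.min(...samplesMs),
    maxMs: Math.max(...samplesMs),
  };
}

// Runs fn once as a warm-up (not recorded), then `runs` timed iterations.
// The default clock is Date.now for portability; sub-millisecond cases need
// a higher-resolution clock passed in as `now`.
async function timeIt(
  label: string,
  fn: () => Promise<unknown>,
  runs: number,
  now: () => number = () => Date.now(),
): Promise<TimingSummary> {
  await fn();
  const samples: number[] = [];
  for (let i = 0; i < runs; i++) {
    const start = now();
    await fn();
    samples.push(now() - start);
  }
  return summarize(label, samples);
}
```

Running the raw-SPARQL and model variants of a case through the same helper, back-to-back on the same perspective, keeps the two averages comparable.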
The benchmark depends on a running executor with the Flux subject classes registered + a sizeable seeded perspective. The wind-tunnel scenarios in coasys/ad4m-wind-tunnel are a heavier alternative (they would need to cross-import flux's @coasys/flux-api, which they currently don't). Three options for getting to numbers, in increasing order of work:
- Manual bench: spin a local executor, seed via a one-off script, run the bench harness above. ~1 hour wall clock per scale point.
- Vitest-based integration test in flux: extend `packages/api/src/conversation/conversation.test.ts`-style infrastructure to boot a real executor. ~half-day of test-infra plumbing.
- New wind-tunnel scenario (s11) that cross-imports flux-api: pleasant for repeat comparisons, but requires resolving the cross-repo dep + making the wind tunnel reproducibly drive an Ad4mModel-aware path. ~1-2 days.

This PR stops at documenting option 1; the harness skeleton + the converted itemEmbeddingViaModel are enough to make the bench a copy-paste-and-run exercise once the seed is in place.
- Stage 3 (next commit): add `@BelongsToOne` / `@BelongsToMany` decorators to Channel, Conversation, Subgroup, Topic models for the reverse traversals that Synergy queries currently express via SPARQL. Unlocks 8 sites.
- Stage 4: write `findAll`-shaped variants of `allConversationEmbeddings` / `allSubgroupEmbeddings` / `allItemEmbeddings` / `linkedConversations` using `embeddingTag` / `topicTag` + the new BelongsTo declarations.
- Stage 5: flesh out the bench harness seed; run; record numbers per converted method; update this section with the table.
- Stage 6 (separate AD4M PR): polymorphic `findAll` + per-link reifier `meta:` projection, which unlocks the remaining sites.
If you only have 10 minutes:
- Read this implementation log section to see what's actually in the branch.
- Skim `packages/api/src/semantic-relationship/index.ts` for the @HasOne upgrade and the `*ViaModel` demonstrator.
- The categorisation table above is the load-bearing decision artifact, so challenge it.
The benchmark lives in the AD4M Wind Tunnel as scenario s16-sparql-vs-model
(ad4m-wind-tunnel/src/scenarios/s16-sparql-vs-model.ts), not as an ad hoc script. The scenario seeds a Flux-shaped graph (channel → messages with body/author/timestamp; embeddings; semantic-relationship reifiers linking each message to an embedding; topics tagging some messages), registers SHACL subject classes inline (Message / Embedding / Topic / SemanticRelationship), and for each candidate query times raw querySparql against the equivalent perspective.modelQuery call back-to-back on the same perspective.
Reproduce:

    cd ad4m-wind-tunnel
    ./run.sh --branch dev --scenario s16 \
      --executor-path /path/to/ad4m/target/release/ad4m-executor
    # Results land in results/dev/s16-sparql-vs-model.json.
    # S16_RUNS=N overrides per-case runs (default 10).

Correction. A first v1 of S16 reported "include doesn't fire" and 14–150× ratios; those numbers are now superseded. The SHACL JSON the scenario sent to the executor put `@HasOne` relations in a separate top-level `relations: []` array, which Rust's `SHACLShape` deserializer silently dropped, so the relation was never registered and `resolve_includes_recursive` had nothing to do. Once relations are emitted inside `properties:` with `relation_kind: "hasOne"` (the canonical form from `@coasys/ad4m`'s `SHACLShape.toJSON()`), include fires and the ratios collapse by roughly 3–10×. Both the old and new numbers are kept below for the review trail; treat the post-fix numbers as the ground truth.
Results below are 10 runs/case (+ 1 warm-up each), Apple Silicon (48 GB / 14 CPU), against dev (1f29d0b17 fix(ci): clear stale bootstrap-language build cache before rebuild).
| Case | raw SPARQL avg | modelQuery avg | ratio | (was, pre-fix) |
|---|---|---|---|---|
| sr_by_expression_limit1 (1-row, WHERE expression=… + LIMIT 1) | 0.23 ms | 0.56 ms | 2.4× | 14.2× |
| sr_by_expression_with_include (same + include: { embeddingTag }) | 0.22 ms | 0.69 ms | 3.1× | 15.0× |
| sr_all (scan all SRs, no where) | 0.62 ms | 5.25 ms | 8.5× | 21.5× |
| embeddings_all (scan all embeddings) | 0.41 ms | 3.57 ms | 8.8× | 15.5× |
| topics_all (scan all topics, the smallest set) | 0.17 ms | 0.95 ms | 5.7× | 31.1× |

| Case | raw SPARQL avg | modelQuery avg | ratio | (was, pre-fix) |
|---|---|---|---|---|
| sr_by_expression_limit1 | 0.51 ms | 2.58 ms | 5.0× | 56.0× |
| sr_by_expression_with_include | 0.55 ms | 2.99 ms | 5.5× | 55.5× |
| sr_all | 7.29 ms | 68.25 ms | 9.4× | 20.5× |
| embeddings_all | 4.09 ms | 43.29 ms | 10.6× | 18.4× |
| topics_all | 0.28 ms | 7.08 ms | 25.0× | 150.7× |

Re-ran S16 against an executor patched with MODEL_QUERY_PROFILE=1 instrumentation that emits per-phase wall-clock timings + the literal SPARQL strings (patch lives at query.rs in a throwaway temp clone — not part of any PR; intended to graduate into a tracing span scaffolding follow-up). The patch times each step in execute_model_query_inner: SPARQL build, two-phase pagination subquery exec, properties subquery exec, count exec, hydration, language transforms, getters, recursive includes.
sr_by_expression_with_include (model 2.99 ms end-to-end; raw 0.55 ms):
| phase | ms | what |
|---|---|---|
| build_instance_sparql | 0.002 | string concatenation |
| twophase-pagination-exec | 0.164 | SELECT ?source ?_first_ts … ORDER BY ?_first_ts LIMIT 1 |
| twophase-properties-exec | 0.101 | VALUES ?source { … } VALUES ?predicate { 4 props } … + reifier metadata (3 rows) |
| count-exec | 0.118 | SELECT COUNT(DISTINCT ?source), fired unconditionally when pagination is on |
| Rust orchestration (group + hydrate + lang transforms) | ~0.01 | trivial |
| include sub-query (Embedding @d1, single-instance + recursive overhead) | 0.741 | 2-row SPARQL + resolve_includes_recursive |
| sum of model_query work | ~1.13 | |
| RPC roundtrip + JSON marshalling | ~1.86 | WS frame, capability check, perspective lookup, outer JSON serialize |

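The arithmetic behind the last two rows can be reproduced directly from the per-phase figures. The sketch below copies the timings from the table above; the shortened key names and the `breakdown` helper are illustrative only.

```typescript
// Phase timings in ms, copied from the table above (labels shortened).
const phaseMs: Record<string, number> = {
  build_instance_sparql: 0.002,
  twophase_pagination_exec: 0.164,
  twophase_properties_exec: 0.101,
  count_exec: 0.118,
  rust_orchestration: 0.01,
  include_sub_query: 0.741,
};

interface Breakdown {
  accountedMs: number; // time attributed to model_query phases
  residualMs: number; // end-to-end time minus accounted phases
  residualShare: number; // residual as a fraction of end-to-end
}

function breakdown(phases: Record<string, number>, endToEndMs: number): Breakdown {
  const accountedMs = Object.keys(phases).reduce((sum, key) => sum + phases[key], 0);
  const residualMs = endToEndMs - accountedMs;
  return { accountedMs, residualMs, residualShare: residualMs / endToEndMs };
}

// 2.99 ms is the end-to-end modelQuery average for this case.
const result = breakdown(phaseMs, 2.99);
console.log(result.accountedMs.toFixed(3), result.residualMs.toFixed(3), result.residualShare.toFixed(2));
```

The residual is a little over 60 % of the end-to-end time, so for this single-row case the RPC and marshalling path costs more than all the SPARQL phases combined.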
sr_all (model 68.25 ms; raw 7.29 ms):
| phase | ms | what |
|---|---|---|
| single-instance-exec | ~65 | One big SPARQL fetching 3090 rows (1030 SRs × 3 properties). Per-row work is ~3.4× heavier than raw because the reifier-metadata pattern adds 3 triple-pattern matches per result row (rdf:reifies + author + timestamp). |
| Rust orchestration (group + hydrate) | ~1.5 | |
| RPC + marshal | ~1.7 | |

The SPARQL emitted by build_instance_sparql for the Single-plan branch always selects ?source ?predicate ?target ?author ?timestamp and joins each property row against its RDF 1.2 reifier metadata — unconditionally, so that hydration can compute author / createdAt / updatedAt and apply last-write-wins. The join adds 3–5× work per row over the raw SPARQL equivalent that selects only ?source ?value.
The current upstream stack (#837, #842, #846) pushes WHERE conditions into SPARQL — that has expanded the envelope substantially for filter pushdown. But the remaining ratios are not about WHERE evaluation; they're about three things the model-query orchestrator does unconditionally that the caller may not need. All three are reachable from the same family of PRs:
- Make the reifier-metadata join opt-in. S16 shows it accounts for ~3.4× the per-row SPARQL cost on scan-all queries. Gate behind a `with_metadata: bool` (or "include keys" intersection with `{author, createdAt, updatedAt, timestamp}`) on `ModelQueryInput`. When omitted, emit `SELECT ?source ?predicate ?target WHERE { conformance + where + ?source ?predicate ?target . }`, which drops the `?_reifier reifies + author + timestamp` triples entirely. Expected impact: ~10× → ~2× on scan-all queries.
- Skip the COUNT query unless the caller reads `total_count`. Currently fired any time `sparql_pagination.is_some()`. ~0.12 ms per call on medium today (4–6 % of `sr_by_expression_limit1`'s budget) but unbounded as the perspective grows. Same shape as the aggregate work in #846's scaffolded `build_aggregate_sparql`: wire `count` to fire only when requested.
- Collapse two-phase pagination into a single plan when the `WHERE` filter is already selective enough. For `sr_by_expression_limit1`, the where clause restricts to exactly 1 row before ORDER BY, so there's nothing for the timestamp probe to sort over and phase 1 is wasted. Heuristic: if `WHERE` includes an equality on a flag/unique property, skip the timestamp probe and emit a Single plan with the property-filter `VALUES` clause. Saves ~0.16 ms per call (~6 % of the single-row case).
None of these are SPARQL-language extensions — they're orchestrator changes that reduce SPARQL work. Natural follow-up PR to #846.
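The three orchestrator changes amount to a small planning decision made before any SPARQL runs. The sketch below models that decision in TypeScript for readability; the real orchestrator is Rust, and every name here (`PlanInput`, `planQueries`, `selectClause`, the field names) is hypothetical rather than part of `ModelQueryInput`.

```typescript
// Hypothetical planning inputs; not the executor's actual types.
interface PlanInput {
  hasPagination: boolean; // limit/offset set
  wantsTotalCount: boolean; // caller actually reads total_count
  whereIsUniqueEquality: boolean; // WHERE pins a flag/unique property
}

type PlannedQuery = "single" | "pagination-probe" | "properties" | "count";

// Decide which SPARQL queries to fire under the three proposed changes.
function planQueries(input: PlanInput): PlannedQuery[] {
  const plan: PlannedQuery[] = [];
  if (input.hasPagination && !input.whereIsUniqueEquality) {
    // Two-phase: timestamp probe first, then property fetch for the page.
    plan.push("pagination-probe", "properties");
  } else {
    // Nothing to sort over (or no pagination): one Single-plan query.
    plan.push("single");
  }
  // COUNT only when the caller asked for it, not whenever a limit is set.
  if (input.wantsTotalCount) plan.push("count");
  return plan;
}

// Projection with and without the reifier-metadata join.
function selectClause(withMetadata: boolean): string {
  return withMetadata
    ? "SELECT ?source ?predicate ?target ?author ?timestamp"
    : "SELECT ?source ?predicate ?target";
}
```

Under this model the `sr_by_expression_limit1` case drops from three queries (probe, properties, count) to one, which is the saving the bullets above estimate.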
Given the post-fix data:
| Original rank | Reality | What it actually means |
|---|---|---|
| #1 tag as @HasOne polymorphic | Works with canonical SHACL emission | Update flux's SemanticRelationship decorators to emit the canonical form; the executor's polymorphic-on-same-predicate path is fine |
| #2 @BelongsTo | Decorators exist; runtime behaviour unverified at scale | Bench before relying on it for any conversion (S16 follow-up case) |
| #3 Polymorphic findAll | Independent need | Reaffirmed |
| #4 Per-link reifier meta: | Confirmed AD4M-side need | Reaffirmed |
| #5 Nested where on relations | Decorator option exists; runtime not benched | Same caveat as #2 |
| #6 UNION across queries | Not blocking | Same |
Per-site verdict:
| Category | Original verdict | Bench-grounded verdict (post-fix) |
|---|---|---|
| A. Trivially convertible (5 sites) | "Slight perf regression, trade-off for type safety" | Modest perf regression (5–10×). Acceptable in isolation; problematic at scale. Worth converting if the three model_query orchestrator fixes above land first. |
| B. Convertible with new features (10 sites) | "Likely a perf win because it collapses N+1" | Plausibly a wash or win once the reifier-metadata join is opt-in. Need bench cases that drive multi-row hydration + @BelongsTo traversal (next S16 iteration). |
| C. Reifier-metadata reads (4 sites) | "Keep as SPARQL" | Reaffirmed |
| D. Set-difference (2 sites) | "Keep as SPARQL" | Reaffirmed |
| E. Inter-class joins (4 sites) | "Mixed" | Lean toward SPARQL until the orchestrator fixes are wired |
Bottom line for this PR's stated goal ("convert flux raw SPARQL to Ad4mModel where possible"): the bench data argues against most conversions until the AD4M-side model_query layer's per-instance overhead is brought down. The right work isn't migrating call sites in flux; it's investigating why findAll is still 2.4–25× slower than raw SPARQL post-fix (14–150× pre-fix), even for a single-row lookup, and fixing it in coasys/ad4m. S16 will land as a regression gate against that work: any future model_query change can re-run it and watch the ratios collapse toward 1×.
The remaining 5–25× gap (and the orchestrator overhead generally) comes from a fan-out pattern: one perspective.modelQuery RPC dispatches 1 + N + M + K SPARQL queries through the same SparqlEvaluator, with most of the tree-shaping work happening between queries in Rust rather than inside SPARQL. This section enumerates every fan-out site, explains why each one exists, and proposes how it could collapse into either (a) a single SPARQL query or (b) a streaming subgraph extraction.
(All counts at dev@1f29d0b17. "Why separate" = the reason it isn't already fused into the main instance query.)
| # | Phase | Site | Fires when | Cost in S16 | Why separate today |
|---|---|---|---|---|---|
| 1 | Shape resolution | shape.rs:61, 116, 327, 340 | First-ever query for a class in this perspective | cold-miss only | Shape is cached Arc<ModelShape> per (perspective, class); queries run before the main query because the SPARQL builder needs the shape. |
| 2 | Main instance: Single plan | query.rs:187 | No limit/offset | 65 ms (sr_all med) | The main query. Where the bulk of work happens. |
| 3 | Main instance: TwoPhase phase 1 (pagination) | query.rs:203 | limit/offset set | 0.16 ms (sr_by_expr_limit1 med) | Need an ORDER BY ?_first_ts so the limit cuts the right rows. The timestamp probe joins reifier metadata; can't be combined with phase 2 because phase 2's VALUES ?source is driven by phase 1's ?source bindings. |
| 4 | Main instance: TwoPhase phase 2 (properties) | query.rs:236 | After phase 1 returns ≥1 source | 0.10 ms | Same reason: VALUES ?source { … } is the dynamic bridge between the two phases. |
| 5 | Total count | query.rs:109 (fast path) / query.rs:290 | limit==0 OR sparql_pagination.is_some() | 0.12 ms | COUNT(DISTINCT ?source) needs aggregation; the planner can't fold it into a SELECT that also returns rows without grouping artefacts. Fires unconditionally whenever a limit is set, even if the caller never reads total_count. |
| 6 | Reverse relations (@BelongsTo) | relations.rs:69 | shape has reverse-direction properties | varies per relation | Each reverse predicate runs its own batched VALUES ?target { … } ?source <pred> ?target. Could be fused via UNION but the planner pays for the extra branches. |
| 7 | Include sub-query (forward) | recursive execute_model_query_inner via relations.rs:200 | include: { rel: … } | 0.74 ms (@d1 med) | Forward includes call the whole pipeline recursively on the target class with where: { id: [collected target IRIs] }. Each level of nesting fires its own 1–4 queries. |
| 8 | Reverse include lookup | relations.rs:297 | include: { reverseRel: … } | n/a in S16 (no @BelongsTo) | One ?source <pred> ?target lookup to find the source IRIs, then a recursive execute_model_query_inner on those sources. Doubles the round-trips of forward includes. |
| 9 | ASK getters | getters.rs:226 | shape has properties with ASK { … } getters | per-property | Each getter expression is translated to a batched SELECT with VALUES ?source { … }. Could lift into the main query as BIND(EXISTS { … } AS ?<name>) but the executor never tries. |
| 10 | SELECT getters | getters.rs:255 | shape has properties with SELECT { … } getters | per-property | Each one fires its own batched SELECT. Lifting into the main query would need careful subquery composition. |
| 11 | Relation where_filter | getters.rs:403 | shape relation has where_filter | per filter predicate | For each predicate in the filter, one batched SELECT ?source ?val WHERE { VALUES ?source { … } ?source <pred> ?val }, then Rust matches per-target. N filter predicates → N round-trips. |
| 12 | Projection (count) |
projection.rs:115 |
projections: { $foo: { count: true, … } } |
per projection | One SELECT ?parent (COUNT(DISTINCT ?t) AS ?n) GROUP BY ?parent. |
| 13 | Projection (list) |
projection.rs:159 |
projections: { $foo: { count: false, … } } |
per projection | One SELECT ?parent ?t WHERE { … } ORDER BY … LIMIT … per projection. If target_class_name is set, also recurses into execute_model_query_inner. |
Plus one non-SPARQL fan-out:
| # | Phase | Site | Fires when |
|---|---|---|---|
| 14 | resolveLanguage transforms | query.rs:412 (resolve_language_transforms) | shape has properties with resolve_language set |
Total round-trip count for a non-trivial findAll:
- Cold first call: 1 shape query + 1–3 main + 1 count + R reverse + I include sub-queries + G getters + F filter predicates + P projections
- Warm: same minus the shape query
- For a query that hydrates 1 SR via include: { embeddingTag: true } on dev today: shape (warm cache) + 2 main (TwoPhase) + 1 count + 1 nested include (Embedding) = 4 SPARQL round-trips.
- For a query like Conversation.findAll({ include: { subgroups: { include: { items: true, $topicCount: { count: true } } } } }): ~10–15 round-trips per outer call.
This is the real reason model_query ratios don't collapse all the way to 1×. The SPARQL inside each query is fast; the fan-out is what costs.
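The round-trip arithmetic above can be written down as a small cost model. This is only an illustrative sketch: the `FanOut` interface and its field names are hypothetical and do not exist in `ModelQueryInput` or the executor; they simply mirror the terms of the formula (shape, main, count, R, I, G, F, P).

```typescript
// Hypothetical cost model for one findAll call. Field names are illustrative
// and mirror the terms of the round-trip formula in the text.
interface FanOut {
  coldShape: boolean;   // first-ever query for this class in the perspective
  mainQueries: number;  // 1 for a Single plan, 2 for TwoPhase (the text allows 1-3)
  count: boolean;       // the separate COUNT(DISTINCT ?source) query fires
  reverse: number;      // R: reverse-relation predicates
  includes: number;     // I: include sub-queries, summed over nesting levels
  getters: number;      // G: ASK/SELECT getters
  filters: number;      // F: relation where_filter predicates
  projections: number;  // P: projections
}

function estimateRoundTrips(f: FanOut): number {
  return (f.coldShape ? 1 : 0) + f.mainQueries + (f.count ? 1 : 0)
    + f.reverse + f.includes + f.getters + f.filters + f.projections;
}

// The "1 SR via include: { embeddingTag: true }" example: warm shape cache,
// TwoPhase main query, one count, one nested include.
const srWithEmbedding: FanOut = {
  coldShape: false, mainQueries: 2, count: true,
  reverse: 0, includes: 1, getters: 0, filters: 0, projections: 0,
};
console.log(estimateRoundTrips(srWithEmbedding)); // 4
```

Nested includes each re-run the pipeline, so a realistic value for `includes` is the sum of every level's own main and count queries rather than the number of include keys.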
Going site-by-site:
Unconditional join in the main instance query for author + timestamp + rdf:reifies triple. Cost: ~3.4× per-row SPARQL overhead on scan-all queries. Fix: gate on with_metadata: bool in ModelQueryInput. Easy, ~50 LOC PR.
Even when the caller doesn't use total_count, query.rs:290 runs a separate SELECT (COUNT(DISTINCT ?source) AS ?cnt) …. Currently gated only on sparql_pagination.is_some(). Fix: thread a count: bool flag through ModelQueryInput and skip the query unless it's truthy or the caller explicitly asks for total_count. Easy, ~30 LOC PR.
sr_by_expression_limit1 has where: { expression: id } which restricts to exactly one row. The TwoPhase plan still emits ORDER BY ?_first_ts LIMIT 1 over a reifier-metadata-joined subquery — wasted work because there's nothing to sort. Fix: heuristic — when WHERE includes equality on a unique property (id, base, flag-target), skip the timestamp probe and emit Single with the equality VALUES. Medium, ~80 LOC PR with a new test.
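The selective-WHERE heuristic might look roughly like the following. It is a sketch under assumptions: the `Where` type is a simplified stand-in for the real where-clause, and `choosePlan` and its `uniqueProps` parameter are hypothetical names, not functions in query.rs.

```typescript
type Plan = "Single" | "TwoPhase";

// Hypothetical, simplified where-clause: a property maps either to an
// equality value or to an operator object such as { gt: 5 }.
type Where = Record<string, string | number | boolean | { [op: string]: unknown }>;

function choosePlan(where: Where, uniqueProps: string[], hasLimitOrOffset: boolean): Plan {
  // Without limit/offset there is nothing to paginate, so one query suffices.
  if (!hasLimitOrOffset) return "Single";
  // Plain equality on a unique property pins the result to at most one row,
  // so the ORDER BY ?_first_ts probe has nothing to sort.
  const selective = uniqueProps.some((p) => {
    const v = where[p];
    return v !== undefined && typeof v !== "object";
  });
  return selective ? "Single" : "TwoPhase";
}

console.log(choosePlan({ id: "literal://x" }, ["id"], true));    // Single
console.log(choosePlan({ title: "hello" }, ["id"], true));       // TwoPhase
```

Operator objects are deliberately excluded: a range or negation on a unique property can still match many rows, so only plain equality is treated as selective here.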
A model with multiple @BelongsTo relations fires one batched lookup per reverse predicate. These can fuse into a single SPARQL with one ?source ?p ?target row per matched edge:

```sparql
SELECT ?target ?predicate ?source WHERE {
  VALUES ?target { … instance IRIs … }
  VALUES ?predicate { <pred1> <pred2> … }
  ?source ?predicate ?target .
}
```

Then Rust splits by ?predicate post-hoc. Saves R-1 round-trips for shapes with R reverse predicates. Easy, ~60 LOC PR.
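To make the fusion concrete, here is a minimal sketch of both halves: building the fused query string and splitting the returned rows by predicate afterwards. It is written in TypeScript for illustration (the real change would live in the Rust orchestrator), and `buildFusedReverseQuery`, `splitByPredicate`, and the `ex://` IRIs are hypothetical.

```typescript
interface EdgeRow { target: string; predicate: string; source: string; }

// Builds one query covering every reverse predicate.
// Note: IRIs are interpolated without escaping, which a real builder must not do.
function buildFusedReverseQuery(targets: string[], predicates: string[]): string {
  const iri = (s: string): string => `<${s}>`;
  return [
    "SELECT ?target ?predicate ?source WHERE {",
    `  VALUES ?target { ${targets.map(iri).join(" ")} }`,
    `  VALUES ?predicate { ${predicates.map(iri).join(" ")} }`,
    "  ?source ?predicate ?target .",
    "}",
  ].join("\n");
}

// Post-hoc split: one bucket of edges per reverse predicate.
function splitByPredicate(rows: EdgeRow[]): Record<string, EdgeRow[]> {
  const out: Record<string, EdgeRow[]> = {};
  for (const row of rows) {
    (out[row.predicate] = out[row.predicate] || []).push(row);
  }
  return out;
}

const rows: EdgeRow[] = [
  { target: "ex://t1", predicate: "ex://a", source: "ex://s1" },
  { target: "ex://t1", predicate: "ex://b", source: "ex://s2" },
  { target: "ex://t2", predicate: "ex://a", source: "ex://s3" },
];
console.log(Object.keys(splitByPredicate(rows))); // [ 'ex://a', 'ex://b' ]
```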
This is the structurally interesting one. Today include: { embeddingTag: true } causes a full pipeline recursion on the target class — meaning the include's own SPARQL queries (main + count + maybe its own includes) fire as a separate fan-out. The recursion is what makes deep includes (include: { a: { include: { b: { include: { c: true } } } } }) blow up.
Two paths to fix:
a) Lift the include into the main query. Replace ?source ?predicate ?target (returning IRIs) with a wider main query that also drags in target properties:

```sparql
SELECT ?source ?predicate ?target ?author ?timestamp
       ?target_predicate ?target_value WHERE {
  # … conformance + where + property fetch as today …
  OPTIONAL {
    ?target ?target_predicate ?target_value .
    VALUES ?target_predicate { … target's predicates … }
  }
}
```

Then group + hydrate the target in the same pass. Works for shallow (depth-1) includes. Saves 1 SPARQL per included relation per level.
b) Use SPARQL CONSTRUCT to return the entire subgraph in one query, then re-shape the resulting triples into a JSON tree in Rust:

```sparql
CONSTRUCT {
  ?source ?p ?o .
  ?source <ad4m:include/tag> ?tag .
  ?tag ?tp ?to .
} WHERE {
  # main conformance + where + property fetch + include traversal
}
```

The CONSTRUCT returns a Graph (subset of triples); a generic subgraph → tree algorithm walks the shape and lifts it to JSON. Works for arbitrary depth. Single SPARQL round-trip regardless of include depth. This is the elegant pipeline endpoint; see "What the perfectly elegant pipeline looks like" below.
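A generic subgraph → tree walker can be quite small. The sketch below assumes a heavily simplified stand-in for ModelShape (`Shape`, with made-up `ex://` predicates) and is written in TypeScript for illustration only; the real walker would be Rust and would handle cardinality, literals, and metadata.

```typescript
interface Triple { s: string; p: string; o: string; }

// Hypothetical, simplified stand-in for ModelShape: which predicates are plain
// properties, and which are relations that lead to a nested shape.
interface Shape {
  properties: Record<string, string>;                         // predicate -> JSON field
  relations: Record<string, { field: string; shape: Shape }>; // predicate -> nested shape
}

type Tree = { id: string; [field: string]: unknown };

function walk(root: string, shape: Shape, triples: Triple[], seen: string[] = []): Tree {
  const node: Tree = { id: root };
  if (seen.indexOf(root) !== -1) return node; // guard against cycles in the graph
  const path = seen.concat(root);
  for (const t of triples) {
    if (t.s !== root) continue;
    const prop = shape.properties[t.p];
    const rel = shape.relations[t.p];
    if (prop !== undefined) {
      node[prop] = t.o;
    } else if (rel !== undefined) {
      // Simplification: every relation is emitted as an array, even a @HasOne.
      const children = (node[rel.field] as Tree[] | undefined) || [];
      children.push(walk(t.o, rel.shape, triples, path));
      node[rel.field] = children;
    }
  }
  return node;
}

const embeddingShape: Shape = { properties: { "ex://model": "model" }, relations: {} };
const srShape: Shape = {
  properties: { "ex://expression": "expression" },
  relations: { "ex://tag": { field: "embeddingTag", shape: embeddingShape } },
};
const triples: Triple[] = [
  { s: "sr1", p: "ex://expression", o: "msg1" },
  { s: "sr1", p: "ex://tag", o: "emb1" },
  { s: "emb1", p: "ex://model", o: "m" },
];
console.log(JSON.stringify(walk("sr1", srShape, triples)));
// {"id":"sr1","expression":"msg1","embeddingTag":[{"id":"emb1","model":"m"}]}
```

This version rescans the full triple list per node; indexing triples by subject first would keep the walk linear in the size of the returned graph.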
Today each getter — ASK { … } or SELECT { … } — fires its own batched-VALUES query. The transformation that's actually wanted:
- ASK { ?source <flag-pred> <flag-value> } getter → BIND(EXISTS { ?source <flag-pred> <flag-value> } AS ?<getterName>) inside the main SELECT
- SELECT ?value WHERE { ?source <pred> ?value } getter → OPTIONAL { ?source <pred> ?<getterName> } (or a subquery if the getter is multi-row)
Folding M getters into the main SELECT saves M round-trips. Medium-effort PR (need a getter→SPARQL-fragment compiler). Open question: does Oxigraph's planner cope well with many BIND/EXISTS clauses? Worth benching before committing.
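A getter → SPARQL-fragment compiler for the two simple cases could be sketched as follows. The `Getter` type is hypothetical and assumes the getter has already been parsed into a pattern or a predicate; parsing arbitrary getter text is the hard part and is not shown.

```typescript
// Hypothetical pre-parsed getter descriptors; not the executor's real types.
type Getter =
  | { kind: "ask"; name: string; pattern: string }       // body of the ASK { … }
  | { kind: "select"; name: string; predicate: string }; // single-valued SELECT

function compileGetter(g: Getter): string {
  if (g.kind === "ask") {
    // Boolean getter: evaluated per ?source row inside the main SELECT.
    return `BIND(EXISTS { ${g.pattern} } AS ?${g.name})`;
  }
  // Value getter: OPTIONAL so instances without the triple still appear.
  return `OPTIONAL { ?source <${g.predicate}> ?${g.name} }`;
}

console.log(compileGetter({ kind: "ask", name: "isPinned", pattern: "?source <ex://pinned> true" }));
// BIND(EXISTS { ?source <ex://pinned> true } AS ?isPinned)
console.log(compileGetter({ kind: "select", name: "title", predicate: "ex://title" }));
// OPTIONAL { ?source <ex://title> ?title }
```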
getters.rs:apply_where_filter_to_relation is a textbook N+1 case: for each predicate in where_filter, fetch target's value, then filter targets in Rust. The SPARQL equivalent already exists — just push the filter clauses into the original include's WHERE block:

```sparql
?source <relPred> ?target .
?target <filterPred1> ?v1 . FILTER(?v1 = "X") .
?target <filterPred2> ?v2 . FILTER(?v2 > 5) .
```

Easy, ~100 LOC PR. Removes the entire apply_where_filter_to_relation helper.
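As an illustration of the pushdown, the filter clauses can be generated mechanically from a list of (predicate, operator, value) entries. `FilterClause` and `pushDownWhereFilter` are hypothetical names; the literal encoding is a simplification that reuses JSON string escaping and ignores datatypes and language tags.

```typescript
interface FilterClause {
  predicate: string;
  op: "=" | "!=" | ">" | ">=" | "<" | "<=";
  value: string | number;
}

// Simplified literal encoding: numbers bare, strings JSON-quoted.
function literal(v: string | number): string {
  return typeof v === "number" ? String(v) : JSON.stringify(v);
}

function pushDownWhereFilter(relPredicate: string, filters: FilterClause[]): string {
  const lines = [`?source <${relPredicate}> ?target .`];
  filters.forEach((f, i) => {
    const v = `?v${i + 1}`; // one fresh variable per filter predicate
    lines.push(`?target <${f.predicate}> ${v} . FILTER(${v} ${f.op} ${literal(f.value)}) .`);
  });
  return lines.join("\n");
}

console.log(pushDownWhereFilter("relPred", [
  { predicate: "filterPred1", op: "=", value: "X" },
  { predicate: "filterPred2", op: ">", value: 5 },
]));
```

With these inputs the output reproduces the three pattern lines shown in the text, so the store does the filtering in one pass instead of one round-trip per predicate.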
Each projection key fires its own grouped SPARQL. SPARQL 1.1 supports subqueries with their own ORDER BY + LIMIT, so a projection can fold in as:

```sparql
SELECT ?source ?topicCount WHERE {
  # main conformance + where …
  {
    SELECT ?source (COUNT(DISTINCT ?t) AS ?topicCount) WHERE {
      ?source <topicPred> ?t .
    } GROUP BY ?source
  }
}
```

Saves P round-trips for queries with P projections. Medium PR.
This calls LanguageController.get_expression(lang, expr_addr) which dispatches a Holochain RPC to fetch expression data from outside the perspective. The data doesn't live in the RDF store; it lives in the language's Holochain cell. No SPARQL extension can reach it.
But the orchestration is fixable:
- Today the implementation is sequential per-instance per-property (query.rs:432–438 walks instances in a for loop, awaits each controller.get_expression(...) call).
- Could be batched: collect all (lang, expr_addr) pairs across all instances, fire them in parallel via futures::join_all or tokio::spawn-fan-out, then map results back.
- For repeated lookups in the same query, deduplicate by expression URL first.
This is the only fan-out that legitimately belongs in Rust orchestration, because the data cannot be reached from SPARQL. Even there, parallelism would save 5–50× on workloads with many resolveLanguage properties.
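The dedupe-then-parallelise shape is sketched below in TypeScript, with `Promise.all` standing in for futures::join_all. Everything here is hypothetical: `ExprRef`, `FetchExpr`, and `resolveAll` are illustrative names, and the fetcher is a stand-in for the real LanguageController call.

```typescript
interface ExprRef { lang: string; addr: string; }
type FetchExpr = (ref: ExprRef) => Promise<string>;

// Expression URL used as the dedupe key.
const keyOf = (r: ExprRef): string => `${r.lang}://${r.addr}`;

// Keep the first occurrence of each (lang, addr) pair.
function uniqueRefs(refs: ExprRef[]): ExprRef[] {
  const seen: Record<string, boolean> = {};
  return refs.filter((r) => {
    const key = keyOf(r);
    if (seen[key]) return false;
    seen[key] = true;
    return true;
  });
}

// Fire one fetch per unique pair, all concurrently, then index by URL so
// callers can map results back onto every instance/property that needs them.
async function resolveAll(refs: ExprRef[], fetchExpr: FetchExpr): Promise<Record<string, string>> {
  const entries = await Promise.all(
    uniqueRefs(refs).map(async (r) => ({ key: keyOf(r), value: await fetchExpr(r) })),
  );
  const out: Record<string, string> = {};
  for (const e of entries) out[e.key] = e.value;
  return out;
}
```

Total latency then tracks the slowest single fetch rather than the sum of all fetches, which is where the expected speed-up would come from.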
Used when all_where_pushable returns false. The remaining cases — after #842 / #846 — are: Ops conditions on getter-derived properties, and conditions on collection counts. The first can be pushed once getters are inlined (above). The second is a HAVING clause on a GROUP BY ?source.
The pagination plan only pushes the first sort key to SPARQL. Multi-key sort happens in Rust. SPARQL supports multi-key ordering natively, written as ORDER BY ASC(?key1) DESC(?key2); the limit is the build_query_patterns builder, not the language. Easy PR.
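Emitting a multi-key ORDER BY is a one-line change once the sort keys are available as a list. The sketch below is illustrative only; `SortKey` and `buildOrderBy` are hypothetical names, and the variable names are assumed to be already-bound SPARQL variables. Note that SPARQL separates order conditions with spaces, not commas.

```typescript
interface SortKey { variable: string; direction: "ASC" | "DESC"; }

function buildOrderBy(keys: SortKey[]): string {
  if (keys.length === 0) return "";
  // SPARQL 1.1: ORDER BY takes one or more space-separated conditions.
  return "ORDER BY " + keys.map((k) => `${k.direction}(?${k.variable})`).join(" ");
}

console.log(buildOrderBy([
  { variable: "_first_ts", direction: "DESC" },
  { variable: "title", direction: "ASC" },
]));
// ORDER BY DESC(?_first_ts) ASC(?title)
```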
Quick audit of the six prioritised additions vs current state and what new evidence S16 surfaces:
| Rank | Recommendation | Status now | What S16 / profile data adds |
|---|---|---|---|
| #1 | tag as typed @HasOne polymorphic | Works at the executor level (s16 confirmed include fires for two @HasOne on the same predicate, conformance-discriminated). Open in flux: emit canonical SHACL in SemanticRelationship. | False alarm in v1: the runtime path was always there; only flux's decorator emission was wrong (or wrong in the s16 mirror). Doc still flags it as flux-side work. |
| #2 | @BelongsTo() cleaner reverse-relation decorator | Decorators exist in @coasys/ad4m. Runtime behaviour benched only indirectly via include. | Not yet covered by S16. Next S16 case (belongsto_traversal) to add. |
| #3 | Multi-class polymorphic findAll | Not implemented. | Reaffirmed by allItemEmbeddings() (sites 22+23). |
| #4 | Per-link reifier metadata sidecar | Not implemented; today's metadata join is unconditional on instance rows but absent on relation target rows. | Profile data adds urgency: the unconditional metadata join is what makes scan-all queries 3.4× slower per row. Making it opt-in is the same fix from two angles. |
| #5 | Nested where on relations | Decorators exist (where_filter + where_predicates plumbed through SHACL parser → shape loader → apply_where_filter_to_relation). Runtime is N+1 SPARQL today (one query per filter predicate). | Not benched. Next S16 case (relation_where_filter) to add. Pushdown into main SPARQL is the elegant fix. |
| #6 | UNION across query shapes | Not blocking. | No change. |
What was NOT in the original list and is now clearly open:
- Opt-in reifier-metadata join (orchestrator change, ~50 LOC). New from profile data.
- Opt-in total_count (orchestrator change, ~30 LOC). New from profile data.
- Single-plan when WHERE is selective (orchestrator change, ~80 LOC). New from profile data.
- Reverse-relation UNION fusion (orchestrator change, ~60 LOC). Surfaced by inventory audit.
- Forward-include collapse via SPARQL CONSTRUCT or subgraph extraction (the big one, ~500 LOC). Surfaced by inventory audit.
- Getter pushdown via BIND(EXISTS {...}) (medium PR, depends on Oxigraph planner behaviour). Surfaced by inventory audit.
- Relation where_filter pushdown (~100 LOC). Surfaced by inventory audit.
- Projection inlining via SPARQL subqueries (medium PR). Surfaced by inventory audit.
- Multi-key sort pushdown (small PR). Surfaced by inventory audit.
- Parallel resolveLanguage batching (~100 LOC, not SPARQL). Surfaced by inventory audit.
- JSON streaming or Solutions → Value direct (small refactor in sparql_store.rs:query). Surfaced by inventory audit.
The endpoint is a single SPARQL CONSTRUCT round-trip per model query, regardless of include depth or projection count. The orchestrator:
- Walks the model's ModelShape and the query's ModelQueryInput.include to build a single SPARQL CONSTRUCT query that materialises the entire subgraph needed: instance triples, included relations, getters lifted into BIND/EXISTS, projections folded into subqueries, where-clauses inlined into the WHERE block.
- Fires that one query against the store.
- The store returns a graph of triples (Oxigraph supports this natively as QueryResults::Graph).
- A subgraph → tree walker in Rust consumes the triples and emits the JSON tree the TS client wants, using the model's ModelShape as the schema for the walk.
- If the shape has resolve_language properties, fire a parallel batched LanguageController fetch over all (lang, addr) pairs, after the SPARQL phase but in a single concurrent batch.
- Serialize the final tree once and ship over the WS RPC.
Round-trip count: 1 SPARQL + 1 batched RPC (if applicable), total — independent of N, M, K, include depth, or model complexity.
What this requires:
- Subgraph CONSTRUCT planner in the model_query builder. Rewrite build_instance_sparql to emit a CONSTRUCT that captures the entire requested tree. The shape + query input together determine which triples to materialise.
- Tree-shape walker in hydration. Replace group_results_by_source + hydrate_instances + resolve_includes_recursive with a single walker that takes the triple graph + shape and emits the JSON tree directly.
- Streaming where possible. Use Oxigraph's QuerySolutionIter directly rather than the current "materialise to JSON string, parse it back" round-trip in sparql_store.rs:query.
- Holochain expression-resolution batching. Add a LanguageController::get_expressions_batch(pairs: Vec<(lang, addr)>) → HashMap<addr, ExprJson> and use it in resolve_language_transforms.
- Reified ?author / ?timestamp as opt-in meta: projections (recommendation #4). Same fix as the opt-in reifier metadata above but applied recursively to relation target instances.
The result is a pipeline that:
- Hydrates one row in 1 round-trip (current: 3–4 round-trips).
- Hydrates a 3-deep include tree in 1 round-trip (current: ~10 round-trips).
- Doesn't pay reifier overhead unless the client asks for metadata.
- Doesn't pay COUNT overhead unless the client asks for total_count.
- Scales linearly with result-set size, not query-plan complexity.
Expected post-state in S16:
| Case | dev today (medium) | with all fixes | reason |
|---|---|---|---|
| sr_by_expression_limit1 | 5.0× | ~1.5× | Drop count, single-plan, RPC roundtrip floor |
| sr_by_expression_with_include | 5.5× | ~1.5× | Same + include via CONSTRUCT subgraph |
| sr_all (no metadata requested) | 9.4× | ~2× | Drop reifier-metadata join |
| embeddings_all (no metadata) | 10.6× | ~2× | Same |
| topics_all | 25× | ~3× | Same; RPC floor dominates because raw is sub-ms |
Ordered by impact-per-LOC; each builds on the previous:
| PR | Scope | Effort | Expected ratio change |
|---|---|---|---|
| A. Opt-in reifier metadata | Add with_metadata: bool to ModelQueryInput, gate the ?_reifier reifies + author + timestamp clauses in build_instance_sparql. | ~50 LOC + tests | 9–25× → ~2–3× on scan-all |
| B. Opt-in total_count | Add count: bool, gate the COUNT query. | ~30 LOC + tests | -0.1 ms per call (small but free) |
| C. Single-plan when WHERE selective | Heuristic in query.rs to skip TwoPhase when WHERE includes equality on a unique property. | ~80 LOC + tests | 5× → 3.5× on sr_by_expression_limit1 |
| D. Reverse-relation UNION fusion | Rewrite resolve_reverse_relations to emit one UNION SPARQL. | ~60 LOC + tests | -R round-trips per call |
| E. Multi-key sort pushdown | Extend build_instance_sparql to emit multi-key ORDER BY. | ~40 LOC + tests | Eliminates a Rust sort phase |
| F. Relation where_filter pushdown | Push apply_where_filter_to_relation into the include's SPARQL WHERE. | ~100 LOC + tests | -F round-trips |
| G. Getter inlining | Compile ASK getters into BIND(EXISTS{…}), SELECT getters into OPTIONAL{…} in main query. | ~200 LOC + tests | -G round-trips |
| H. Projection subquery inlining | Fold projections into main query as sub-SELECTs. | ~150 LOC + tests | -P round-trips |
| I. CONSTRUCT-based hydration | Replace the current SELECT + recursive include pipeline with a single CONSTRUCT + subgraph walker. | ~500 LOC + tests + reshape hydration.rs and relations.rs | Constant 1 round-trip regardless of include depth |
| J. Parallel resolveLanguage batching | Add batched LanguageController::get_expressions_batch, use in resolve_language_transforms. | ~100 LOC + Holochain plumbing | Eliminates N×k sequential get_expression await chain |
| K. Streaming Solutions → Value | Replace sparql_store::query's "Solutions → String → from_str → Vec" with direct Solutions → Vec<Value>. | ~50 LOC + tests | -1 JSON parse round-trip per SPARQL call |
A through F are pure quick wins (~360 LOC across six small PRs). G through K are the structural rebuild. The investigation argues that A+B+C alone would close 60–80% of the S16 gap; G+I would close the rest.
Each PR adds (or extends) one S16 case so the regression gate sees the ratio collapse cleanly:
- A → s16 embeddings_all_no_metadata
- C → s16 sr_by_expression_eq_no_orderby
- D → s16 multi_reverse_relations
- F → s16 relation_where_filter
- G → s16 class_with_ask_getter
- H → s16 class_with_projections
- I → s16 deep_include_3_levels
Realised wins — coasys/ad4m#846 landed A/B/C/D/E/F/G/J/K
The orchestrator overhaul shipped in a single PR rather than the eleven-PR sequence the audit sketched. Items A–G + J + K all land together; H (projection inlining) and I (CONSTRUCT subgraph hydration) are deferred.
S16 ratios — fresh dev (HEAD 1f29d0b1) vs refactor/sparql-pushdown-last-write-wins (HEAD 376d4b1b). 10 runs/case + warm-up, Apple Silicon, both binaries built from the same Rust toolchain into a shared CARGO_TARGET_DIR. Improvement = dev_ratio / branch_ratio.
Earlier (now-superseded) numbers in this section used a stale dev binary cached at ~/workspaces/coasys/ad4m/target/release/ad4m-executor from 2026-05-22 (test-2). The fresh-vs-fresh comparison below is what the PR ships against.
| Case | dev model avg | #846 model avg | dev ratio | #846 ratio | improvement |
|---|---|---|---|---|---|
| sr_by_expression_limit1 | 3.96 ms | 4.36 ms | 4.6× | 4.5× | 1.03× |
| sr_by_expression_with_include | 4.02 ms | 4.28 ms | 4.8× | 4.6× | 1.05× |
| sr_all | 107.21 ms | 103.11 ms | 9.0× | 8.5× | 1.06× |
| embeddings_all | 68.79 ms | 68.98 ms | 8.7× | 8.8× | 0.98× |
| topics_all | 14.63 ms | 14.96 ms | 25.1× | 29.7× | 0.85× (raw is sub-ms, RPC floor dominates) |
| embeddings_all_no_metadata (A) | 69.60 ms | 28.09 ms | 9.0× | 3.7× | 2.44× ✅ |
| sr_by_expression_limit1_no_count (B) | 3.49 ms | 2.46 ms | 4.1× | 2.8× | 1.47× ✅ |
| sr_by_id_single_plan (C + A + B) | 0.98 ms | 0.26 ms | 4.1× | 1.2× | 3.36× ✅ |
| sr_all_no_metadata_no_count (A + B) | 115.42 ms | 38.40 ms | 9.3× | 3.3× | 2.80× ✅ |
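
The improvement column follows directly from the stated definition (dev_ratio / branch_ratio). As a worked example, the helper below (a hypothetical name, for illustration only) recomputes the embeddings_all_no_metadata row from the rounded ratios shown in the table:

```typescript
// improvement = dev_ratio / branch_ratio, rounded to two decimals.
function improvement(devRatio: number, branchRatio: number): number {
  return Math.round((devRatio / branchRatio) * 100) / 100;
}

// embeddings_all_no_metadata (A): dev 9.0×, #846 3.7×
console.log(improvement(9.0, 3.7)); // 2.43
```

The table reports 2.44× for this row; the small difference is presumably because the published improvement was computed from unrounded ratios, so recomputing from the one-decimal figures can be off in the last digit.
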
| Case | dev ratio | #846 ratio | improvement |
|---|---|---|---|
| embeddings_all_no_metadata (A) | 7.0× | 3.3× | 2.11× ✅ |
| sr_by_id_single_plan (C + A + B) | 2.5× | 1.3× | 1.97× ✅ |
| sr_all_no_metadata_no_count (A + B) | 7.6× | 2.8× | 2.74× ✅ |
| sr_by_expression_limit1_no_count (B) | 2.4× | 1.8× | 1.32× |
| (other five cases) | — | — | parity (within ±15% noise — back-compat preserved) |
To confirm the orchestrator changes don't regress paths that don't opt in, ran S5 (queryLinks scaling) and S8 (raw querySparql over a 58k-link Flux community graph). Both use legacy code paths the orchestrator surface doesn't touch directly, but they share the underlying SparqlStore whose query helper now delegates to query_values (audit item K).
S5 — queryLinks at 100/500/1000 links:
| dataSize | queryAll dev | queryAll #846 | ratio | queryBySource dev | queryBySource #846 | ratio |
|---|---|---|---|---|---|---|
| 100 | 4.11 ms | 3.75 ms | 0.91× | 4.11 ms | 3.70 ms | 0.90× |
| 500 | 22.39 ms | 19.46 ms | 0.87× | 22.21 ms | 19.44 ms | 0.88× |
| 1000 | 46.43 ms | 45.48 ms | 0.98× | 46.55 ms | 45.40 ms | 0.98× |
S8 — Flux community graph (small = 1865 links):
| Query | dev avg | #846 avg | ratio |
|---|---|---|---|
| totalItemCount | 0.51 ms | 0.44 ms | 0.86× |
| allItems | 1.86 ms | 1.67 ms | 0.90× |
| unprocessedItems | 0.72 ms | 0.62 ms | 0.86× |
| recentConversations | 0.42 ms | 0.30 ms | 0.71× |
| pinnedConversations | 0.18 ms | 0.15 ms | 0.83× |
| subgroupItemsData | 0.36 ms | 0.29 ms | 0.81× |
| subgroupTopics | 0.24 ms | 0.23 ms | 0.96× |
| messageHydration | 0.21 ms | 0.19 ms | 0.90× |
| paginatedMessages | 1.98 ms | 1.76 ms | 0.89× |
S8 — medium = 58460 links: every query within ±8% of dev — parity dominates as per-call SPARQL execution cost dwarfs per-RPC overhead.
- Opt-in cases see 1.5–3.4× ratio improvement across both tiers in S16. sr_by_id_single_plan at medium drops from 4.1× to 1.2×, essentially parity with raw SPARQL on a single-row lookup.
- Back-compat S16 cases stay within run-to-run noise of dev. The cheaper paths only engage when the caller passes the new flags (withMetadata: false, count: false, or a uniquely-selective id equality WHERE).
- Pre-existing query paths are unaffected at large data sizes (S8 medium tier within ±8%) and see incidental 5–30% wins at small sizes (S5 100/500 + S8 small) where K's Solutions → Vec<Value> cuts a JSON serialise+parse round trip that was a meaningful fraction of total latency.
- The remaining 3–5× residual on the scan-all cases is what audit items H and I would close. H (projection inlining) and I (CONSTRUCT-based subgraph hydration) are deferred for a follow-up PR; the diff for I is large enough that landing it on top of clean A–G/J/K orchestrator changes is the cleaner path.
The model_query baseline now supports the opt-in flags that close most of the original 5–25× gap on the convert-candidate sites. Re-reading the per-category table with #846 in hand:
| Category | Pre-#846 verdict | Post-#846 verdict |
|---|---|---|
| A. Trivially convertible (5 sites) | "Major perf regression — 5–10× slower" | Convertible with withMetadata: false + count: false. Expected ratio 1.5–3×, in line with the per-call RPC floor. |
| B. Convertible with new features (10 sites) | "Plausibly a wash or win" | Convert + opt out of metadata for the read-only branches. Still unverified for the BelongsTo traversals; that's an S16 follow-up case. |
| C. Reifier-metadata reads (4 sites) | "Keep as SPARQL" | Reaffirmed — these sites want metadata, so the opt-in toggle doesn't help. |
| D. Set-difference (2 sites) | "Keep as SPARQL" | Reaffirmed. |
| E. Inter-class joins (4 sites) | "Lean toward SPARQL" | Convertible with the same opt-in flags once the deep-include path is exercised in S16. |
Bottom line: the structural answer to "should flux migrate to Ad4mModel?" changed once #846 landed. For most call sites that don't want link-level metadata or unpaginated counts, the answer is now yes — the orchestrator no longer charges 5–10× for the privilege.
Every querySparql call site in packages/api/src/{channel,conversation,conversation-subgroup,topic,semantic-relationship} (23 sites in production code, 12 unique methods after dedup) was audited and one of three decisions taken:
| Site | File | New shape |
|---|---|---|
| Channel.pinnedConversations() | channel/index.ts | findAll(Channel, { where: { isPinned: true }, include: { conversations: { limit: 1, withMetadata: false } }, withMetadata: false, count: false }); engages the post-#846 single-plan path for the isPinned == true @Property flag. |
| Conversation.stats() | conversation/index.ts | Subgroup count → ConversationSubgroup.findAllAndCount({ parent: { model: Conversation, id }, limit: 0, count: true, withMetadata: false }) (count-only SPARQL fast-path). Participants → perspective.get(new LinkQuery({ source: this.id, predicate: FLUX_PARTICIPANT })) (indexed link lookup, no SPARQL). |
| ConversationSubgroup.stats() | conversation-subgroup/index.ts | Both queries replaced with parallel LinkQuery via perspective.get(...). No SPARQL roundtrip needed for the simple link enumeration; the Flux invariant that subgroup→item targets are always Message/Post/Task means the multi-type FILTER from the old SPARQL is implicit. |
| SemanticRelationship.itemEmbedding(itemId) | semantic-relationship/index.ts | findAll(SR, { where: { expression: itemId }, include: { embeddingTag: { withMetadata: false } }, limit: 1, withMetadata: false, count: false }). The polymorphic-on-same-predicate @HasOne discrimination resolves to an Embedding only when conformance matches; verified working in S16 (include actually fires: yes). |
| findEmbeddingSRId(itemId) | conversation/util.ts | Same shape as above; checks embeddingTag on the result instances rather than fetching the raw SR-tag triple. |
| Site | Why kept |
|---|---|
| Channel.allItems() | Cat C: wants ?_reifier timestamp of the has_child link (when the message was added to the channel), not the message entity's createdAt. |
| Channel.unprocessedItems() data fetch | Cat C: same link-level timestamp semantics. |
| Channel.unprocessedItems() set-difference | Cat D: two parallel SPARQLs feeding a JS Set difference. The pattern doesn't translate; Oxigraph FILTER NOT EXISTS hits a 60s planner cliff today. |
| Channel.totalItemCount() | Cat A but splits into 3 round trips (one per Message/Post/Task class) for the multi-type FILTER(?type IN (…)). 3 round trips + sum is strictly worse than 1 SPARQL. |
| Channel.recentConversations() | Cat A but already hand-optimised to use the native link API for timestamps (avoids the reifier-join planner cliff). |
| Conversation.topics() | Cat E: UNION query (topic linked either to the conversation directly OR via one of its subgroups). No clean Ad4mModel shape today; needs a polymorphic parent scope. |
| Conversation.subgroupsData() first pass | Cat C: reifier-timestamp read. |
| Conversation.subgroupsData() batch timestamp | Cat E: subgroup → grandparent channel via two ad4m:has_child hops, which isn't a declared relation. |
| ConversationSubgroup.topics() | Cat E: SR → tag join with conformance filter on the tag (topic vs embedding). |
| ConversationSubgroup.topicsWithRelevance() | Cat E: same shape, with relevance property. |
| ConversationSubgroup.itemsData() | Cat C: per-link reifier metadata for both subgroup_item and entry_type reifiers. |
| Topic.linkedConversations() | Cat E: 4-hop SR → conversation → channel + property fetch. |
| Topic.linkedSubgroups() | Cat E: same 4-hop chain. |
| SemanticRelationship.allConversationEmbeddings() | Cat E + missing reverse relation: needs @BelongsTo from Conversation back to SR. |
| SemanticRelationship.allSubgroupEmbeddings() | Cat E + missing reverse relation. |
| SemanticRelationship.allItemEmbeddings() | Cat E + multi-class polymorphic findAll. |
| SemanticRelationship.allItemEmbeddingsByType() | Cat B/E; convertible once the per-class polymorphic discrimination lands. Deferred. |
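
For reference, the Cat D set-difference that stays on raw SPARQL reduces to a few lines once both ID lists are in hand. This is a minimal sketch of the pattern only; `unprocessedIds` is a hypothetical helper, and the two input arrays stand in for the results of the "all channel items" and "all items in any subgroup" queries.

```typescript
// Returns the channel item IDs that do not appear in any conversation subgroup.
function unprocessedIds(channelItemIds: string[], processedIds: string[]): string[] {
  const processed = new Set(processedIds); // O(1) membership checks
  return channelItemIds.filter((id) => !processed.has(id));
}

console.log(unprocessedIds(["msg1", "msg2", "msg3"], ["msg2"])); // [ 'msg1', 'msg3' ]
```

The surviving IDs are what the third query then hydrates via VALUES ?id { ... }, so the expensive reifier join only runs for unprocessed items.
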
channel.test.ts and conversation.test.ts mocks updated:
- createMockPerspective() now seeds modelQuery: vi.fn().mockResolvedValue({ instances: [], totalCount: 0 }) so call sites that converted off querySparql get an empty result for the empty case without each test having to opt in.
- The four Conversation.stats() tests use vi.spyOn(ConversationSubgroup, 'findAllAndCount') because the conversation-test @coasys/ad4m vi.mock provides a stripped-down @Model decorator that can't drive the real findAllAndCount pipeline.
- 119 tests run; 115 pass; 4 fail. All 4 failures pre-exist on dev and are unrelated to the migration (3 are Channel.unprocessedItems tests where the same vi.mock is missing fileToDataUri, 1 is a parseLit JSON-stringify test).
Runtime correctness of this PR depends on coasys/ad4m#846 and its upstream stack (#837, #842). The withMetadata: false, count: false, and selective-WHERE single-plan paths the migrations rely on are only honoured by the post-#846 executor. Running against dev's executor falls back to the back-compat default — slower but still functional.