Context
This note captures the current behavior in the datafn/server query planner and execution pipeline, based on tracing the implementation in the codebase.
The original questions were:
- Why does the server use IN_MEMORY for groupBy and having?
- Why does the server use IN_MEMORY for search? Is the searchfn/search provider path being skipped when available?
- What would it take to support true aggregate pushdown instead of in-memory aggregation?
Short Answer
IN_MEMORY in the current planner means the query cannot be expressed entirely through the adapter pushdown path that exists today. It does not necessarily mean a provider is ignored or that the feature is unsupported.
For groupBy and having:
- The server has no aggregate pushdown API at the adapter boundary.
- Aggregate execution is implemented in-memory by loading records, grouping them in JavaScript, computing aggregates, then applying having as a post-aggregation filter.
For search:
- The planner still classifies search queries as IN_MEMORY.
- But at execution time, the server does use the configured searchProvider when present.
- The search provider returns candidate IDs first, and then the server runs the normal DFQL execution pipeline over those candidates.
- So search retrieval can be provider-backed, while final DFQL shaping still happens in the store-backed/in-memory execution path.
Where The Classification Happens
The planner explicitly classifies the following as IN_MEMORY:
- search
- groupBy
- having
- relation-heavy filter patterns such as $any, $all, $none
- dot-path filters
- HTREE select tokens
This is the key current logic:

```ts
if (query.search) return "IN_MEMORY";
if (query.groupBy || query.having) return "IN_MEMORY";
```
The important implication is that the planner is not asking whether the operation is semantically valid. It is asking whether the query can be executed by the existing pushdown machinery.
Why groupBy / having Are In-Memory Today
1. The pushdown path only knows how to do row retrieval, not grouped aggregation
The current pushdown implementation is built around:
- converting DFQL filters to adapter where clauses
- converting DFQL sort to adapter orderBy
- calling db.findMany(...)
- optionally calling db.count(...)
- projecting fields afterward
That means pushdown today is designed around record retrieval, not grouped result production.
There is no adapter method in the current contract that expresses:
- GROUP BY
- aggregate functions like count, sum, avg, min, max
- HAVING
- aggregate-aware sorting
- grouped pagination semantics
Because that capability is missing at the adapter boundary, the server has nowhere to push the aggregate query down to.
2. The actual aggregate executor is explicitly in-memory
Aggregate query execution currently does the following:
- starts from loaded records
- applies normal filters in-memory
- groups records into a Map
- computes each aggregation over the grouped records
- applies having as a filter over the grouped rows
- sorts grouped rows
- applies pagination / cursor logic
This is an in-process aggregate engine, not a translation layer to the DB.
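The steps above can be sketched as a small in-process engine. This is an illustrative sketch, not the actual Datafn executor; all names here are invented:

```typescript
// Illustrative in-process aggregate engine: group, aggregate, then
// apply having as a post-aggregation filter. Names are hypothetical.
type Row = Record<string, any>;

interface Aggregation {
  alias: string;
  op: "count" | "sum";
  field?: string; // unused for count: count counts records
}

function groupAndAggregate(
  rows: Row[],
  groupBy: string[],
  aggregations: Aggregation[],
  having?: (row: Row) => boolean,
): Row[] {
  // 1. Group records into a Map keyed by the group-by values.
  const groups = new Map<string, Row[]>();
  for (const row of rows) {
    const key = JSON.stringify(groupBy.map((f) => row[f]));
    const bucket = groups.get(key) ?? [];
    bucket.push(row);
    groups.set(key, bucket);
  }

  // 2. Compute each aggregation over the grouped records.
  const out: Row[] = [];
  for (const bucket of groups.values()) {
    const result: Row = {};
    for (const f of groupBy) result[f] = bucket[0][f];
    for (const agg of aggregations) {
      result[agg.alias] =
        agg.op === "count"
          ? bucket.length // record count, not non-null field count
          : bucket.reduce((s, r) => s + Number(r[agg.field!] ?? 0), 0);
    }
    out.push(result);
  }

  // 3. Apply having over the grouped rows (sorting and pagination follow).
  return having ? out.filter(having) : out;
}
```

Everything here runs over already-loaded records, which is exactly why this path cannot benefit from database-side grouping.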
3. having depends on post-aggregation rows
The current having behavior is implemented over rows that only exist after grouping and aggregation. In the current architecture, those rows are not represented in the adapter API. So having cannot be pushed independently; it is coupled to the lack of aggregate pushdown overall.
Why search Is Also Classified As IN_MEMORY
1. Search is treated as a two-phase operation
The current search pipeline works like this:
- obtain candidate IDs using either:
  - the configured searchProvider, or
  - a DB-native full-text fallback when the adapter reports native support
- merge the candidate IDs into a filter like id in [...]
- re-run the main DFQL execution path for filtering, shaping, sorting, pagination, and projection
So the planner marks search as IN_MEMORY because the query is not fully expressible via the normal adapter pushdown path alone.
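This two-phase flow can be sketched as follows. The provider and executor signatures are hypothetical stand-ins (simplified to synchronous calls), not the real interfaces:

```typescript
// Illustrative two-phase search: the provider retrieves candidate IDs,
// then the normal DFQL pipeline runs over an "id in [...]" filter.
type Query = { search?: string; filter?: Record<string, any> };

interface SearchProvider {
  search(term: string): string[]; // returns candidate IDs
}

function executeSearch(
  query: Query,
  provider: SearchProvider | undefined,
  runDfql: (q: Query) => any[],
): any[] {
  if (!query.search) return runDfql(query);
  if (!provider) throw new Error("search unsupported: no provider or native full-text");

  // Phase 1: candidate retrieval via the provider.
  const ids = provider.search(query.search);

  // Phase 2: merge candidates into the filter and re-run the main
  // execution path for filtering, shaping, sorting, and pagination.
  return runDfql({
    ...query,
    search: undefined,
    filter: { ...query.filter, id: { in: ids } },
  });
}
```

The second phase is why the planner puts the whole query in the IN_MEMORY bucket even when phase one is fully provider-backed.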
2. The provider is still used when available
The server-side execution branch does not skip the provider. It checks:
- if searchProvider exists, call it
- else if DB native full-text support exists, use that fallback
- else return unsupported
That means the provider is absolutely part of execution when configured.
3. Why the classification still says IN_MEMORY
The label is really about the overall execution strategy, not just candidate retrieval.
Even when the provider is used, the server still performs a follow-up DFQL execution step over the matched candidates. That is why search remains in the IN_MEMORY bucket in the current planner design.
Important Clarification About Validation
There is some stale wording in validation comments/messages that refers to requiring a searchfn plugin. The actual server configuration path is based on searchProvider, with DB-native fallback recognized as well.
So conceptually:
- old wording suggests a plugin-only model
- current implementation supports a pluggable searchProvider and DB-native full-text fallback
The execution wiring reflects the newer model even if some comments still use older language.
What The Current Architecture Is Optimized For
The present architecture cleanly supports three broad strategies:
- full pushdown for simple base-resource queries
- partial pushdown for queries where the primary resource can be fetched efficiently but relation expansion must happen afterward
- in-memory/store-backed execution for everything that requires more expressive semantics than the adapter contract currently exposes
This is a reasonable design boundary. The issue is not that aggregation or search are invalid. The issue is that the adapter contract and pushdown layer are narrower than the full DFQL language.
What Would Need To Change For True Aggregate Pushdown
1. Extend the adapter contract
The first necessary change is at the database adapter boundary. A new aggregate-oriented method would be needed, something conceptually like aggregate(params).
That method would need to express:
- model
- where
- groupBy
- aggregations
- having
- orderBy
- limit
- offset
- namespace
- possibly cursor semantics later
Without this, the server has no stable way to ask an adapter for grouped results.
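Conceptually, the supporting types could look like the following. Every name below is hypothetical; the real shape would be fixed in a design pass:

```typescript
// Hypothetical aggregate pushdown contract for the adapter boundary.
type Scalar = string | number | boolean | null;

interface AggregateParams {
  model: string;
  where?: Record<string, any>; // pre-group filter
  groupBy: string[]; // base fields to group on
  aggregations: {
    alias: string; // output column name
    op: "count" | "sum" | "avg" | "min" | "max";
    field?: string; // omitted for count (record-count semantics)
  }[];
  having?: Record<string, any>; // filter over group keys / aliases
  orderBy?: { field: string; dir: "asc" | "desc" }[];
  limit?: number;
  offset?: number;
  namespace?: string;
}

type AggregateRow = Record<string, Scalar>;

// A capable adapter would then implement something like:
// aggregate(params: AggregateParams): Promise<AggregateRow[]>
```

Keeping aggregations alias-based makes having and orderBy expressible purely in terms of group keys and aliases, which matters for the v1 subset discussed below.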
2. Add adapter capability flags
As with native full-text support today, aggregate support should be feature-detected. Examples:
- capabilities.operations.aggregate
- potentially finer-grained flags such as:
  - supports having
  - supports aggregate sort by alias
  - supports base-field groupBy only
  - supports dot-path or relation-aware grouping
This matters because not every adapter will support the full surface area at once.
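Mirroring how operations.fulltext is detected today, the flags could be sketched like this (names are illustrative, not the actual capability schema):

```typescript
// Hypothetical fine-grained aggregate capability flags, checked at
// plan time the same way native full-text support is checked today.
interface AggregateCapabilities {
  aggregate: boolean; // adapter implements aggregate() at all
  having?: boolean;
  sortByAlias?: boolean;
  groupedPagination?: boolean;
  baseFieldGroupByOnly?: boolean;
}

// A minimal first adapter might only report:
const minimalAdapter: AggregateCapabilities = {
  aggregate: true,
  baseFieldGroupByOnly: true,
};
```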
3. Add a new server execution path
The server would need a new execution branch for aggregate-capable adapters.
Instead of always sending groupBy / having to the in-memory executor, the server would do something like:
- if query has aggregate semantics and adapter supports aggregate pushdown, execute aggregate pushdown
- otherwise keep the current in-memory fallback
This preserves compatibility while allowing adapters to opt in incrementally.
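The branch can be sketched as a small planning function (labels like AGGREGATE_PUSHDOWN are invented for illustration):

```typescript
// Illustrative strategy selection for aggregate queries only; the
// real planner also distinguishes FULL_PUSHDOWN / PARTIAL_PUSHDOWN.
type AggregatePlan = "AGGREGATE_PUSHDOWN" | "IN_MEMORY";

function planAggregate(
  query: { groupBy?: string[]; having?: unknown },
  capabilities: { aggregate?: boolean },
): AggregatePlan {
  const hasAggregateSemantics = Boolean(query.groupBy || query.having);
  // Push down only when the adapter opts in; otherwise keep the
  // existing in-memory executor as the compatible fallback.
  if (hasAggregateSemantics && capabilities.aggregate) return "AGGREGATE_PUSHDOWN";
  return "IN_MEMORY";
}
```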
4. Build DFQL-to-adapter aggregate translation
The server would need a translation layer analogous to the existing filter/sort pushdown converter. That translation would need to handle:
- pre-group filters -> adapter where
- groupBy
- aggregate definitions and aliases
- having on grouped fields and aggregate aliases
- sort terms that may reference group keys or aggregate aliases
- grouped pagination semantics
5. Preserve Datafn semantics exactly
One subtle but important requirement: the pushdown path must preserve the existing Datafn aggregate semantics.
The current shared aggregate utility intentionally defines count as counting records, not non-null field values. SQL adapters often distinguish COUNT(*) from COUNT(field), which skips NULL values.
If the new pushdown path is implemented carelessly, it could silently diverge from current in-memory results. So aggregate pushdown must match the current behavior exactly, even when the underlying SQL engine has different defaults.
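A tiny example of the divergence risk, using plain data:

```typescript
// Datafn's shared utility counts records; SQL COUNT(field) skips
// NULLs. A pushdown layer must therefore emit COUNT(*) (or an
// equivalent) to preserve the current in-memory results.
const users = [{ email: "a@example.com" }, { email: null }, { email: null }];

const recordCount = users.length; // current Datafn semantics
const nonNullCount = users.filter((u) => u.email != null).length; // COUNT(email) semantics
```

A naive translation of count to COUNT(email) would change results for any group containing NULLs, with no error to flag the drift.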
Recommended Scope For A First Aggregate Pushdown Version
A practical first version would likely support only the simplest safe subset:
- base-resource groupBy only
- no relation traversal in groupBy
- no relation expansion in select for aggregate queries
- having only on group keys and aggregate aliases
- sort by group keys and aggregate aliases only
- limit/offset pagination before tackling full cursor semantics
This would unlock meaningful performance improvements without forcing the entire DFQL relation model into the first adapter API revision.
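A guard for this subset could be sketched as follows (all names hypothetical):

```typescript
// Hypothetical check for the proposed v1 subset: base-field groupBy,
// having/sort restricted to group keys and aliases, limit/offset only.
interface AggQuery {
  groupBy: string[];
  aggregations: { alias: string; op: string; field?: string }[];
  having?: Record<string, any>;
  orderBy?: { field: string }[];
  cursor?: unknown;
}

function isV1Pushable(q: AggQuery): boolean {
  const isBaseField = (f: string) => !f.includes("."); // no dot-path / relation traversal
  if (!q.groupBy.every(isBaseField)) return false;
  if (q.cursor !== undefined) return false; // limit/offset only in v1
  const allowed = new Set([...q.groupBy, ...q.aggregations.map((a) => a.alias)]);
  if (q.having && !Object.keys(q.having).every((k) => allowed.has(k))) return false;
  if (q.orderBy && !q.orderBy.every((t) => allowed.has(t.field))) return false;
  return true;
}
```

Anything that fails the guard simply stays on the in-memory path, so the check is a routing decision, not a validation error.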
Why Keeping The In-Memory Fallback Still Matters
Even after aggregate pushdown exists, the in-memory path should remain for:
- adapters with no aggregate support
- memory or test adapters
- complex aggregate queries the adapter cannot express yet
- relation-aware grouping cases beyond the initial scope
That keeps the language portable while allowing more capable adapters to accelerate common cases.
Practical Architectural Reading
The current system should be read as:
- FULL_PUSHDOWN: pure adapter-supported retrieval query
- PARTIAL_PUSHDOWN: adapter-supported base retrieval plus relation expansion afterward
- IN_MEMORY: a broader execution bucket for queries whose semantics exceed the current adapter pushdown contract
That last category includes both:
- true in-memory computation like grouped aggregation
- hybrid flows like provider-backed search plus store-backed DFQL shaping
So the label is slightly broader than the literal phrase suggests.
Why @superfunctions/db Is The Real Boundary
This point deserves to be explicit because it changes where the fix should live.
The root cause is not just that datafn/server lacks aggregate pushdown code. The deeper cause is that the upstream adapter contract in @superfunctions/db does not currently expose any grouped query primitive.
Today, the Adapter interface supports:
- findOne
- findMany
- count
- CRUD and batch operations
- transactions
- schema/lifecycle methods
It does not support anything like:
- aggregate(...)
- groupBy(...)
- grouped result rows
- aggregate-aware having
- grouped ordering and grouped pagination
That means datafn/server cannot cleanly implement true aggregate pushdown on its own without either:
- extending @superfunctions/db, or
- bypassing the adapter abstraction and re-implementing DB-specific grouped query logic inside Datafn
The second option would be the wrong design move. It would duplicate responsibilities, create adapter-specific logic in the wrong layer, and make it much harder to preserve consistent semantics across adapters.
Why This Matters For Anyone Fixing The Issue
A future implementation should not start by adding aggregate SQL directly inside datafn/server pushdown code as a one-off workaround.
That would likely cause all of the following:
- duplication of query capabilities that belong in @superfunctions/db
- inconsistent behavior between adapters
- difficulty supporting non-SQL or partial-capability adapters
- risk of semantic drift from the in-memory aggregate implementation already shared by Datafn
The correct architectural reading is:
- @superfunctions/db defines what can be pushed down
- datafn/server translates DFQL into that contract
- if the contract does not express grouped queries, Datafn must fall back to in-memory aggregation
Why Search Looks Different
Search already has a narrow foothold in the @superfunctions/db capability model because adapter capabilities include operations.fulltext.
That allows Datafn to distinguish:
- provider-backed search
- DB-native full-text fallback
- unsupported adapters
There is no equivalent aggregate capability today. That is a strong signal that aggregate pushdown is not merely missing in Datafn, but missing at the adapter layer itself.
Semantics Risk If This Is Implemented In The Wrong Layer
The shared aggregate semantics in Datafn are not automatically the same as native SQL semantics.
A good example is count:
- the current shared aggregate utility defines count as counting records
- many SQL implementations distinguish COUNT(*) from COUNT(field) and handle nulls differently
If grouped queries are implemented ad hoc only inside Datafn pushdown logic, different adapters could easily end up with subtly different results.
Putting aggregate support into @superfunctions/db first creates a single place to define and normalize those semantics.
Package-By-Package Implementation Checklist
1. packages/db
This is the first place that should change.
Add an aggregate query contract to the adapter system, for example:
```ts
aggregate(params: AggregateParams): Promise<AggregateRow[]>
```
Define the supporting types for:
- group keys
- aggregate operations and aliases
- having
- grouped orderBy
- limit/offset
- namespace
- optional future cursor semantics
Extend adapter capabilities with at least:
- capabilities.operations.aggregate
Potentially add finer-grained capability flags if needed, such as:
- supports having
- supports aggregate sort by alias
- supports grouped pagination
- supports only base-field groupBy
2. packages/db adapter implementations
After the contract exists, implement it in the concrete adapters where feasible:
- Drizzle adapter
- Kysely adapter
- Prisma adapter
- memory adapter as a reference/fallback implementation if it makes sense
The implementation must preserve Datafn’s existing aggregate semantics rather than leaking raw ORM/SQL defaults.
3. datafn/server
Only after the adapter contract exists should Datafn’s pushdown path be extended.
Changes would include:
- adding DFQL-to-adapter aggregate translation
- branching to aggregate pushdown when the adapter reports support
- keeping the current in-memory aggregate executor as fallback
- updating planner classification so groupBy/having are not unconditionally IN_MEMORY when aggregate pushdown is available
4. datafn/core
Keep datafn/core as the source of shared aggregate semantics.
If new aggregate behavior is introduced, validate whether the shared utility remains the semantic reference for:
- count
- null handling
- alias behavior
- aggregate result row shape
That way pushdown and in-memory paths stay aligned.
5. Docs and tests
Update docs and tests in both layers:
- @superfunctions/db docs for the new adapter API
- Datafn docs for aggregate pushdown behavior and fallback behavior
- adapter contract tests for grouped queries
- Datafn integration tests that assert pushdown and in-memory paths return the same results
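The parity tests in the last bullet could be built around a helper like this (the two execution paths themselves are out of scope here; this only compares result sets order-insensitively, which is appropriate when no orderBy is specified):

```typescript
// Sketch of a parity assertion: pushdown and in-memory aggregate
// execution must return the same rows for the same query.
function sortRows(rows: Record<string, any>[]): Record<string, any>[] {
  // Canonical order via stringified rows; sufficient for unordered results.
  return [...rows].sort((a, b) => JSON.stringify(a).localeCompare(JSON.stringify(b)));
}

function assertSameResults(a: Record<string, any>[], b: Record<string, any>[]): void {
  const sa = JSON.stringify(sortRows(a));
  const sb = JSON.stringify(sortRows(b));
  if (sa !== sb) throw new Error(`paths diverged: ${sa} vs ${sb}`);
}
```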
Guardrail For Future Work
A good rule for anyone implementing this:
If the feature requires expressing a new class of database query, first ask whether @superfunctions/db can represent it.
If the answer is no, the work should begin in packages/db, not in datafn/server.
Conclusion
The current behavior is internally consistent:
- groupBy and having are IN_MEMORY because grouped aggregation is implemented entirely in the server and the adapter contract has no aggregate query primitive.
- search is also classified as IN_MEMORY, but that does not mean the provider is ignored. The provider is used to retrieve candidate IDs; the final DFQL query is then executed over those candidates in the normal server pipeline.
- The missing piece is not only in Datafn. The real boundary is @superfunctions/db, which currently lacks an aggregate query contract and aggregate capability flags.
- True aggregate pushdown should therefore start in packages/db, then be consumed by datafn/server, rather than being reimplemented as duplicate DB logic in Datafn.
Suggested Follow-Up
If we want to take this from analysis to implementation, the clean next step is a small design/spec pass that defines:
- the aggregate adapter method signature in @superfunctions/db
- capability flags in @superfunctions/db
- exact semantics for count, having, sorting, and pagination
- the minimum supported aggregate subset for v1
- the Datafn-side execution and fallback plan once the adapter contract exists
That would let the implementation proceed without ambiguity, without duplication across layers, and without accidental semantic drift from the current in-memory executor.