Status: Draft / confirmed direction
This document captures the final confirmed ideas discussed for replacing the current trace-server architecture with a general-purpose dataset analysis kernel.
The intent is to preserve all major ideas from the discussion so they are not lost.
These are the decisions currently considered confirmed unless explicitly changed later.
- No backward compatibility is required.
  - Existing adapter APIs, CLI shapes, endpoint shapes, and query globals may all be removed.
  - We are optimizing for the right long-term architecture, not incremental compatibility.
- The system will be redesigned around a dataset kernel, not format adapters with custom endpoints.
  - The current `TraceAdapter<T>` model is too narrow.
  - The replacement will use source drivers, dataset sessions, lazy layers, and model packs.
- All query evaluation will happen against one stable root object: `ds`.
  - No more ad hoc globals like `events`, `trace`, `byName`, etc. as the core API.
  - Domain-specific conveniences may exist under namespaces, but `ds` is the stable entry point.
- Lazy layers are a first-class primitive.
  - Datasets should load cheaply.
  - Expensive derived views should build only when queried.
  - Layers must support dependency tracking, memoization, cancellation, cache metadata, and eviction.
- The system must support multiple deep domains with the same architectural quality.
  - DevTools traces
  - OTEL / OTLP datasets
  - Sentry artifacts
  - Bundle analyzer outputs
  - Raw/untyped data mode
- Raw data and semantic derived data must coexist in the same query runtime.
  - The agent should be able to query raw facts, normalized facts, semantic dimensions, derived views, and reports from one place.
  - We want stacked access, not an either/or choice.
- Artifacts/files/workspace are first-class concepts.
  - Logical artifacts (screenshots, source text, source maps, etc.)
  - Materialized files/directories on disk
  - Managed scratch/export workspace for loaders, layers, and agents
- Provenance is mandatory.
  - Derived rows and reports should be able to reference the raw event/document rows they came from.
  - The agent must be able to trust and audit answers.
- Lossless ID handling is mandatory.
  - Large IDs must not silently lose precision.
  - Canonical string IDs should be used where necessary.
- Raw mode is a first-class product, not just a fallback.
  - It should provide schema inference, path cataloging, inferred tables, samples, and extractable blobs/files.
- DevTools is the first implementation target, but not the architectural center of gravity.
  - DevTools is the proving ground because it is very rich.
  - The final design must remain domain-general.
- Generic tables, reports, blobs, and exports will replace most format-specific endpoints.
  - Adapter-specific HTTP endpoints are not the long-term core abstraction.
- The implementation target is Node-first, not Bun-first.
  - Bun-specific runtime APIs are not part of the intended long-term architecture.
  - Bun may be used opportunistically as a package manager or optional fast path, but the runtime must work cleanly on Node.
- The server will use raw Node HTTP and a small custom router.
  - We do not currently intend to base the kernel on Hono, itty-router, or similar frameworks.
  - The route surface is small enough that a small custom router is preferable.
- Query evaluation will be JavaScript-first, with optional TypeScript syntax support.
  - The runtime query model should not depend on TypeScript types.
  - TS syntax support may still be offered through a fast transpilation step.
- Transpilation should use a fast runtime transpiler rather than TypeScript as the primary engine.
  - `esbuild` is the preferred default runtime transpiler.
  - `typescript` can remain as a compatibility fallback.
- Packaging should produce bundled runtime outputs rather than distributing the project as a large tree of source files.
  - The project should have an explicit build step for packaging.
  - `esbuild` is the intended packaging/bundling tool.
  - The long-term publish shape should center on bundled `dist/` entrypoints rather than shipping the internal source tree.
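The decision to use raw Node HTTP with a small custom router could look roughly like the sketch below. It makes several assumptions for illustration. The `Router` class is hypothetical. The `:param` segment syntax is assumed. The `/sessions/:id/...` paths are placeholders, not a committed route surface.

```typescript
import { createServer, IncomingMessage, ServerResponse } from "node:http";

type Params = Record<string, string>;
type Handler = (req: IncomingMessage, res: ServerResponse, params: Params) => void | Promise<void>;

interface Route {
  method: string;
  segments: string[];
  handler: Handler;
}

function sendJson(res: ServerResponse, status: number, body: unknown): void {
  res.writeHead(status, { "content-type": "application/json" });
  res.end(JSON.stringify(body));
}

// Hypothetical minimal router: literal segments plus ":name" parameters, first match wins.
class Router {
  private routes: Route[] = [];

  add(method: string, pattern: string, handler: Handler): this {
    this.routes.push({ method, segments: pattern.split("/").filter(Boolean), handler });
    return this;
  }

  // Pure matching step, kept separate so it can be tested without a socket.
  match(method: string, pathname: string): { handler: Handler; params: Params } | null {
    const parts = pathname.split("/").filter(Boolean);
    for (const route of this.routes) {
      if (route.method !== method || route.segments.length !== parts.length) continue;
      const params: Params = {};
      const ok = route.segments.every((seg, i) => {
        if (seg.startsWith(":")) {
          params[seg.slice(1)] = decodeURIComponent(parts[i]);
          return true;
        }
        return seg === parts[i];
      });
      if (ok) return { handler: route.handler, params };
    }
    return null;
  }

  // Adapts the router to the node:http request listener signature.
  listener(): (req: IncomingMessage, res: ServerResponse) => Promise<void> {
    return async (req, res) => {
      try {
        const url = new URL(req.url ?? "/", "http://localhost");
        const found = this.match(req.method ?? "GET", url.pathname);
        if (!found) return sendJson(res, 404, { error: "not found" });
        await found.handler(req, res, found.params);
      } catch (err) {
        // Malformed percent-encoding or handler failures end up here.
        if (!res.headersSent) sendJson(res, 500, { error: String(err) });
      }
    };
  }
}

// Starts a server on the given port (not invoked in this sketch).
function listen(router: Router, port: number) {
  return createServer(router.listener()).listen(port);
}
```

Because matching is a pure method, the route table can be unit-tested directly. Only `listen` touches `node:http`.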
The current codebase is a useful proof of concept for load-once/query-many analysis, but it is too limited for the data we actually observed in real traces.
Across the reviewed DevTools traces, we found that the raw data contains much more than the current adapter surface exposes.
Examples of data present in reviewed traces:
- screenshots
- frame pipeline / compositor benchmark data
- event timing / interaction latency data
- CPU profile chunks
- network timing with headers and connection metadata
- inline script source text
- source maps
- original sources embedded in `sourcesContent`
- layout shifts
- soft navigation data
- worker and frame metadata
- render instrumentation encoded in user timing
- V8 source rundown events
- stack trace capture events
The current model exposes only a small hand-curated set of heuristic endpoints and a flat query context. That is not sufficient for high-depth agent workflows.
This spec defines a replacement architecture.
Goals:
- Make large analysis artifacts queryable in a load-once/query-many workflow.
- Let agents discover what data exists in a dataset without reverse-engineering the format each time.
- Support both:
  - raw/fact-level inspection
  - semantic/high-level analysis
- Make repeated analysis cheap through lazy caching and reusable derived layers.
- Support artifact extraction and file materialization in a managed way.
- Preserve provenance across all derived outputs.
Design principles:
- Domain-general core architecture
- Rich domain-specific packs
- Async-first query runtime
- Lossless IDs and canonical units
- Explicit layer graph with dependency management
- Generic API surface that can be extended without format-specific hacks
Success criteria:
- Be excellent for DevTools traces
- Be equally principled for OTEL, Sentry, bundle outputs, and raw mode
- Make agent workflows dramatically easier than writing one-off scripts
Non-goals:
- Backward compatibility with the current adapter API
- Backward compatibility with existing CLI commands/endpoints/query globals
- Designing around the current code layout if it prevents a better kernel design
- Restricting the system to trace analysis only
The current architecture is roughly:
- detect format with adapter
- parse file eagerly
- build some indexes eagerly
- expose adapter-specific endpoints
- inject a flat query context into the VM
This has several limitations:
- Adapter-specific endpoint APIs do not scale.
  - They are manageable for a few heuristics.
  - They are not a durable foundation for many domains.
- `buildQueryContext()` kills discoverability and laziness.
  - It provides ad hoc globals.
  - It discourages layered modeling.
  - It does not expose a stable system API.
- Too much semantic reconstruction is left to the agent.
  - Cross-model joins are custom every time.
  - The same analyses must be rediscovered repeatedly.
- It does not scale cleanly to other deep domains.
  - OTEL and Sentry are not just “another adapter with a few endpoints”.
- There is no first-class artifact/file/workspace system.
  - This makes screenshots, source text, sourcemaps, extracted bodies, and generated outputs awkward.
- No explicit layer graph exists.
  - Expensive derived structures are either eager or hand-built ad hoc.
- No explicit provenance model exists.
  - Derived outputs do not uniformly explain what raw rows they came from.
We reviewed multiple traces from `~/Downloads/*Trace*.json.gz` plus one uncompressed JSON trace.
- Some traces are interaction-heavy.
- Some traces are frame/screenshot-heavy.
- Some traces are source/sourcemap-heavy.
- Some traces contain large inline script text payloads.
- Some traces contain many embedded sourcemaps and source contents.
- Some traces contain render instrumentation via user timing.
- Not all traces contain the same high-level signals.
Across reviewed gz traces we found examples like:
- ~239k to ~2.09M events
- ~179 to ~450 screenshots
- ~1.3k to ~14.3k `ProfileChunk` events
- 0 to 392 sourcemaps in metadata
- 0MB to ~40MB inline script source text
- 0MB to ~24MB original `sourcesContent`
- 0 to 6212 `EventTiming` rows
- 0 to 22 `LayoutShift` rows
- 0 to 22 `SoftNavigation` rows
- 5 to 856 network requests
From one trace we inspected in detail:
- bad interaction around ~232ms total latency
- main `click` dispatch around ~201.8ms
- ~736 render measures in that interaction window
- repeated rerenders in components like `VirtualItem`, `ChatBlock`, `ToolCallAccordion`
- many dropped frame states during the interaction
- significant React / scheduler JS hot spots
This demonstrated that the trace contained enough information to explain the interaction, but the current surface made correlation unnecessarily hard.
DevTools traces are not “just arrays of timeline events”. They are rich multi-model datasets that deserve a real semantic kernel.
From first principles, a DevTools trace is a collection of partially-overlapping event systems.
Most raw events may expose:
- `name`
- `cat`
- `ph`
- `pid`
- `tid`
- `ts`
- `dur`
- `id`
- `s`
- `args`
Different ph values imply different semantics.
Observed phase families include:
- `X`: duration slices
- `I`: instant events
- `M`: metadata events
- `b`/`e`: async/nestable pairs
- `s`/`f`: flow or async link phases
- `n`: async instants / chain points
- `P`: CPU profile chunks
- `N`/`D`: object lifecycle-ish events
Observed metadata includes:
- `thread_name`
- `process_name`
- `process_uptime_seconds`
Top-level metadata observed includes:
- `enhancedTraceVersion`
- `source`
- `startTime`
- `dataOrigin`
- `hostDPR`
- `sourceMaps`
- `modifications`
Observed input/interaction event families include:
- `EventTiming`
- `EventDispatch`
- `InputLatency::*`
- `WidgetBaseInputHandler::OnHandleInputEvent`
These contain overlapping representations of user actions.
Observed frame pipeline event families include:
- `PipelineReporter`
- `BeginFrame`
- `RequestMainThreadFrame`
- `BeginImplFrameToSendBeginMainFrame`
- `SendBeginMainFrameToCommit`
- `AnimationFrame`
- `AnimationFrame::Render`
- `AnimationFrame::StyleAndLayout`
- `AnimationFrame::Presentation`
Observed CPU profiling event families include:
- `Profile`
- `ProfileChunk`
Payloads can contain:
- `cpuProfile.nodes`
- `samples`
- `timeDeltas`
- `lines`
- `columns`
- `trace_ids`
Observed network event families include:
- `ResourceSendRequest`
- `ResourceReceiveResponse`
- `ResourceReceivedData`
- `ResourceFinish`
- `ResourceMarkAsCached`
- `ResourceRequestSender::*`
Payloads may contain:
- `requestId`
- `url`
- `headers`
- `timing`
- `statusCode`
- `protocol`
- `mimeType`
- cache and service worker flags
Observed screenshot event families include:
- `Screenshot`
Payloads may contain:
- `snapshot`
- `frame_sequence`
- `expected_display_time`
- `source_id`
Observed layout and navigation event families include:
- `LayoutShift`
- `SoftNavigation`
- `SoftNavigationHeuristics::*`
- `SoftNavigationContext::*`
- viewport/paint timing events
Observed user timing families include:
- `blink.user_timing`
- `UserTiming::Measure`
These often encode structured JSON strings describing component/render behavior.
Observed script and source event families include:
- `ScriptCompiled`
- `ScriptCatchup`
- `LargeScriptCatchup`
- `StubScriptCatchup`
- `TooLargeScriptCatchup`
- `V8StackTraceImpl::capture`
These can include:
- `scriptId`
- `url`
- `sourceText`
- execution context info
- inline source text blobs
Some traces contain metadata.sourceMaps[] with:
- `url`
- `sourceMapUrl`
- `sourceMap`
Some source maps include:
- `sources`
- `sourcesContent`
- `mappings`
- `x_google_ignoreList`
DevTools traces are multi-dimensional datasets containing enough information to answer much richer questions than the current API supports.
The system will be re-architected around the following top-level components:
- Source drivers
- Dataset sessions
- Dataset kernel
- Lazy layers
- Model packs
- Query runtime
- Artifacts/files/workspace subsystem
A source driver detects and opens one kind of input artifact.
Examples:
- DevTools trace driver
- OTEL/OTLP driver
- Sentry dataset driver
- Next bundle analysis driver
- Raw JSON/NDJSON/CSV driver
A dataset session represents one loaded source artifact in the running server.
It owns:
- dataset manifest
- dataset kind
- source metadata
- kernel
- layer registry/cache
- query runtime factory
- lifecycle / cleanup hooks
The kernel is the runtime substrate shared by all dataset kinds.
It owns:
- layer host
- schema/catalog
- raw store
- table registry
- report registry
- artifact store
- file materializer
- workspace manager
- capability registry
A model pack augments a dataset with reusable domain or cross-domain structure.
Examples:
- `devtools.*`
- `otel.*`
- `sentry.*`
- `bundle.*`
- `raw.*`
- `code.*`
- `network.*`
- `graph.*`
A layer is a memoized build unit with dependencies.
Examples:
- parse ingest facts
- normalized request dimension
- render measure view
- source map registry
- interaction report helper cache
Every query will execute against one stable root object, `ds`. This object is the main query surface.
```ts
interface DatasetQueryApi {
  caps: CapabilityApi;
  schema: SchemaApi;
  raw: RawApi;
  tables: TableApi;
  reports: ReportApi;
  artifacts: ArtifactApi;
  files: FileApi;
  workspace: WorkspaceApi;
  layers: LayerDebugApi;
  tools: UtilityApi;
  ns: NamespaceApi;
}
```
- `caps`: feature detection for the dataset
- `schema`: discoverability, catalogs, available tables/reports/paths
- `raw`: lossless source-level access
- `tables`: normalized facts, dimensions, views
- `reports`: opinionated summaries and explainers
- `artifacts`: logical blobs
- `files`: materialized files/directories on disk
- `workspace`: managed scratch/export areas
- `layers`: lazy-layer inspection/debugging
- `tools`: generic helpers
- `ns`: domain namespaces, e.g. `devtools`, `otel`, `bundle`, `raw`
The runtime should support both:
- generic string-driven access
- ergonomic domain namespaces
Examples:
```ts
await ds.tables.get("devtools.dims.interactions").rows()
await ds.reports.run("devtools.interaction", { id: "4758" })
```
and optionally:
```ts
await ds.ns.devtools.interactions.rows()
await ds.ns.devtools.report.interaction("4758")
```
The generic string-based APIs are the stable minimum; namespace sugar can sit on top.
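One way to keep namespace sugar a thin layer over the string-based APIs is a `Proxy` that turns property access into a generic lookup. This sketch makes several simplifying assumptions:

- `TableApi` is cut down to `names()` and `get(name).rows()`.
- `makeNamespace` is a hypothetical helper.
- The resolution rule, where `ds.ns.devtools.interactions` maps to the single table named `devtools.<category>.interactions`, is assumed for illustration.

```typescript
interface TableHandle {
  rows(): Promise<unknown[]>;
}

interface TableApi {
  names(): string[];
  get(name: string): TableHandle;
}

// Hypothetical namespace sugar: ds.ns.<ns>.<short> resolves to the unique table
// named "<ns>.<category>.<short>" via the generic string API.
function makeNamespace(ns: string, tables: TableApi): Record<string, TableHandle> {
  return new Proxy({} as Record<string, TableHandle>, {
    get(_target, prop) {
      // Ignore symbols and "then" so the proxy is never mistaken for a thenable.
      if (typeof prop !== "string" || prop === "then") return undefined;
      const matches = tables
        .names()
        .filter((n) => n.startsWith(`${ns}.`) && n.endsWith(`.${prop}`));
      if (matches.length === 0) throw new Error(`no table "${prop}" in namespace "${ns}"`);
      if (matches.length > 1) throw new Error(`ambiguous "${prop}": ${matches.join(", ")}`);
      return tables.get(matches[0]);
    },
  });
}

// In-memory TableApi used only to exercise the sketch.
function memoryTables(data: Record<string, unknown[]>): TableApi {
  return {
    names: () => Object.keys(data),
    get: (name) => ({ rows: async () => data[name] ?? [] }),
  };
}
```

The sugar only calls the generic API, so the string-based surface remains the single source of truth. Ambiguous short names fail loudly instead of silently picking one table.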
Replace the current adapter model with source drivers.
```ts
interface SourceDriver {
  id: string;
  detect(source: SourceProbe): Promise<Detection | null>;
  open(source: SourceHandle, detection: Detection): Promise<DatasetSession>;
}
```
A driver is responsible for:
- recognizing input shape
- opening raw source access
- registering initial layers/model packs
- producing a dataset session
A driver is not responsible for:
- owning the entire public API surface
- hardcoding the query context globals
- being the main endpoint routing abstraction
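Driver selection can be a small loop that asks every registered driver and keeps the best detection. The sketch rests on assumptions not specified above:

- `Detection` carries a numeric `confidence`.
- `SourceProbe` exposes a file name and the first bytes of the input.

The two example drivers are hypothetical. The raw driver matches any JSON with low confidence so that richer drivers win when they apply.

```typescript
interface SourceProbe {
  name: string;
  head: Uint8Array; // first bytes of the (possibly decompressed) source
}

interface Detection {
  driverId: string;
  confidence: number; // 0..1, assumed field
}

interface Driver {
  id: string;
  detect(probe: SourceProbe): Promise<Detection | null>;
}

// Runs every driver's detect() and keeps the most confident non-null result.
async function selectDriver(drivers: Driver[], probe: SourceProbe): Promise<Detection | null> {
  const results = await Promise.all(drivers.map((d) => d.detect(probe)));
  let best: Detection | null = null;
  for (const r of results) {
    if (r && (!best || r.confidence > best.confidence)) best = r;
  }
  return best;
}

const headText = (p: SourceProbe) => new TextDecoder().decode(p.head);

// Hypothetical DevTools driver: looks for the traceEvents key near the start of the file.
const devtoolsDriver: Driver = {
  id: "devtools",
  async detect(p) {
    return headText(p).includes('"traceEvents"') ? { driverId: "devtools", confidence: 0.9 } : null;
  },
};

// Hypothetical raw driver: accepts any JSON object or array, with low confidence.
const rawJsonDriver: Driver = {
  id: "raw",
  async detect(p) {
    const t = headText(p).trimStart();
    return t.startsWith("{") || t.startsWith("[") ? { driverId: "raw", confidence: 0.1 } : null;
  },
};
```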
```ts
interface DatasetSession {
  id: string;
  kind: string;
  manifest: DatasetManifest;
  kernel: DatasetKernel;
  createQueryRuntime(options?: QueryRuntimeOptions): QueryRuntime;
  dispose(): Promise<void>;
}

interface DatasetKernel {
  layers: LayerHost;
  schema: SchemaRegistry;
  tables: TableRegistry;
  reports: ReportRegistry;
  artifacts: ArtifactStore;
  files: FileMaterializer;
  workspace: WorkspaceManager;
  caps: CapabilityRegistry;
  raw: RawStore;
}
```
The manifest should include things like:
- dataset id
- kind
- source path(s)
- source size
- detected features
- loaded timestamp
- maybe content hashes
Lazy layers solve the core problem:
- datasets can be large and rich
- not every query needs every derived structure
- repeated queries should reuse derived work
We discussed the following useful categories:
- ingest
- facts
- dims
- views
- indexes
- reports
```ts
interface LayerSpec<T> {
  key: string;
  deps?: string[];
  when?: (caps: CapabilitySet) => boolean;
  scope?: "session" | "query";
  weight?: "light" | "heavy";
  build(ctx: LayerContext): Promise<T>;
}
```
The layer host should support:
- dependency resolution
- deduped concurrent builds
- memoization
- build metadata
- cancellation
- cache eviction for heavy layers
- lazy build only on access
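A minimal layer host can cover several of these requirements at once:

- dependency resolution
- memoization
- deduplicated concurrent builds
- cycle detection
- basic build metadata
- cancellation via `AbortSignal`

This is a sketch under simplifying assumptions. It omits `when`, `scope`, `weight`, and eviction. The `LayerContext` shape (`signal` plus a generic `get`) is hypothetical.

```typescript
interface LayerContext {
  signal: AbortSignal;
  get<U>(key: string): Promise<U>;
}

interface LayerSpec<T> {
  key: string;
  deps?: string[];
  build(ctx: LayerContext): Promise<T>;
}

interface LayerMeta {
  status: "building" | "ready" | "failed";
  buildMs?: number;
}

class LayerHost {
  private specs = new Map<string, LayerSpec<unknown>>();
  private inflight = new Map<string, Promise<unknown>>(); // memoized build promises
  readonly meta = new Map<string, LayerMeta>();

  register<T>(spec: LayerSpec<T>): void {
    this.specs.set(spec.key, spec as LayerSpec<unknown>);
  }

  // Returns the memoized layer value, building it (and its deps) on first access only.
  get<T>(key: string, signal: AbortSignal, stack: string[] = []): Promise<T> {
    if (stack.includes(key)) {
      return Promise.reject(new Error(`layer cycle: ${[...stack, key].join(" -> ")}`));
    }
    const existing = this.inflight.get(key);
    if (existing) return existing as Promise<T>; // concurrent callers share one build
    const spec = this.specs.get(key);
    if (!spec) return Promise.reject(new Error(`unknown layer: ${key}`));

    const path = [...stack, key];
    const promise = (async () => {
      const started = Date.now();
      this.meta.set(key, { status: "building" });
      // Build declared dependencies first, in parallel.
      await Promise.all((spec.deps ?? []).map((d) => this.get(d, signal, path)));
      signal.throwIfAborted();
      const value = await spec.build({
        signal,
        get: <U,>(k: string) => this.get<U>(k, signal, path),
      });
      this.meta.set(key, { status: "ready", buildMs: Date.now() - started });
      return value;
    })();

    this.inflight.set(key, promise);
    // Failures and cancellations are not memoized, so a later query can retry.
    promise.catch(() => {
      this.inflight.delete(key);
      this.meta.set(key, { status: "failed" });
    });
    return promise as Promise<T>;
  }
}
```

Memoizing the promise rather than the value is what makes concurrent requests for the same layer share one build.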
A call to:
```ts
await ds.reports.run("devtools.interaction", { id: "4758" })
```
might lazily build:
- `devtools/facts.events`
- `devtools/indexes.common`
- `devtools/dims.interactions`
- `devtools/views.renderMeasures`
- `devtools/views.framePipeline`
- `devtools/views.mainThreadTasks`
It should not build unrelated layers, such as full source map resolution, unless they are needed.
Layers should usually be parameter-free reusable units.
Good:
- `devtools/dims.interactions`
- `devtools/views.renderMeasures`
Bad:
- a unique cached layer per interaction ID
Parameterized reports should be computed over reusable layers, not stored as an exploding layer key space.
The current `buildQueryContext()` approach should be removed.
Query flow:
- Client sends query
- Session creates a query runtime
- Query runtime exposes one root object: `ds`
- Query executes in async context
- Layer builds happen lazily through `ds`
- Timeout aborts both VM execution and in-flight async layer work
Requirements:
- Query runtime must be async-first
- Lazy layer calls must be `await`-able
- Timeout must propagate into layer builds and file reads
The VM context should contain:
- `ds`
- safe utilities like `console`, `performance`, timers, `URL`, `TextEncoder`, etc.
- lightweight presentation helpers like `pretty(...)` and `table(...)`
The dataset itself should be accessed through ds, not flattened globals.
Generic formatting helpers are query-runtime utilities, not dataset namespaces.
The first implementation may use JS-driven table operations, but the API should be shaped so pushdown is possible later.
We should avoid committing to “always return giant arrays” as the only model.
If possible, registry lookups like ds.tables.get(name) and ds.reports.get(name) should be cheap and chainable. Expensive or async work should happen at evaluation points like rows(), count(), run(), pretty(), and table().
A useful target shape is:
```ts
const rows = await ds.tables
  .get("devtools.views.codeHotspots")
  .select(["functionName", "totalDurationMs"])
  .orderBy("totalDurationMs", "desc")
  .limit(20)
  .rows()
```
and:
```ts
const report = await ds.reports
  .get("devtools.interaction")
  .args({ id: "4758" })
  .run()
```
The runtime should support a formal table query plan that can be shared across:
- `ds.tables`
- generic HTTP table-query routes
- table-aware renderers
An initial plan shape can support:
- `select`
- `where`
- `orderBy`
- `offset`
- `limit`
- filtered `count`
The first implementation may execute these plans in JS over realized rows, but table providers should be able to optionally implement direct execution later.
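The chainable handle and the shared plan can be sketched together. Builder calls only accumulate an immutable plan. Work happens at the evaluation points `rows()` and `count()`. The sketch executes plans in JS over realized rows, as the first implementation is expected to, and leaves an optional `execute` hook for later pushdown. `TableProvider`, `TableQueryPlan`, and the `eq`/`gt`/`lt` predicate shape are assumptions.

```typescript
type Row = Record<string, unknown>;
type Op = "eq" | "gt" | "lt";

interface TableQueryPlan {
  select?: string[];
  where?: Array<{ field: string; op: Op; value: unknown }>;
  orderBy?: { field: string; dir: "asc" | "desc" };
  offset?: number;
  limit?: number;
}

interface TableProvider {
  realize(): Promise<Row[]>;
  execute?(plan: TableQueryPlan): Promise<Row[]>; // optional pushdown, hypothetical
}

function compare(a: unknown, b: unknown): number {
  if (typeof a === "number" && typeof b === "number") return a - b;
  return String(a).localeCompare(String(b));
}

function matches(row: Row, c: { field: string; op: Op; value: unknown }): boolean {
  const v = row[c.field];
  if (c.op === "eq") return v === c.value;
  if (typeof v !== "number" || typeof c.value !== "number") return false;
  return c.op === "gt" ? v > c.value : v < c.value;
}

// JS fallback executor: filter, sort, slice, then project.
function applyPlan(rows: Row[], plan: TableQueryPlan): Row[] {
  let out = rows.filter((r) => (plan.where ?? []).every((c) => matches(r, c)));
  if (plan.orderBy) {
    const { field, dir } = plan.orderBy;
    const sign = dir === "asc" ? 1 : -1;
    out = [...out].sort((a, b) => sign * compare(a[field], b[field]));
  }
  const start = plan.offset ?? 0;
  out = out.slice(start, plan.limit === undefined ? undefined : start + plan.limit);
  const cols = plan.select;
  return cols ? out.map((r) => Object.fromEntries(cols.map((c) => [c, r[c]]))) : out;
}

class TableHandle {
  private provider: TableProvider;
  private plan: TableQueryPlan;

  constructor(provider: TableProvider, plan: TableQueryPlan = {}) {
    this.provider = provider;
    this.plan = plan;
  }

  // Cheap builder steps: each returns a new handle with an extended plan.
  private extend(patch: Partial<TableQueryPlan>): TableHandle {
    return new TableHandle(this.provider, { ...this.plan, ...patch });
  }
  select(cols: string[]) { return this.extend({ select: cols }); }
  where(field: string, op: Op, value: unknown) {
    return this.extend({ where: [...(this.plan.where ?? []), { field, op, value }] });
  }
  orderBy(field: string, dir: "asc" | "desc" = "asc") { return this.extend({ orderBy: { field, dir } }); }
  offset(n: number) { return this.extend({ offset: n }); }
  limit(n: number) { return this.extend({ limit: n }); }

  // Evaluation points: only these touch the provider.
  async rows(): Promise<Row[]> {
    if (this.provider.execute) return this.provider.execute(this.plan);
    return applyPlan(await this.provider.realize(), this.plan);
  }
  async count(): Promise<number> {
    // Filtered count ignores ordering, paging, and projection.
    return applyPlan(await this.provider.realize(), { where: this.plan.where }).length;
  }
}
```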
Structured data is the canonical result model.
However, agent workflows also need token-efficient readable output. The runtime should therefore support three complementary modes:
- structured results for composition
- built-in readable rendering via `pretty(...)`
- deterministic tabular rendering via `table(...)`
Important rules:
- `pretty(...)` should be compact and adaptive
- `table(...)` should be explicit and deterministic for rectangular row data
- manual string building inside queries is a first-class workflow, not a hack
- table/report handles may expose `.pretty()` and `.table()` helpers where that is natural
- plain returned objects and arrays must remain plain objects and arrays
- the system must not patch global JS prototypes or attach methods to arbitrary returned values
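A deterministic `table(...)` for rectangular rows can be very small. This sketch makes three assumptions:

- Column order comes from the first row's keys unless columns are passed explicitly.
- Objects are rendered with `JSON.stringify`.
- Missing values render as empty cells.

A production renderer would also need width caps and truncation.

```typescript
type Row = Record<string, unknown>;

function cell(v: unknown): string {
  if (v === null || v === undefined) return "";
  if (typeof v === "object") return JSON.stringify(v);
  return String(v);
}

// Renders rows as a fixed-width text table; the same input always yields the same output.
function table(rows: Row[], columns?: string[]): string {
  const cols = columns ?? (rows.length > 0 ? Object.keys(rows[0]) : []);
  const body = rows.map((r) => cols.map((c) => cell(r[c])));
  const widths = cols.map((c, i) => Math.max(c.length, ...body.map((line) => line[i].length)));
  const fmt = (cells: string[]) => cells.map((s, i) => s.padEnd(widths[i])).join(" | ").trimEnd();
  const rule = widths.map((w) => "-".repeat(w)).join("-|-");
  return [fmt(cols), rule, ...body.map(fmt)].join("\n");
}
```

The function is a plain helper that takes plain rows. This is consistent with the rule above that returned arrays stay plain and no prototypes are patched.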
Representative examples:
```ts
await ds.reports.get("devtools.interaction").args({ id: "4758" }).pretty()
await ds.tables.get("devtools.views.codeHotspots").limit(10).table()
```
```ts
const summary = await ds.reports.run("devtools.interaction", { id: "4758" })
return [
  `interaction ${summary.interaction.interactionId} ${summary.interaction.totalLatencyMs}ms`,
  `dropped ${summary.droppedFrames} frames`,
].join("\n")
```
One of the biggest pain points in rich artifacts is simply discovering what is present.
`ds.schema` should support:
- list namespaces
- list tables
- list reports
- describe capabilities
- discover field paths
- return sample values
- describe columns / units / IDs
```ts
await ds.schema.namespaces()
await ds.schema.tables()
await ds.schema.reports()
await ds.schema.describeTable("devtools.dims.interactions")
await ds.schema.paths()
await ds.schema.samples("raw.events.args.data.timing")
```
This catalog/discoverability layer is especially important for raw mode and unknown/untyped inputs.
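A path catalog behind `ds.schema.paths()` and `ds.schema.samples(...)` can be built by walking a document once and recording, per path, the observed JSON types and a few sample leaf values. In this sketch, array elements share their parent's path, mirroring the `raw.events.args.data.timing` style above. Two names are illustrative assumptions: the `PathStats` shape and the `maxSamples` cap.

```typescript
interface PathStats {
  types: Record<string, number>; // observed JSON type -> count
  samples: unknown[]; // first few scalar values seen at this path
}

function jsonType(v: unknown): string {
  if (v === null) return "null";
  if (Array.isArray(v)) return "array";
  return typeof v; // "object" | "string" | "number" | "boolean"
}

// Walks a JSON value and catalogs every path. Array elements share their parent's path,
// so element types are counted under the same path as the array itself.
function catalogPaths(root: unknown, prefix = "raw", maxSamples = 3): Map<string, PathStats> {
  const catalog = new Map<string, PathStats>();
  const visit = (value: unknown, path: string): void => {
    let stats = catalog.get(path);
    if (!stats) {
      stats = { types: {}, samples: [] };
      catalog.set(path, stats);
    }
    const t = jsonType(value);
    stats.types[t] = (stats.types[t] ?? 0) + 1;
    if (Array.isArray(value)) {
      for (const item of value) visit(item, path);
    } else if (t === "object") {
      for (const [k, v] of Object.entries(value as Record<string, unknown>)) visit(v, `${path}.${k}`);
    } else if (stats.samples.length < maxSamples) {
      stats.samples.push(value);
    }
  };
  visit(root, prefix);
  return catalog;
}
```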
The system should expose data in layers:
- raw data
- normalized facts
- dimensions/entities
- reusable views
- reports/explainers
It allows:
- raw inspection when needed
- reusable semantic structures
- principled derivations
- agent-friendly analysis without hiding the source facts
The corresponding table namespaces are:
- `raw.*`
- `facts.*`
- `dims.*`
- `views.*`
Reports should be opinionated helpers built on top of reusable layers. They should never be the only way to access underlying data.
Rows in dimensions/views/reports should preserve or reference provenance to raw inputs.
Provenance is required across the system. It should capture:
- raw row/event IDs
- source artifact references
- originating layer key
- possibly transformation notes
It exists to support:
- debugging
- trust
- agent auditability
- report explainability
The system should converge on a normalized provenance contract rather than ad hoc per-table fields.
A representative shape may include:
- `rawIds: string[]`
- `artifactIds?: string[]`
- `layer: string`
- `notes?: string[]`
Not every output must use these exact field names forever, but the semantics should be stable and easy for agents to recognize.
- major rows in `dims.*` and `views.*` should either carry provenance directly or expose an obvious provenance field
- report outputs should include provenance-rich subobjects or a dedicated provenance section where appropriate
- aggregates may summarize provenance rather than enumerate every raw input, but they must still remain auditable
No major derived semantic object should be impossible to trace back to its raw origin.
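A shared provenance type plus one helper for aggregates shows how the contract could stay uniform. The field names follow the representative shape above. The `summarizeProvenance` helper and its `maxIds` cap are hypothetical. They illustrate one way an aggregate can summarize rather than enumerate inputs while still naming the upstream layers an auditor would re-query.

```typescript
interface Provenance {
  rawIds: string[];
  artifactIds?: string[];
  layer: string;
  notes?: string[];
}

// Combines provenance from many input rows into one aggregate-level record.
// Keeps at most maxIds distinct raw IDs and records how many were omitted.
function summarizeProvenance(layer: string, inputs: Provenance[], maxIds = 100): Provenance {
  const raw = [...new Set(inputs.flatMap((p) => p.rawIds))];
  const artifacts = [...new Set(inputs.flatMap((p) => p.artifactIds ?? []))];
  const out: Provenance = { rawIds: raw.slice(0, maxIds), layer };
  if (artifacts.length > 0) out.artifactIds = artifacts;
  const upstream = [...new Set(inputs.map((p) => p.layer))];
  out.notes = [`derived from ${upstream.join(", ")}`];
  if (raw.length > maxIds) out.notes.push(`${raw.length - maxIds} more raw IDs omitted`);
  return out;
}
```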
Some datasets use large IDs that may overflow JS number precision.
Examples include:
- trace IDs
- frame trace IDs
- isolate IDs
- other runtime or distributed tracing IDs
Rule:
- canonicalize risky IDs as strings
- do not silently lose precision
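`JSON.parse` converts integers beyond `Number.MAX_SAFE_INTEGER` into rounded doubles, so risky IDs have to be rescued before parsing. One hedged approach is to quote long integer literals for known ID keys in the source text, then canonicalize any remaining numeric IDs strictly. The key list is an assumption. The regex is a heuristic that does not understand JSON string boundaries, so a real loader would more likely use a streaming tokenizer.

```typescript
// Keys whose integer values may exceed 2^53 - 1 (illustrative, not exhaustive).
const RISKY_ID_KEYS = ["id", "trace_id", "frame_trace_id", "isolate"];

function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Rewrites `"key": 12345678901234567890` to `"key": "12345678901234567890"`
// for integer literals of 16+ digits, so JSON.parse keeps every digit.
function quoteRiskyIds(text: string, keys: string[] = RISKY_ID_KEYS): string {
  const keyAlt = keys.map(escapeRegExp).join("|");
  const re = new RegExp(`("(?:${keyAlt})"\\s*:\\s*)(-?\\d{16,})(?=\\s*[,}\\]])`, "g");
  return text.replace(re, '$1"$2"');
}

// Canonicalizes an already-parsed ID; refuses numbers that may have lost precision.
function canonicalId(v: unknown): string {
  if (typeof v === "string") return v;
  if (typeof v === "bigint") return v.toString();
  if (typeof v === "number" && Number.isSafeInteger(v)) return String(v);
  throw new Error(`unsafe or non-integer ID: ${String(v)}`);
}
```

Throwing in `canonicalId` is the point. A number that is already unsafe can no longer be recovered, so the loader should fail loudly rather than store a wrong ID.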
Across domains we may encounter:
- ns
- µs
- ms
- s
- bytes
- KB/MB/GB
- relative timestamps
- wall-clock timestamps
Rule:
- facts tables should expose canonical normalized units
- original raw values should remain accessible
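A small helper shows how facts tables could expose canonical units while keeping the original value next to it. Several choices here are assumptions for illustration:

- milliseconds and bytes as the canonical units
- 1024-based size multiples
- the `Normalized` shape

```typescript
type TimeUnit = "ns" | "µs" | "ms" | "s";
type SizeUnit = "bytes" | "KB" | "MB" | "GB";

// Converting through integer nanoseconds avoids inexact factors like 1e-3.
const NS_PER: Record<TimeUnit, number> = { ns: 1, "µs": 1e3, ms: 1e6, s: 1e9 };
// Assumes binary (1024-based) multiples; a real pack must state its source's convention.
const BYTES_PER: Record<SizeUnit, number> = { bytes: 1, KB: 1024, MB: 1024 ** 2, GB: 1024 ** 3 };

interface Normalized<U extends string> {
  value: number; // canonical unit (ms or bytes)
  raw: { value: number; unit: U }; // original value stays accessible
}

function toMs(value: number, unit: TimeUnit): Normalized<TimeUnit> {
  return { value: (value * NS_PER[unit]) / 1e6, raw: { value, unit } };
}

function toBytes(value: number, unit: SizeUnit): Normalized<SizeUnit> {
  return { value: value * BYTES_PER[unit], raw: { value, unit } };
}
```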
This is a first-class subsystem.
Artifacts are logical payloads that exist in a dataset but are not necessarily materialized to disk.
Examples:
- screenshot bytes
- inline script source text
- sourcemap JSON
- original source file contents
- generated flamegraph SVG
- network response body
Files are materialized on-disk outputs derived from artifacts.
Examples:
- screenshot JPEG files
- exported script files
- exported source tree
- exported sourcemaps
The workspace is managed scratch/export storage used by:
- loaders
- layer builders
- reports
- agents
These map to `ds.artifacts`, `ds.files`, and `ds.workspace`.
This supports both:
- agent-visible exports
- internal scratch space for analysis and loaders
Artifacts are logical refs to typed payloads.
- text
- json
- image
- binary
```ts
interface ArtifactRef {
  id: string;
  kind: "text" | "json" | "image" | "binary";
  mediaType: string;
  sizeBytes?: number;
  filenameHint?: string;
  hash?: string;
  metadata?: Record<string, unknown>;
}
```
Artifact operations should include:
- get artifact metadata
- read bytes
- read text
- read JSON
- list artifacts by filter
Artifacts must be exportable as files/directories in a managed way.
- materialize one artifact as a file
- export a collection as a directory
- return stable paths + manifest metadata
- allow leases/pinning/release
Model packs should be able to register exportable file collections.
Examples:
- `devtools.screenshots`
- `devtools.scripts`
- `code.sources`
- `code.source-maps`
- `devtools.network-bodies`
Every exported directory should include a manifest mapping files back to artifacts and dataset metadata.
File materialization should not be treated as fire-and-forget forever.
The system should support:
- lease IDs for materialized files/directories
- pin/release semantics
- export cleanup policies
- quota-aware export behavior
- enough metadata to explain why an export still exists or was cleaned up
The workspace is the managed temp/scratch/export environment. It covers:
- loader scratch
- decompression scratch
- derived report scratch
- agent-visible exports
- temporary generated analysis files
It should support operations to:
- allocate scratch dir
- allocate scratch file
- list/manage export roots
- leases / pin / release
- cleanup / TTL / quotas
Safety requirements:
- sanitized paths
- no path traversal
- read-only exports by default where appropriate
- managed cleanup
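Path sanitization and traversal protection are easiest to enforce at a single choke point that every workspace allocation goes through. The sketch uses only `node:path`. The function names are hypothetical. It does not handle symlinks, which a real implementation would resolve (for example with `fs.realpath`) before the containment check.

```typescript
import * as path from "node:path";

// Replaces characters outside a conservative set and strips leading dots.
function sanitizeFilename(name: string): string {
  const cleaned = name.replace(/[^A-Za-z0-9._-]/g, "_").replace(/^\.+/, "");
  return cleaned.length > 0 ? cleaned.slice(0, 200) : "unnamed";
}

// Resolves a relative path inside root, rejecting anything that escapes it
// (including absolute paths and "..", which path.resolve normalizes first).
function resolveInside(root: string, relative: string): string {
  const base = path.resolve(root);
  const target = path.resolve(base, relative);
  const rel = path.relative(base, target);
  if (rel === "" || rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel)) {
    throw new Error(`path escapes workspace root: ${relative}`);
  }
  return target;
}
```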
Before moving on to later domains like OTEL, the workspace/artifact subsystem should have a real lifecycle model for:
- scratch cleanup
- export TTL and/or quota enforcement
- lease release
- pinned vs evictable outputs
- operator-visible status for active workspace usage
A global CAS can still remain future work, but per-session lifecycle behavior should be real rather than aspirational.
A global or semi-global content-addressed blob store is strongly recommended.
It avoids repeatedly storing the same:
- screenshots
- source texts
- source maps
- sourcesContent blobs
- generated artifacts
Proposed model:
- artifact bytes stored by hash in CAS
- exports are lightweight views / links / copies with manifests
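A content-addressed store keyed by SHA-256 is enough to show the deduplication property: the same screenshot or source map stored twice occupies one entry. This in-memory sketch uses `createHash` from `node:crypto`. The `ContentStore` class and the `sha256:` hash prefix are hypothetical. A real store would persist blobs on disk and track references for cleanup.

```typescript
import { createHash } from "node:crypto";

interface BlobInfo {
  hash: string;
  sizeBytes: number;
  mediaType: string;
}

class ContentStore {
  private blobs = new Map<string, { bytes: Uint8Array; info: BlobInfo }>();

  // Stores bytes once per distinct content; repeated puts return the existing entry.
  put(bytes: Uint8Array, mediaType: string): BlobInfo {
    const hash = "sha256:" + createHash("sha256").update(bytes).digest("hex");
    const existing = this.blobs.get(hash);
    if (existing) return existing.info;
    const info: BlobInfo = { hash, sizeBytes: bytes.byteLength, mediaType };
    this.blobs.set(hash, { bytes, info });
    return info;
  }

  get(hash: string): Uint8Array | undefined {
    return this.blobs.get(hash)?.bytes;
  }

  readText(hash: string): string | undefined {
    const bytes = this.get(hash);
    return bytes && new TextDecoder().decode(bytes);
  }

  get size(): number {
    return this.blobs.size;
  }
}
```

Exports can then be manifests of `{ path, hash }` pairs that copy or link from the store, instead of separate copies of the payloads.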
DevTools is the first implementation target and the richest immediate proving ground.
Example capabilities:
- screenshots available
- CPU profile data available
- EventTiming available
- frame pipeline data available
- network timing available
- source rundown/source text available
- source maps available
- sourcesContent available
- user timing render instrumentation available
- layout shift available
- soft navigation available
The raw layer should expose raw trace access and metadata.
Examples:
- raw event rows
- raw metadata
- raw path samples
The facts layer should normalize raw events into reusable facts.
Examples:
- event rows with canonical columns
- instant event table
- slice event table
- async/flow facts
- CPU sample facts
- object lifecycle facts
Canonical extracted columns should be considered for common nested fields like:
- frame ID
- request ID
- script ID
- interaction ID
- trace/sample ID
- node ID
- task ID
- frame sequence ID
- URL
Example indexes:
- by name
- by category
- by phase
- by thread
- by request ID
- by script ID
- by interaction ID
- by frame sequence ID
- by node ID
- by URL
Example dimensions:
- threads
- processes
- frames
- requests
- interactions
- tasks
- scripts
- source maps
- original sources
- screenshots
- layout shifts
- soft navigations
- workers
- layers
Example views:
- frame pipeline
- main-thread tasks
- render measures
- code hotspots
- network waterfall
- visual changes
- interaction windows
Example reports:
- interaction report
- request report
- frame report
- script report
- soft navigation report
Example export collections:
- screenshots
- scripts
- source maps
- original sources
- network bodies
The trace showed a strong need for first-class interaction modeling.
Desired table/report coverage:
- deduped interaction entities
- grouped input representations (`pointerup`, `mouseup`, `click`, etc.)
- total latency
- dispatch latency
- render count
- dropped frame count
- related JS hot spots
- related layout/paint
- related screenshots
Desired render instrumentation support:
- decoded user timing render rows
- joined begin/end/measure semantics
- parsed detail JSON
- component names / track / prop instability
- aggregation by component
- render measures scoped to time windows or interactions
Desired frame pipeline support:
- frame sequence correlation
- dropped/presented states
- benchmark stage timings
- relation to screenshots and interactions
Desired script/source support:
- scripts dimension
- inline source text artifacts
- source maps dimension
- original source files dimension
- source-backed hotspot attribution
Desired network support:
- lifecycle correlation by request ID
- headers/timing/protocol/cache info
- raw bodies when available
- relation to screenshots/interactions when useful
Desired layout shift and soft navigation support:
- clustered shift entities
- impacted nodes / rects
- relation to interaction windows
- soft navigation contexts and task IDs
Desired CPU profile support:
- decode `ProfileChunk` data into normalized CPU sample facts
- canonical CPU node/frame dimensions
- self time vs total time semantics
- call-tree / folded-stack style derived views
- timeline buckets for hot code over time
- interaction-scoped and task-scoped CPU hotspot views
- source-backed attribution tied to scripts/source maps/sources when possible
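The core of `ProfileChunk` decoding is turning sampled node IDs plus `timeDeltas` into timestamps, then into self and total time. The sketch makes several assumptions:

- Chunks are already concatenated into one `samples` array and one `timeDeltas` array.
- Deltas are in microseconds, each relative to the previous sample, with the first relative to `startUs`.
- Nodes carry a `parent` link.
- Each interval until the next sample is attributed to the sampled node.
- The last sample contributes zero duration.

```typescript
interface CpuNode {
  id: number;
  parent?: number; // assumed parent link on each profile node
  functionName: string;
}

interface SampleFact {
  nodeId: number;
  tsUs: number; // absolute timestamp in microseconds
  durationUs: number; // time until the next sample (0 for the last one)
}

// Expands delta-encoded samples into absolute-timestamped sample facts.
function decodeSamples(startUs: number, samples: number[], timeDeltas: number[]): SampleFact[] {
  const facts: SampleFact[] = [];
  let ts = startUs;
  for (let i = 0; i < samples.length; i++) {
    ts += timeDeltas[i];
    facts.push({ nodeId: samples[i], tsUs: ts, durationUs: 0 });
  }
  for (let i = 0; i + 1 < facts.length; i++) facts[i].durationUs = facts[i + 1].tsUs - facts[i].tsUs;
  return facts;
}

// Self time: intervals attributed to the sampled node itself.
// Total time: the same intervals rolled up to every ancestor on the stack.
function selfAndTotal(nodes: CpuNode[], facts: SampleFact[]): Map<number, { self: number; total: number }> {
  const times = new Map(
    nodes.map((n): [number, { self: number; total: number }] => [n.id, { self: 0, total: 0 }]),
  );
  const parentOf = new Map(nodes.map((n): [number, number | undefined] => [n.id, n.parent]));
  for (const f of facts) {
    const own = times.get(f.nodeId);
    if (own) own.self += f.durationUs;
    for (let id: number | undefined = f.nodeId; id !== undefined; id = parentOf.get(id)) {
      const entry = times.get(id);
      if (entry) entry.total += f.durationUs;
    }
  }
  return times;
}
```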
Desired facts, indexes, and dimensions support:
- explicit instant-event, slice-event, and async/flow fact coverage
- reusable indexes by common keys like request/script/interaction/frame/url
- process/frame/worker/layer dimensions where the trace supports them
- better first-class task entities instead of only task-like derived rows
Desired network body support:
- request/response body artifacts when the trace contains them
- `devtools.network-bodies` as an exportable collection
- linkage from requests to exported bodies and related screenshots/interactions where useful
Desired presentation and provenance support:
- built-in readable rendering for the major DevTools reports
- compact tables for the major DevTools dimensions/views
- coherent provenance across facts/dimensions/views/reports
DevTools should only be considered “complete” for this spec when the raw/facts/indexes/dimensions/views/reports/artifacts stack is strong enough to support rich investigations without forcing the agent to reconstruct core semantic joins from scratch.
The architecture must support deep semantic modeling for multiple domains.
For OTEL, possible dims/views/reports:
- resources
- scopes
- spans
- links
- logs
- metrics
- service graph
- trace graph
- latency summaries
- error hot spots
- file/artifact exports for related payloads
For Sentry, possible dims/views/reports:
- issues
- events
- transactions
- breadcrumbs
- exceptions
- threads
- stack frames
- release artifacts
- suspect code
- report surfaces for issues/transactions
For bundle analyzer outputs, possible dims/views/reports:
- modules
- chunks
- assets
- routes
- dependency graph
- duplication analysis
- route size reports
- module hot spots
Raw mode is a first-class domain.
Desired support:
- schema inference
- path cataloging
- inferred tables
- samples
- time field detection
- extractable embedded blobs
- generic exports
Desired inference support:
- nested-array inferred tables, not just top-level array discovery
- path-based naming that stays understandable and stable
- richer type summaries and path statistics where feasible
- sane sampling/truncation behavior for very large raw documents
Desired blob extraction support:
- data URLs
- wrapper objects like `{ data, mimeType }` or `{ body, encoding: "base64" }`
- byte-array blobs
- base64/gzip-wrapped text or JSON payloads
- media sniffing from obvious magic bytes
- confidence scoring and filename/media-type hints
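Two of these heuristics, data URL decoding and magic-byte sniffing, fit in a few lines. The sketch uses Node's `Buffer` for base64 decoding and knows only a handful of signatures: PNG, JPEG, GIF, and gzip. Two details are arbitrary illustrations:

- the `BlobCandidate` shape
- the confidence values, which rise when the declared media type agrees with the sniffed one

```typescript
import { Buffer } from "node:buffer";

interface BlobCandidate {
  mediaType: string;
  bytes: Uint8Array;
  confidence: number; // 0..1, illustrative scale
}

const SIGNATURES: Array<{ mediaType: string; magic: number[] }> = [
  { mediaType: "image/png", magic: [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a] },
  { mediaType: "image/jpeg", magic: [0xff, 0xd8, 0xff] },
  { mediaType: "image/gif", magic: [0x47, 0x49, 0x46, 0x38] },
  { mediaType: "application/gzip", magic: [0x1f, 0x8b] },
];

// Returns the media type whose leading magic bytes match, or null.
function sniff(bytes: Uint8Array): string | null {
  const hit = SIGNATURES.find((s) => s.magic.every((b, i) => bytes[i] === b));
  return hit ? hit.mediaType : null;
}

// Decodes "data:<type>[;param=value...][;base64],<payload>"; returns null for anything else.
function fromDataUrl(value: string): BlobCandidate | null {
  const m = /^data:([\w.+-]+\/[\w.+-]+)?((?:;[\w-]+=[^;,]*)*)(;base64)?,(.*)$/s.exec(value);
  if (!m) return null;
  const declared = m[1] ?? "text/plain";
  const bytes = m[3]
    ? new Uint8Array(Buffer.from(m[4], "base64"))
    : new TextEncoder().encode(decodeURIComponent(m[4]));
  const sniffed = sniff(bytes);
  const confidence = sniffed === null ? 0.6 : sniffed === declared ? 0.95 : 0.4;
  return { mediaType: sniffed ?? declared, bytes, confidence };
}
```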
Desired presentation support:
- readable raw summary rendering
- compact table rendering for schema/path catalogs
- export manifests rich enough to explain extracted blobs
Raw mode should only be considered “complete” for this spec when it supports strong schema discovery, nested inference, robust blob extraction heuristics, and agent-friendly readable summaries without being reduced to a weak fallback path.
We explicitly discussed that some packs should be reusable across domains rather than tightly coupled to one dataset kind.
Examples:
- code pack
  - scripts
  - source maps
  - original sources
  - source-backed attribution
- network pack
  - request/response modeling
  - timing and protocol info
- graph pack
  - relations and edges
  - parent/child/dependency graph operations
- artifact/file pack
  - artifact registry
  - export collections
  - file materialization
- raw schema/catalog pack
  - path discovery
  - type summaries
  - inferred table helpers
This helps keep the architecture generic and composable.
We identified that graph relations are important and should be treated as a real reusable concept.
Examples:
- DevTools: event -> script -> source map -> source -> interaction -> frame
- OTEL: span -> parent span -> service -> resource -> log
- Bundle: module -> chunk -> route
- Sentry: event -> stack frame -> source map -> source
The system should make it easy to model and query edges without forcing everything into flat tables only.
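A generic edge list plus a bounded breadth-first traversal is one minimal way to make relations queryable without flattening them into join tables. The `Edge` shape, the relation names, and the `reachable` helper are assumptions. The example data follows the DevTools-style chain listed above.

```typescript
interface Edge {
  from: string;
  to: string;
  relation: string; // e.g. "executes", "mappedBy" (illustrative names)
}

// Breadth-first walk from a start node, optionally restricted to some relations.
// Returns each reached node with its hop distance from the start.
function reachable(edges: Edge[], start: string, relations?: string[], maxDepth = 10): Map<string, number> {
  const adjacency = new Map<string, Edge[]>();
  for (const e of edges) {
    if (relations && !relations.includes(e.relation)) continue;
    const list = adjacency.get(e.from) ?? [];
    list.push(e);
    adjacency.set(e.from, list);
  }
  const depth = new Map<string, number>([[start, 0]]);
  const queue = [start];
  while (queue.length > 0) {
    const node = queue.shift()!;
    const d = depth.get(node)!;
    if (d >= maxDepth) continue;
    for (const e of adjacency.get(node) ?? []) {
      if (!depth.has(e.to)) {
        depth.set(e.to, d + 1);
        queue.push(e.to);
      }
    }
  }
  return depth;
}
```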
The current adapter-specific endpoints are not the target architecture.
Representative long-term routes could include:
- session caps
- session schema
- session tables registry
- session reports registry
- generic table queries
- generic report execution
- artifact access
- file export access
- generic query execution
Representative long-term commands could include:
- `schema`
- `tables`
- `report`
- `query`
- `artifact`
- `export`
- `status`
The CLI should be full-featured, but it should primarily surface the generic runtime rather than duplicate it with an ever-growing garden of special-case commands.
For agents, the primary interface should be the query runtime:
- `ds` for dataset access
- `pretty(...)` for adaptive readable output
- `table(...)` for deterministic tabular output
- manual string building for custom compact summaries
CLI/docs/skills should teach these runtime primitives directly.
Report and table presentation helpers should be shared between the runtime and CLI rather than reimplemented separately in command-specific code.
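To make the sharing concrete, a single deterministic formatter could back both the runtime's `table(...)` helper and CLI table output. The sketch assumes rows are plain objects; `renderTable` and its layout rules (first-seen column order, two-space gaps, JSON for nested values) are hypothetical choices.

```typescript
// Hypothetical sketch: one deterministic formatter behind both the runtime's
// table(...) helper and the CLI's table output.
type Row = Record<string, unknown>;

function formatCell(value: unknown): string {
  if (value === null || value === undefined) return "";
  if (typeof value === "object") return JSON.stringify(value);
  return String(value);
}

function renderTable(rows: Row[]): string {
  // Columns in first-seen order across all rows, so the same rows always render the same way.
  const columns: string[] = [];
  for (const row of rows) {
    for (const key of Object.keys(row)) if (!columns.includes(key)) columns.push(key);
  }
  const cells = rows.map((row) => columns.map((column) => formatCell(row[column])));
  const widths = columns.map((column, i) => Math.max(column.length, ...cells.map((r) => r[i].length)));
  const line = (values: string[]): string =>
    values.map((value, i) => value.padEnd(widths[i])).join("  ").trimEnd();
  return [line(columns), line(widths.map((w) => "-".repeat(w))), ...cells.map((r) => line(r))].join("\n");
}
```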
To support large rich datasets, we should not rely only on large nested JS objects.
A useful physical model separates four kinds of storage:
- a raw store for source-level material, for example:
  - raw event arrays
  - raw documents
  - raw metadata
- a table store for normalized rows/columns suitable for query layers
- a blob store for heavy payloads kept out-of-line, for example:
  - screenshots
  - source text
  - source maps
  - original source contents
  - response bodies
- a catalog store for schema/path/type/statistical discovery
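The split between rows and out-of-line payloads can be sketched with a small content-addressed blob store: rows hold a short reference, and identical payloads are stored once. This assumes Node's `node:crypto` `createHash` for SHA-256; `BlobStore`, `BlobRef`, `ScriptRow`, and the `blob:sha256:` reference format are hypothetical, and a real store would persist to the session workspace rather than memory.

```typescript
import { createHash } from "node:crypto";

// Hypothetical sketch: heavy payloads live out-of-line, keyed by content hash,
// and table rows keep only a small reference string.
type BlobRef = `blob:sha256:${string}`;

class BlobStore {
  private readonly blobs = new Map<BlobRef, Uint8Array>();

  put(data: Uint8Array): BlobRef {
    const ref = `blob:sha256:${createHash("sha256").update(data).digest("hex")}` as BlobRef;
    // Identical payloads (e.g. one source map referenced twice) are stored once.
    if (!this.blobs.has(ref)) this.blobs.set(ref, data);
    return ref;
  }

  get(ref: BlobRef): Uint8Array {
    const data = this.blobs.get(ref);
    if (!data) throw new Error(`missing blob ${ref}`);
    return data;
  }

  get sizeBytes(): number {
    let total = 0;
    for (const data of this.blobs.values()) total += data.byteLength;
    return total;
  }
}

// A normalized row points at its payload instead of embedding it.
interface ScriptRow {
  scriptId: string;
  url: string;
  sourceText: BlobRef;
}
```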
Lazy layers are not sufficient without explicit lifecycle policy. Concerns include:
- heavy layer memory usage
- blob storage size
- export size
- session workspace cleanup
- eviction policy
The policy direction is to:
- apply TTL or LRU eviction to cold layers
- keep light reusable layers cached
- allow heavy layers to be evicted with metadata retained
- maintain build metadata for debugging
- use content-addressable storage (CAS) for heavy repeated blobs
Layers and workspace-backed outputs should expose enough lifecycle metadata for operators and agents to reason about cache state.
Representative metadata includes:
- `status`
- `buildMs`
- `lastAccessedAt`
- `sizeBytes` where feasible
- dependency keys
- `evictable`
- `pinned`
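These metadata fields map naturally onto a memoizing layer host. In the sketch below, which is an assumption rather than a settled design, each layer declares its dependencies, builds lazily on first access, shares one in-flight promise across callers, and records `status`, `buildMs`, `lastAccessedAt`, and `sizeBytes` when a size estimator is supplied. `LayerHost`, `LayerDef`, and `estimateSize` are hypothetical names, and dependency cycles are not detected.

```typescript
// Hypothetical sketch: a layer host that memoizes lazy layers and records
// lifecycle metadata. Cycle detection is omitted.
type LayerStatus = "idle" | "building" | "ready" | "failed" | "evicted";

interface LayerMeta {
  status: LayerStatus;
  buildMs?: number;
  lastAccessedAt?: number;
  sizeBytes?: number; // only where a size estimate is feasible
  dependencyKeys: string[];
  evictable: boolean;
  pinned: boolean;
}

interface LayerDef<T> {
  key: string;
  deps: string[];
  evictable: boolean;
  build(get: (key: string) => Promise<unknown>): Promise<T>;
  estimateSize?(value: T): number;
}

class LayerHost {
  private readonly defs = new Map<string, LayerDef<unknown>>();
  private readonly values = new Map<string, Promise<unknown>>();
  readonly meta = new Map<string, LayerMeta>();

  define<T>(def: LayerDef<T>): void {
    this.defs.set(def.key, def);
    this.meta.set(def.key, { status: "idle", dependencyKeys: def.deps, evictable: def.evictable, pinned: false });
  }

  get(key: string): Promise<unknown> {
    const def = this.defs.get(key);
    const meta = this.meta.get(key);
    if (!def || !meta) return Promise.reject(new Error(`unknown layer ${key}`));
    meta.lastAccessedAt = Date.now();

    const cached = this.values.get(key);
    if (cached) return cached; // memoized, including builds still in flight

    const started = Date.now();
    meta.status = "building";
    // Start the build on a microtask so the promise is cached before deps are requested.
    const promise = Promise.resolve()
      .then(() => def.build((dep) => this.get(dep)))
      .then(
        (value) => {
          meta.status = "ready";
          meta.buildMs = Date.now() - started;
          meta.sizeBytes = def.estimateSize?.(value);
          return value;
        },
        (error: unknown) => {
          meta.status = "failed";
          this.values.delete(key); // let the next access retry
          throw error;
        },
      );
    this.values.set(key, promise);
    return promise;
  }
}
```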
The kernel should support:
- evicting cold layers
- pinning/unpinning layers or outputs
- releasing export/materialization leases
- workspace cleanup under TTL/quota rules
- enough status visibility to debug why data was retained or evicted
A full global CAS can remain future work, but a real per-session lifecycle policy should be in place before moving on to later domains.
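The eviction rules can be sketched as a pass that removes cold, evictable, unpinned layers in least-recently-used order until resident size fits a budget, marking them `evicted` instead of deleting their metadata. `LayerState`, `evictToBudget`, and the `minIdleMs` threshold are hypothetical; actually releasing the cached values is left to the caller.

```typescript
// Hypothetical sketch: LRU eviction under a byte budget that never touches
// pinned or non-evictable layers and keeps metadata for evicted ones.
interface LayerState {
  key: string;
  status: "ready" | "evicted";
  lastAccessedAt: number;
  sizeBytes: number;
  evictable: boolean;
  pinned: boolean;
}

function evictToBudget(layers: LayerState[], budgetBytes: number, now: number, minIdleMs: number): string[] {
  let resident = layers.filter((l) => l.status === "ready").reduce((sum, l) => sum + l.sizeBytes, 0);
  const candidates = layers
    .filter((l) => l.status === "ready" && l.evictable && !l.pinned && now - l.lastAccessedAt >= minIdleMs)
    .sort((a, b) => a.lastAccessedAt - b.lastAccessedAt); // coldest first

  const evicted: string[] = [];
  for (const layer of candidates) {
    if (resident <= budgetBytes) break;
    layer.status = "evicted"; // metadata stays visible for debugging
    resident -= layer.sizeBytes;
    evicted.push(layer.key);
  }
  return evicted; // the session may still be over budget if too much is pinned
}
```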
The system must support cancellation beyond just VM timeout.
Timeout should abort:
- query execution
- layer builds
- large file reads
- expensive derived report generation
- exports if needed
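A common way to get cancellation beyond the VM timeout is to thread one `AbortSignal` (the standard `AbortController` API available in Node and browsers) through every long-running step and check it at chunk boundaries. `QueryCancelled`, `checkpoint`, `buildInChunks`, and `runWithTimeout` are hypothetical names for illustration.

```typescript
// Hypothetical sketch: one AbortSignal per query reaches every long-running step,
// so a timeout stops layer builds and reads too, not only the VM.
class QueryCancelled extends Error {}

function checkpoint(signal: AbortSignal): void {
  if (signal.aborted) throw new QueryCancelled("query cancelled");
}

// A long build processed in chunks; it yields to the event loop between chunks
// (so timers such as the timeout can fire) and checks the signal each time.
async function buildInChunks<T, R>(
  items: T[],
  chunkSize: number,
  map: (item: T) => R,
  signal: AbortSignal,
): Promise<R[]> {
  const out: R[] = [];
  for (let i = 0; i < items.length; i += chunkSize) {
    checkpoint(signal);
    for (const item of items.slice(i, i + chunkSize)) out.push(map(item));
    await new Promise((resolve) => setTimeout(resolve, 0));
  }
  return out;
}

async function runWithTimeout<R>(timeoutMs: number, work: (signal: AbortSignal) => Promise<R>): Promise<R> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    return await work(controller.signal);
  } finally {
    clearTimeout(timer);
  }
}
```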
File export and materialization must provide:
- safe managed export paths
- sanitized filenames
- no path traversal
- controlled workspace roots
- possible quotas and cleanup policies
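A minimal sketch of the path rules, assuming Node's `node:path` module: names are reduced to a conservative character set, leading dots are replaced, and the resolved target must stay under the workspace root. `sanitizeFilename`, `resolveExportPath`, and the 200-character cap are hypothetical choices.

```typescript
import * as path from "node:path";

// Hypothetical sketch: exports resolve under a controlled workspace root, with
// sanitized names and an explicit traversal check as defense in depth.
function sanitizeFilename(name: string): string {
  const cleaned = name
    .replace(/[^A-Za-z0-9._-]/g, "_") // replaces "/", "\\", spaces, and control characters
    .replace(/^\.+/, "_");            // no leading dots, so no ".." or hidden files
  return cleaned.length > 0 ? cleaned.slice(0, 200) : "unnamed";
}

function resolveExportPath(workspaceRoot: string, collection: string, filename: string): string {
  const root = path.resolve(workspaceRoot);
  const target = path.resolve(root, sanitizeFilename(collection), sanitizeFilename(filename));
  if (!target.startsWith(root + path.sep)) {
    throw new Error(`export path escapes the workspace root: ${target}`);
  }
  return target;
}
```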
Loading several related sources into one dataset session was identified as important future headroom.
Examples:
- DevTools trace + bundle output
- Sentry event + release artifacts + source maps
- OTEL traces + logs + metrics loaded separately
This is not required for the first milestone, but the architecture should not make it impossible.
Build:
- source driver interface
- dataset session + kernel
- layer host
- query runtime with `ds`
- schema/catalog registry
- artifact/file/workspace subsystem
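The first two items can be sketched as a pair of contracts: a source driver scores and opens an input, and the kernel picks the most confident driver to start a session with cheap raw access and detected capabilities. `SourceInput`, `OpenedSource`, `SourceDriver`, `DatasetSession`, `Kernel`, and the confidence-score approach are all hypothetical.

```typescript
// Hypothetical sketch of the core contracts: drivers detect and open inputs,
// and the kernel picks the most confident driver to start a dataset session.
interface SourceInput {
  name: string;
  bytes: Uint8Array;
}

interface OpenedSource {
  caps: string[];                  // detected capabilities, e.g. "devtools.trace"
  rawRecords(): Iterable<unknown>; // cheap raw access; no derived layers built yet
}

interface SourceDriver {
  id: string;
  detect(input: SourceInput): number; // confidence in [0, 1]; 0 means "not mine"
  open(input: SourceInput): OpenedSource;
}

interface DatasetSession {
  driverId: string;
  source: OpenedSource;
}

class Kernel {
  private readonly drivers: SourceDriver[];

  constructor(drivers: SourceDriver[]) {
    this.drivers = drivers;
  }

  openSession(input: SourceInput): DatasetSession {
    let best: SourceDriver | undefined;
    let bestScore = 0;
    for (const driver of this.drivers) {
      const score = driver.detect(input);
      if (score > bestScore) {
        best = driver;
        bestScore = score;
      }
    }
    if (!best) throw new Error(`no source driver recognizes ${input.name}`);
    return { driverId: best.id, source: best.open(input) };
  }
}
```

A generic raw driver can then act as a low-confidence fallback while domain drivers claim inputs they recognize.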
Build:
- raw DevTools source driver
- raw event store
- basic caps detection
- schema/path discovery
- initial normalized facts
Build:
- common indexes
- interactions
- requests
- screenshots
- scripts
- source map/source layers
- frame pipeline
- render measures
- layout shifts / soft navigations
Build:
- generic HTTP routes
- generic CLI
- report/table/artifact/export commands
- first-class readable output via `pretty(...)`, `table(...)`, and report/table presentation helpers
Build:
- generic raw driver
- schema inference
- inferred tables
- extractable blob support
- richer raw blob heuristics and readable summaries
Before OTEL, complete:
- pushdown-ready table/query plan
- runtime presentation model
- provenance normalization
- layer/workspace lifecycle basics
- DevTools completion for the agreed semantic surfaces
- raw-mode completion for the agreed schema/blob surfaces
- docs/checklist updates aligned with the actual runtime surface
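A pushdown-ready table handle can be sketched as an immutable plan builder: chained calls like `.limit(10)` only extend the plan, and the plan is handed to the source so it can filter and stop scanning early instead of materializing the whole table. `TableHandle`, `Plan`, `PlanSource`, and `scanRows` are hypothetical; in this sketch filters always apply before the limit, whatever order the calls are chained in.

```typescript
// Hypothetical sketch: chained calls extend an immutable plan; the plan is
// pushed down to the source, which applies filters and stops at the limit.
type Row = Record<string, unknown>;

interface Plan {
  table: string;
  where: Array<{ column: string; equals: unknown }>;
  limit?: number;
}

type PlanSource = (plan: Plan) => Iterable<Row>;

class TableHandle {
  readonly plan: Plan;
  private readonly source: PlanSource;

  constructor(source: PlanSource, plan: Plan) {
    this.source = source;
    this.plan = plan;
  }

  where(column: string, equals: unknown): TableHandle {
    return new TableHandle(this.source, { ...this.plan, where: [...this.plan.where, { column, equals }] });
  }

  limit(n: number): TableHandle {
    const limit = this.plan.limit === undefined ? n : Math.min(this.plan.limit, n);
    return new TableHandle(this.source, { ...this.plan, limit });
  }

  async rows(): Promise<Row[]> {
    return [...this.source(this.plan)];
  }
}

// An in-memory source that honors the pushed-down plan and counts rows scanned.
function* scanRows(data: Row[], plan: Plan, stats: { scanned: number }): Generator<Row> {
  let emitted = 0;
  for (const row of data) {
    if (plan.limit !== undefined && emitted >= plan.limit) return; // stop early
    stats.scanned += 1;
    if (plan.where.every((w) => row[w.column] === w.equals)) {
      emitted += 1;
      yield row;
    }
  }
}
```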
Use OTEL to pressure-test the generality of the architecture.
Use Sentry artifacts and bundle analyzer outputs to validate graph/code/source/artifact cross-domain capabilities.
The following current ideas are expected to be removed or de-emphasized:
- `TraceAdapter<T>` as the main model
- adapter-owned custom endpoint maps as the core abstraction
- `buildQueryContext()` returning arbitrary globals
- flat query contexts as the main API
- format-specific endpoint routing as the center of the design
- eager adapter-owned derived models where lazy reusable layers are better
No.
Yes.
Yes, but as model-pack tables/reports/namespaces, not as the fundamental architecture.
Yes.
Yes.
Yes.
Examples of the intended style:

```js
await ds.schema.tables()
await ds.tables.get("devtools.dims.interactions").rows()
await ds.tables.get("devtools.views.codeHotspots").limit(10).table()
await ds.reports.run("devtools.interaction", { id: "4758" })
await ds.reports.get("devtools.interaction").args({ id: "4758" }).pretty()
await ds.files.exportCollection("devtools.screenshots")
await ds.files.materializeArtifact("artifact:devtools:script:26")
await ds.tables.get("code.dims.sources").rows()
```

```js
const rows = await ds.tables.get("raw.schema.paths").limit(20).rows()
table(rows)
```

```js
// Manual string building: the query returns a compact custom summary.
const report = await ds.reports.run("devtools.interaction", { id: "4758" })
return [
  `interaction ${report.interaction.interactionId} ${report.interaction.totalLatencyMs}ms`,
  `dropped ${report.droppedFrames} frames`,
].join("\n")
```

These examples intentionally show:
- generic access
- domain-specific data
- file export/materialization
- raw and semantic layers living side by side
- built-in readable presentation
- manual string building as a first-class workflow
The final confirmed direction is:
- Replace the current adapter system with a dataset kernel.
- Build all semantics through lazy layers and model packs.
- Expose one stable query root, `ds`.
- Treat raw data, normalized facts, semantic dimensions/views, and reports as separate but connected layers.
- Make artifacts, materialized files, and workspace management first-class.
- Provide first-class readable presentation through `pretty(...)`, `table(...)`, and report/table presentation helpers without polluting plain JS values.
- Preserve provenance, lossless IDs, and canonical units.
- Make DevTools the first implementation target, but keep the architecture fully capable of supporting OTEL, Sentry, bundle outputs, and raw mode at the same level of depth.
This is the direction we should implement unless a later explicit design review changes it.