# Architecture

Deep-dive companion to README.md. Read the README first for the quickstart and headline story; read this for the multi-node compose shape, plugin write-back path, schema rationale, and scaling notes.

## Table of contents

  1. Domain model
  2. Schema and per-table retention
  3. Multi-node compose
  4. Plugin write-back path (cross-node)
  5. Three patterns for feeding a UI panel
  6. Processing Engine triggers
  7. Enterprise features used
  8. Token bootstrap (cluster-wide)
  9. Plugin conventions and gotchas
  10. Security notes
  11. Scaling to production
  12. Extending the cluster

## 1. Domain model

A data-center Clos fabric: 8 spines, 16 leaves, 48 servers per leaf rack (768 servers total, modeled only as flow src_ip/dst_ip). Total ~1024 fabric interfaces, ~128 leaf↔spine BGP sessions, ~5,000 sampled flow records per second, 64 latency probe pairs.

Vocabulary: fabric, spine, leaf, ECMP, BGP, peer, prefix, flap, flow record, sampled, top-N talkers, microburst, ECN-mark, PFC pause, oversubscription, hotspot.

## 2. Schema and per-table retention

Six tables (see influxdb/schema.md for the detailed reference).

Tables are created explicitly at init time via the configure API (POST /api/v3/configure/table or the influxdb3 create table CLI), not via implicit creation from the first write. This means:

- Caches and triggers can reference tables immediately at init time without a sentinel-row workaround.
- Schemas (tag set, field types, retention) are declared up front rather than inferred.
- LVC and DVC reads don't have to filter out a `__init` sentinel row.
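For illustration, a minimal init-time sketch in Python of one explicit table creation through the configure API. The JSON payload shape (`db`/`table`/`tags`/`fields`), the field type name, and the tag/field names are assumptions for illustration, not a quote from init.sh; the authoritative schema lives in influxdb/schema.md.

```python
# Hedged sketch: create one table explicitly at init time via the configure API.
# Payload shape, field type name, and tag/field names are assumptions.
import os
import httpx

ingest_url = os.environ.get("NT_INGEST_URLS", "http://nt-ingest-1:8181").split(",")[0]
token = open(os.environ.get("NT_TOKEN_FILE", "/var/lib/influxdb3/.nt-operator-token")).read().strip()

resp = httpx.post(
    f"{ingest_url}/api/v3/configure/table",
    headers={"Authorization": f"Bearer {token}"},
    json={
        "db": "nt",
        "table": "latency_probes",
        "tags": ["src_leaf", "dst_leaf"],                    # hypothetical tag set
        "fields": [{"name": "rtt_ms", "type": "float64"}],   # hypothetical field
    },
)
resp.raise_for_status()
```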

fabric_health is the only table with a retention period (24 hours), demonstrating per-table retention. Other tables have no retention in the demo. In production, typical retention values would be:

- `interface_counters`, `bgp_sessions`, `latency_probes`: 7-30 days
- `flow_records`: 30-90 days (regulatory)
- `fabric_health`, `anomalies`: 365 days (operational history)

## 3. Multi-node compose

Five InfluxDB 3 Enterprise nodes plus a one-shot token-bootstrap and init container, plus simulator/UI/scenarios. All five InfluxDB nodes mount the same influxdb-data named volume at /var/lib/influxdb3. Sharing the disk is what makes the cluster a cluster — every node sees the same object store and catalog, so writes from one ingest node are immediately visible from the query and process nodes, and the catalog (databases, tables, caches, triggers) stays consistent across all nodes without explicit coordination.

| Node | Mode | Purpose |
| --- | --- | --- |
| `nt-ingest-1`, `nt-ingest-2` | `ingest` | Accept writes; simulator round-robins per batch |
| `nt-query` | `query` | Serves UI partials, browser direct fetches, CLI; hosts request plugins |
| `nt-compact` | `compact` | Background compaction only |
| `nt-process` | `process,query` | Hosts schedule plugins; queries locally, writes back via httpx through an ingest node |

The process node uses the process,query mode combo so plugin code can call influxdb3_local.query() against the local engine without HTTP-hopping to another node for reads. (Setting --plugin-dir implicitly adds process mode; explicitly setting --mode query keeps the query engine available.)

## 4. Plugin write-back path

This is a new convention introduced by this repo (now codified in the meta repo's CONVENTIONS.md).

A schedule plugin running on the process node has no obvious local ingest target, because the engine does not accept writes locally on a non-ingest node. The plugin's module-level code therefore loads the admin token once at import time (from the shared volume) and uses httpx to POST line protocol back through an ingest node's `/api/v3/write_lp` endpoint. The shared `plugins/_writeback.py` module factors this out: round-robin over the configured ingest URLs, with one fallback hop on connection error.

Configuration via env vars on the process node (set in docker-compose.yml):

- `NT_INGEST_URLS=http://nt-ingest-1:8181,http://nt-ingest-2:8181`
- `NT_DB=nt`
- `NT_TOKEN_FILE=/var/lib/influxdb3/.nt-operator-token`

LineBuilder is not used by the schedule plugins in this repo — the cross-node write-back replaces it.
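For orientation, a condensed sketch of the write-back helper's job as described above: round-robin over the configured ingest URLs, one fallback hop on connection error, line protocol POSTed to `/api/v3/write_lp`. This is illustrative rather than a copy of `plugins/_writeback.py`; the internal names are assumptions, apart from `write_lines`, which the extension steps in section 12 reference.

```python
# Condensed, illustrative sketch of the cross-node write-back helper.
# Round-robins across NT_INGEST_URLS and hops once on connection error.
import itertools
import os
import httpx

_token = open(os.environ["NT_TOKEN_FILE"]).read().strip()   # admin token from the shared volume
_db = os.environ.get("NT_DB", "nt")
_urls = [u for u in os.environ["NT_INGEST_URLS"].split(",") if u]
_next_start = itertools.cycle(range(len(_urls)))             # round-robin starting point


def write_lines(lines: list[str]) -> None:
    """Write line protocol via /api/v3/write_lp, hopping once on connection error."""
    start = next(_next_start)
    last_err = None
    for attempt in range(min(2, len(_urls))):                # primary + at most one fallback
        url = _urls[(start + attempt) % len(_urls)]
        try:
            resp = httpx.post(
                f"{url}/api/v3/write_lp",
                params={"db": _db},
                headers={"Authorization": f"Bearer {_token}"},
                content="\n".join(lines),
            )
            resp.raise_for_status()
            return
        except httpx.ConnectError as err:                    # node down: try the other ingest node
            last_err = err
    raise RuntimeError(f"no ingest node accepted the write: {last_err}")
```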

## 5. Three patterns for feeding a UI panel

This repo demonstrates all three ways to get data into the dashboard, side-by-side, each with its own latency badge:

| Pattern | Where the call goes | Used for |
| --- | --- | --- |
| SQL via FastAPI (Python proxy) | browser → `nt-ui:8080/partials/...` → `nt-query:8181/api/v3/query_sql` | Banner, KPIs, throughput chart, anomalies |
| SQL from browser (DVC TVF) | browser → `nt-query:8181/api/v3/query_sql` directly | Source-IP typeahead; sub-ms badge teaches DVC speed |
| Request plugin from browser (Processing Engine) | browser → `nt-query:8181/api/v3/engine/<name>` directly | Top-N talkers, source-IP detail; composite payloads |

When to pick which: SQL through FastAPI when the response is HTML fragments (HTMX swaps); SQL direct from browser when the cache speed is the headline (typeahead); request plugin when the response is a composite shape that joins multiple queries' worth of data.
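To make the first pattern concrete, here is a hypothetical FastAPI partial that proxies one SQL query to nt-query and returns an HTML fragment for an HTMX swap. The endpoint path, SQL text, the `NT_QUERY_URL` env var, and the query_sql payload shape (`db`/`q`/`format`) are illustrative assumptions, not code lifted from nt-ui.

```python
# Hypothetical FastAPI partial showing the "SQL via Python proxy" pattern.
# Endpoint path, SQL, NT_QUERY_URL, and response parsing are illustrative only.
import os
import httpx
from fastapi import FastAPI
from fastapi.responses import HTMLResponse

app = FastAPI()
query_url = os.environ.get("NT_QUERY_URL", "http://nt-query:8181")   # assumed env var name
token = open(os.environ["NT_TOKEN_FILE"]).read().strip()


@app.get("/partials/bgp-up-count", response_class=HTMLResponse)
async def bgp_up_count() -> str:
    async with httpx.AsyncClient() as client:
        resp = await client.post(
            f"{query_url}/api/v3/query_sql",
            headers={"Authorization": f"Bearer {token}"},
            json={
                "db": "nt",
                # LVC read via the last_cache TVF; the state column is an assumption.
                "q": "SELECT count(*) AS up FROM last_cache('bgp_sessions', 'bgp_session_last') "
                     "WHERE state = 'Established'",
                "format": "json",
            },
        )
    rows = resp.json()                     # assumed: list of row dicts for format=json
    up = rows[0]["up"] if rows else 0
    # HTMX swaps this fragment straight into the banner.
    return f"<span class='kpi'>{up} BGP sessions up</span>"
```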

## 6. Processing Engine triggers

| Name | Type | Spec | Where it runs | Effect |
| --- | --- | --- | --- | --- |
| `fabric_health` | Schedule | `every:5s` | process | Writes one row per layer to `fabric_health` |
| `anomaly_detector` | Schedule | `every:5s` | process | Detects and writes anomalies to `anomalies` |
| `top_talkers` | Request | `request:top_talkers` | query | Top `src_ip` aggregates |
| `src_ip_detail` | Request | `request:src_ip_detail` | query | Composite drill-down for one IP |

The repo uses every:5s exclusively for schedule triggers — short, regular intervals don't need cron's time-of-day alignment, and every: is more readable.

## 7. Enterprise features used

| Feature | Where |
| --- | --- |
| Multi-node ingest | 2 ingest nodes; simulator round-robins |
| Multi-node split (ingest/query/compact/process) | 5-node compose |
| Last Value Cache | `bgp_session_last`; powers banner BGP up-count |
| Distinct Value Cache | `src_ip_distinct`; powers typeahead with sub-ms badge |
| Per-table retention | `fabric_health` 24h; exclusive to this repo in the portfolio |
| Schedule trigger via `every:` syntax | Exclusive to this repo in the portfolio |
| Schedule plugin with cross-node write-back | Both schedule plugins via `_writeback.py`; new convention |
| Request trigger | `top_talkers` + `src_ip_detail` on query node |
| Custom UI | Three patterns side-by-side |

## 8. Token bootstrap (cluster-wide)

A single token-bootstrap compose service generates one offline admin token at first boot, written to the shared volume. All five InfluxDB nodes start with --admin-token-file pointing at the same path; the simulator, UI, init, and process node read the same token from the same volume. License validation also happens once per cluster.

## 9. Plugin conventions and gotchas

See CONVENTIONS.md in the meta repo. Highlights specific to this repo:

- `LineBuilder` is INJECTED; not used by this repo's schedule plugins (they use httpx).
- 6-field cron OR `every:` interval; this repo uses `every:5s`.
- LVC reads via the `last_cache(table, cache_name)` TVF (query sketch after this list).
- DVC reads via the `distinct_cache(table, cache_name)` TVF.
- Multiple unaliased `COUNT(*)` scalar subqueries don't compose under DataFusion.
- `date_bin()` returns ns-integer strings on the wire.
- Browser-facing endpoints need `INFLUX_PUBLIC_URL`.
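A short sketch of both cache reads from plugin code, assuming `influxdb3_local.query()` takes a SQL string and returns rows; the selected column names are assumptions, the TVF names are the ones listed above.

```python
# Illustrative cache reads through the two TVFs above. influxdb3_local is the
# engine-injected API on the process/query nodes; column names are assumptions.
def example_cache_reads(influxdb3_local):
    # Last Value Cache: newest row per BGP session (powers the banner up-count).
    bgp_rows = influxdb3_local.query(
        "SELECT peer, state FROM last_cache('bgp_sessions', 'bgp_session_last')"
    )
    # Distinct Value Cache: every src_ip seen (powers the typeahead).
    src_ips = influxdb3_local.query(
        "SELECT src_ip FROM distinct_cache('flow_records', 'src_ip_distinct')"
    )
    return bgp_rows, src_ips
```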

## 10. Security notes

Demo simplifications, called out for production users:

- One admin token, shared by all services. Production should issue scoped tokens per service (read-only for the UI, write-only for the simulator, scoped for plugin write-back).
- The browser sees the admin token (passed in template context for the direct-fetch panels). Production should proxy through the UI backend or use a token-exchange flow.
- No TLS in compose. Production needs TLS between nodes and to clients.

## 11. Scaling to production

- More ingest: add `nt-ingest-3`, `nt-ingest-4`, etc. The simulator's round-robin scales without a code change; production would put the ingest nodes behind a load balancer.
- Multi-query: add a second query node for read scaling. Both serve the same SQL endpoints; the UI can point at either one or at a load balancer in front of both.
- Object store: swap the file object store for S3/GCS/Azure. No code changes; one env var per node.
- Retention: extend per-table retention to all tables per the production guidance in §2.
- K8s: the compose service shape maps 1:1 to a Helm chart per node role. Not shipped here per portfolio policy.

## 12. Extending the cluster

To add a new schedule plugin:

  1. Create `plugins/schedule_<name>.py` following the existing pattern; import the write-back helper with `from _writeback import write_lines` (a minimal skeleton is sketched after this list).
  2. Add a trigger registration in init.sh's ensure_triggers().
  3. Add a unit test under tests/test_plugins/.
  4. make down && make up — init.sh registers the new trigger on next boot.
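A minimal skeleton for step 1, assuming the Processing Engine invokes schedule plugins through a `process_scheduled_call(influxdb3_local, call_time, args)` entry point and that `query()` returns a list of dict rows; mirror the existing schedule plugins for the exact contract.

```python
# plugins/schedule_<name>.py -- minimal skeleton. The entry-point signature and
# the dict-rows return shape of query() are assumptions; check existing plugins.
from _writeback import write_lines


def process_scheduled_call(influxdb3_local, call_time, args=None):
    # Read locally: the process node also runs in query mode.
    rows = influxdb3_local.query("SELECT count(*) AS n FROM flow_records")
    n = rows[0]["n"] if rows else 0

    # Write back cross-node through an ingest node. The target table here is
    # illustrative; remember tables are created explicitly at init (see section 2).
    write_lines([f"example_rollup,source=schedule_example flow_rows={n}i"])
```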

To add a new request plugin:

  1. Create `plugins/request_<name>.py` (a minimal skeleton is sketched after this list).
  2. Add a trigger registration with --trigger-spec request:<name>.
  3. Add a unit test.
  4. The plugin is reachable at /api/v3/engine/<name> after restart.
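For completeness, a matching request-plugin skeleton. The `process_request` entry-point signature and the JSON-dict return value are assumptions; confirm against the existing request plugins before copying.

```python
# plugins/request_<name>.py -- minimal skeleton. Signature and return contract
# are assumptions; mirror the existing request plugins.
def process_request(influxdb3_local, query_parameters, request_headers, request_body, args=None):
    limit = int(query_parameters.get("limit", 10)) if query_parameters else 10
    rows = influxdb3_local.query(
        "SELECT src_ip, count(*) AS flows FROM flow_records "
        f"GROUP BY src_ip ORDER BY flows DESC LIMIT {limit}"
    )
    # The returned payload is what the browser gets from /api/v3/engine/<name>.
    return {"limit": limit, "rows": rows}
```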