Privacy-first APM and performance diagnostics for Node.js — zero sidecar, zero raw data exported.
Minimum Node 14.18 as a compiled package · Node 22.6 for source/dev mode
Named after Argus Panoptes, the hundred-eyed watchman of Greek mythology. A lightweight agent that embeds directly into your application — silently tracking runtime behaviour, isolating bottlenecks, and mathematically sanitizing all context before exporting OpenTelemetry (OTLP) telemetry to your observability stack.
- Why This Exists
- Quick Start
- Privacy Guarantees
- Requirements
- Build from Source
- Demo App
- Profile API (recommended)
- Builder API (fine-grained)
- Instance Methods
- Events Reference
- Environment Variables
- Production Safety Reference
- Architecture Layers
- Project Structure
- Low-Level API
- Self-Host Your OTLP Endpoint
- Roadmap
- License
Standard APM products tend to require heavy agents or compile steps, or they sacrifice data privacy by shipping raw query values and log payloads to the cloud. This agent takes a different position:
- 100% in-process — no sidecar, no daemon, no separate process
- AST-first privacy — SQL/NoSQL query values are shredded at the AST layer before they ever touch a metric
- Entropy-checked logs — Shannon entropy scanning strips JWT tokens, API keys, and any other high-entropy string from console payloads automatically
- Zero prototype pollution — all DB interception goes through node:diagnostics_channel, the official Node.js observability primitive
npm install argus-apm

import { ArgusAgent } from 'argus-apm';
const agent = await ArgusAgent.createProfile({
environment: 'dev', // 'dev' prints all events to console automatically; silence with 'prod' or ARGUS_DEBUG=false
// appType defaults to 'auto' — detects 'web', 'db', 'worker' from your package.json
// set explicitly if auto-detection misses something: appType: ['web', 'db', 'worker']
}).start();
// SIGTERM / SIGINT → flush telemetry → process.exit is wired automatically

Note
Zero-overhead kill-switch — set ARGUS_ENABLED=false (or 0) in any environment and the agent skips all initialisation with no CPU cost. Useful for gradual rollouts, incident response, or staging overrides without a code deploy.
- Query structure (SQL/NoSQL operation type, tables, columns, clauses)
- HTTP method, URL path (no query-string), status code, duration
- Event loop lag duration (ms)
- Heap growth (bytes)
- File path + operation type (no file contents)
- Log level + message (after entropy scrubbing)
| Data Class | Mechanism |
|---|---|
| SQL / NoSQL bound values | AST-level replacement — values are replaced before the string is ever stored |
| High-entropy strings (JWTs, API keys, tokens) | Shannon entropy check (default threshold: 4.0 bits/char) |
| PII in log messages | Entropy scrubbing on all console.* payloads |
| Raw file contents | Only path and operation are recorded |
| Heap object values | Only growth delta in bytes is recorded |
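The entropy check listed in the table can be illustrated with a short sketch. This is not the agent's EntropyChecker: the function names, the space-based tokenizer, and the 16-character minimum length are assumptions made for illustration. Only the 4.0 bits/char default comes from the table.

```typescript
// Illustrative sketch only: not the agent's EntropyChecker implementation.

/** Shannon entropy of a string, in bits per character. */
function shannonEntropy(text: string): number {
  const chars = Array.from(text);
  if (chars.length === 0) return 0;

  // Count how often each character occurs.
  const counts = new Map<string, number>();
  for (const ch of chars) counts.set(ch, (counts.get(ch) ?? 0) + 1);

  // H = -sum(p * log2(p)) over the character frequencies.
  let bits = 0;
  counts.forEach((count) => {
    const p = count / chars.length;
    bits -= p * Math.log2(p);
  });
  return bits;
}

/** Replace space-separated tokens whose entropy exceeds the threshold. */
function scrubHighEntropy(message: string, threshold = 4.0, minLength = 16): string {
  return message
    .split(' ')
    .map((token) =>
      // minLength is an assumed guard so short ordinary words are never redacted.
      token.length >= minLength && shannonEntropy(token) > threshold ? '[REDACTED]' : token,
    )
    .join(' ');
}

console.log(scrubHighEntropy('login ok token=abcdefghijklmnopqrstuvwxyz012345'));
```

Ordinary words rarely reach 4 bits per character, while long random tokens usually do, which is why a single threshold can separate the two reasonably well.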
Telemetry is exported over mTLS (Mutual TLS) — both client and server certificates are verified. No telemetry is sent without explicit .withExporter(config) configuration.
The agent has two distinct usage modes with different Node.js requirements:
| Usage Mode | Min Node.js | When to use |
|---|---|---|
| Compiled npm package (recommended for most users) | ≥ 14.18.0 | You install the built package in your project via npm |
| Source / dev mode (this repo, contributors) | ≥ 22.6.0 | You run .ts files directly with --experimental-strip-types |
Important
Most users should use the compiled package and only need Node ≥ 14.18.0.
The 22.6.0 requirement only applies to running the TypeScript source files directly (e.g. contributors, or the pnpm test / pnpm start scripts in this repo).
Why 14.18.0 as the compiled minimum?
node:diagnostics_channel was added in Node 14.17.0 (experimental) and was marked stable in Node 18.13.0 / 19.2.0. The API surface the agent uses (.channel(), .subscribe(), .publish(), .unsubscribe()) has not changed across those versions, so the compiled package works on any Node ≥ 14.18.0, with the following caveats:
| Feature | Minimum Node | Behaviour on older versions |
|---|---|---|
| DB query tracing (all 17 drivers) | 14.18.0 | Full support — we control both publisher and subscriber |
| HTTP outbound tracing | 18.0.0 | Automatic via diagnostics_channel; on Node 14–17 the agent falls back to monkey-patching http.request / https.request automatically |
| Module load timing (slow-require) | 20.0.0 | Silent no-op on Node < 20 (channels absent) |
| Stream leak auto-detection | 22.0.0 | Falls back to manual track() calls on Node < 22 |
| Worker-threads pool monitoring | 22.0.0 | No auto-detection on Node < 22 |
Everything else (node:perf_hooks, node:v8, node:inspector, node:fs/promises) is available on every Node version the compiled package supports. Once this package is compiled to JavaScript, --experimental-strip-types is irrelevant: the consumer runs plain .js.
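As a rough illustration, the compatibility table can be expressed as a version gate. The feature keys and helper names below are hypothetical and are not part of the argus-apm API; only the minimum versions are taken from the table.

```typescript
// Illustrative sketch: maps the compatibility table onto a version check.
// Feature keys and helper names are hypothetical, not part of argus-apm.

type Feature =
  | 'dbTracing'
  | 'httpChannelTracing'
  | 'slowRequire'
  | 'streamLeakAuto'
  | 'workerPoolMonitor';

type Version = [number, number, number];

// Minimum Node.js versions, as listed in the table.
const MIN_NODE: Record<Feature, Version> = {
  dbTracing: [14, 18, 0],
  httpChannelTracing: [18, 0, 0],
  slowRequire: [20, 0, 0],
  streamLeakAuto: [22, 0, 0],
  workerPoolMonitor: [22, 0, 0],
};

function parseVersion(version: string): Version {
  const [major = 0, minor = 0, patch = 0] = version.replace(/^v/, '').split('.').map(Number);
  return [major, minor, patch];
}

/** True when `actual` is the same as or newer than `required`. */
function atLeast(actual: Version, required: Version): boolean {
  for (let i = 0; i < 3; i++) {
    if (actual[i] !== required[i]) return actual[i] > required[i];
  }
  return true;
}

/** Features whose automatic path is available on the given Node.js version. */
function availableFeatures(version: string): Feature[] {
  const actual = parseVersion(version);
  return (Object.keys(MIN_NODE) as Feature[]).filter((f) => atLeast(actual, MIN_NODE[f]));
}

console.log(availableFeatures('20.11.1'));
```

In a real application the argument would typically be process.versions.node.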
This package ships a dual build: ESM and CommonJS. Node.js picks the right format automatically via the exports field — no config needed on your side.
// ✅ ESM project (type:module or .mjs)
import { ArgusAgent } from 'argus-apm';
// ✅ CommonJS project — require() works directly
const { ArgusAgent } = require('argus-apm');
// ✅ CommonJS project — dynamic import also works
const { ArgusAgent } = await import('argus-apm');

git clone https://github.com/sharon77242/Argus.git
pnpm install
# Run all 618 tests (uses --experimental-strip-types, requires Node 22.6+)
pnpm test
# Build both ESM and CJS outputs
pnpm build
# └─ build:esm → tsc -p tsconfig.build.json → dist/esm/**/*.js + .d.ts
# └─ build:cjs → tsc -p tsconfig.cjs.json → dist/cjs/**/*.cjs + .d.cts
# (post-build script renames .js → .cjs, .d.ts → .d.cts)
# Build only one format if needed
pnpm build:esm
pnpm build:cjs

The published dist/ directory contains:
dist/
esm/ ← consumed by import / ESM bundlers
index.js
index.d.ts
...
cjs/ ← consumed by require() / CommonJS bundlers
index.cjs
index.d.cts
...
quotes-demo-app/ is a small Express + PostgreSQL API that runs the agent in dev mode and streams every monitoring event to the terminal in colour. Use it to see the agent in action before wiring it into your own project.
# Quickest path — fully containerised, no local Node.js required
docker compose -f docker-compose.demo.yml up --build

Or run Node.js natively against a Dockerised Postgres:
cd packages/agent && pnpm build && cd ../..
cd quotes-demo-app && docker compose -f docker-compose-pg-only.yml up -d && npm install
node simulate.js # scripted traffic sequence — watch the agent fire in real time

The scripted traffic sequence exercises every monitoring feature, including the Phase R cross-signal rules:
| Route | Phase R rule triggered | Severity |
|---|---|---|
| GET /debug/sync-read (×3, with createMiddleware) | sync-in-hot-path — sync FS inside live request | critical |
| GET /debug/correlated-slow | correlated-slow-endpoint — N+1 + slow HTTP on same traceId | critical |
| GET /debug/n-plus-one-in-txn | n-plus-one-in-transaction — N+1 inside BEGIN/COMMIT | critical |
See quotes-demo-app/README.md for the full setup guide, annotated terminal output, and curl examples.
createProfile returns a pre-configured builder instance wired for your environment and app type. Call .start() to initialize all subsystems.
const agent = await ArgusAgent.createProfile({
environment: 'prod', // 'dev' | 'test' | 'prod'
appType: 'auto', // auto-detects 'web', 'db', 'worker' from package.json; or pass an explicit array
enabled: true, // overridden by ARGUS_ENABLED env-var
workspaceDir: process.cwd(), // dev/test only — enables StaticScanner, AuditScanner, SourceMaps
}).start();

| environment | Modules Enabled | Optimization Target |
|---|---|---|
| prod | CrashGuard, LogTracing, GracefulShutdown | Stability — minimal overhead, high safety |
| dev | prod + FsTracing + StaticScanner, AuditScanner, SourceMaps (when workspaceDir set) | Forensics — deep blocking & security analysis |
| test | prod + FsTracing + StaticScanner, AuditScanner, SourceMaps (when workspaceDir set) | Forensics — same as dev |

| appType | Modules Enabled | Optimization Target |
|---|---|---|
| 'web' | HttpTracing, ResourceLeakMonitor, Auto-Patching | Latency — request/response & socket tracking |
| 'db' | QueryAnalysis, SlowQueryMonitor, ResourceLeakMonitor, Auto-Patching | Data Access — query patterns & connection safety |
| 'worker' | RuntimeMonitor (CPU/Mem), GcMonitor, ResourceLeakMonitor, Auto-Patching, JobTracing, MessagingTracing | Throughput — long-running safety, loop health & queue visibility |
| ['web','db'] | Union of web + db | Hybrid — full HTTP + query coverage |
| ['web','db','worker'] | All modules | Full-Stack — maximum observability |
Each .with*() call is idempotent — combining types never double-registers a module.
// Express API + background job runner
ArgusAgent.createProfile({ appType: ['web', 'worker'] });
// Worker that queries databases directly
ArgusAgent.createProfile({ appType: ['db', 'worker'] });
// Monolith — full coverage
ArgusAgent.createProfile({ appType: ['web', 'db', 'worker'] });

Leave appType unset (or set it to 'auto') and the agent will scan your package.json dependencies to infer the correct profile:
const agent = await ArgusAgent.createProfile({
environment: 'prod',
// appType: 'auto' is the default
}).start();
// agent.on('info', msg => console.log(msg)) ← fires in dev/test if nothing is detected

You can also call the detector standalone:
const result = ArgusAgent.detectAppTypes('./my-service');
// { types: ['web', 'db'], matches: { web: ['express', 'cors'], db: ['pg', 'ioredis'], worker: [] } }

Recognized fingerprints (non-exhaustive):

| Type | Packages |
|---|---|
| web | express, fastify, koa, @hapi/hapi, @nestjs/core, next, nuxt, socket.io, ws, apollo-server, … |
| db | pg, mysql2, mongodb, mongoose, sequelize, typeorm, @prisma/client, knex, redis, ioredis, mssql, … |
| worker | bull, bullmq, agenda, bee-queue, pg-boss, node-cron, amqplib, kafkajs, piscina, … |
Note
If no packages match and environment is dev or test, the agent emits an 'info' event advising you to set appType explicitly. In prod, it starts silently with only the environment-level modules (CrashGuard, LogTracing, GracefulShutdown).
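A minimal sketch of how dependency fingerprinting of this kind could work is shown below. It is not the agent's detector: the helper name, the choice to look only at the dependencies field, and the shortened fingerprint lists are assumptions. The result shape mirrors the detectAppTypes example above.

```typescript
// Illustrative sketch of package.json fingerprinting; not the agent's detector.

type AppType = 'web' | 'db' | 'worker';

interface DetectionResult {
  types: AppType[];
  matches: Record<AppType, string[]>;
}

// A small subset of the fingerprints listed in the table.
const FINGERPRINTS: Record<AppType, string[]> = {
  web: ['express', 'fastify', 'koa', 'next', 'ws'],
  db: ['pg', 'mysql2', 'mongodb', 'redis', 'ioredis'],
  worker: ['bull', 'bullmq', 'agenda', 'kafkajs', 'amqplib'],
};

interface PackageJsonLike {
  dependencies?: Record<string, string>;
}

/** Hypothetical helper: infer app types from an already-parsed package.json. */
function detectFromPackageJson(pkg: PackageJsonLike): DetectionResult {
  // Assumption: only runtime dependencies are considered.
  const installed = new Set(Object.keys(pkg.dependencies ?? {}));
  const matches: Record<AppType, string[]> = { web: [], db: [], worker: [] };
  const types: AppType[] = [];

  for (const type of Object.keys(FINGERPRINTS) as AppType[]) {
    matches[type] = FINGERPRINTS[type].filter((name) => installed.has(name));
    if (matches[type].length > 0) types.push(type);
  }
  return { types, matches };
}

const detected = detectFromPackageJson({
  dependencies: { express: '^4.19.0', pg: '^8.11.0', ioredis: '^5.4.0', lodash: '^4.17.21' },
});
console.log(detected.types, detected.matches);
```

An empty types array corresponds to the "nothing detected" case described in the note above.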
For maximum control, compose the agent manually using the fluent builder:
import { ArgusAgent } from 'argus-apm';
import fs from 'node:fs';
const agent = await ArgusAgent.create()
.withSourceMaps('./dist') // Source-map resolution for stack traces
.withRuntimeMonitor({ eventLoopThresholdMs: 50 }) // Event loop lag + memory leak detection
.withInstrumentation({ autoPatching: true }) // 17 DB/RPC drivers via diagnostics_channel
.withHttpTracing() // Slow request & insecure HTTP detection
.withLogTracing({ scrubContext: true }) // Strip secrets from console overrides
.withFsTracing() // ⚠ DEV ONLY — sync FS blocker detection
.withCrashGuard() // uncaughtException telemetry flush
.withResourceLeakMonitor({
handleThreshold: 5000,
alertCooldownMs: 60_000, // Min ms between repeated leak alerts
})
.withGracefulShutdown({ timeoutMs: 5000 }) // SIGTERM/SIGINT → flush → process.exit
.withQueryAnalysis() // AST-based N+1 & query fix suggestions
.withSlowQueryMonitor({ defaultThresholdMs: 500 }) // Per-driver slow query log (top-5)
.withTransactionMonitor() // BEGIN/COMMIT/ROLLBACK duration tracking
.withCacheMonitor({ minHitRate: 0.6 }) // Cache hit-rate degradation detection
.withGcMonitor({ pausePctThreshold: 15 }) // GC pressure detection
.withPoolMonitor() // Connection pool exhaustion & slow-acquire
.withDnsMonitor({ slowThresholdMs: 200 }) // DNS resolution latency tracking
.withAdaptiveSampler({ burst: 20 }) // Token-bucket rate limiter under high load
.withStaticScanner(process.cwd()) // ⚠ DEV ONLY — background tsc/eslint
.withAuditScanner(process.cwd()) // ⚠ DEV ONLY — npm audit CVE scan
.withExporter({
endpointUrl: 'https://otel.example.com/v1/traces',
key: fs.readFileSync('./certs/client.key'),
cert: fs.readFileSync('./certs/client.crt'),
ca: fs.readFileSync('./certs/ca.crt'),
})
.start();
// Register pools after start (requires .withPoolMonitor())
agent.watchPool(pgPool, 'pg');
agent.watchPool(mysql2Pool, 'mysql2');

Every .with*() method is optional — enable only what you need. All internal event wiring, entropy scrubbing, and p99 aggregation happen automatically.
.withSlowQueryMonitor({
defaultThresholdMs: 1000, // global fallback threshold (default: 1000)
thresholds: { pg: 500, redis: 50 }, // per-driver overrides (also configurable via env vars)
topN: 5, // top-N slowest queries retained in memory (default: 5)
})

Fires 'slow-query' when a query exceeds the threshold for its driver. Access the log via agent.getSlowQueries() / agent.getSlowestQuery() / agent.clearSlowQueries().
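The combination of per-driver thresholds and a bounded top-N log can be sketched as follows. This is a simplified illustration, not the agent's SlowQueryMonitor; the class name, method names, and record shape are assumptions, and only the option names and defaults are taken from the snippet above.

```typescript
// Illustrative sketch of a per-driver threshold check with a bounded top-N log.
// Class and field names are hypothetical.

interface SlowRecord {
  sanitizedQuery: string;
  durationMs: number;
  driver: string;
}

class TopNSlowLog {
  private records: SlowRecord[] = [];
  private readonly defaultThresholdMs: number;
  private readonly thresholds: Record<string, number>;
  private readonly topN: number;

  constructor(defaultThresholdMs = 1000, thresholds: Record<string, number> = {}, topN = 5) {
    this.defaultThresholdMs = defaultThresholdMs;
    this.thresholds = thresholds;
    this.topN = topN;
  }

  /** Returns true when the query is slow for its driver and was logged. */
  record(entry: SlowRecord): boolean {
    const limit = this.thresholds[entry.driver] ?? this.defaultThresholdMs;
    if (entry.durationMs <= limit) return false;

    // Keep the log sorted slowest-first and capped at topN entries.
    this.records.push(entry);
    this.records.sort((a, b) => b.durationMs - a.durationMs);
    this.records.length = Math.min(this.records.length, this.topN);
    return true;
  }

  slowest(): SlowRecord | undefined {
    return this.records[0];
  }

  all(): SlowRecord[] {
    return [...this.records];
  }
}

const slowLog = new TopNSlowLog(1000, { redis: 50 }, 2);
const flags = [
  slowLog.record({ sanitizedQuery: 'SELECT * FROM orders WHERE id = ?', durationMs: 500, driver: 'pg' }),
  slowLog.record({ sanitizedQuery: 'GET ?', durationMs: 80, driver: 'redis' }),
  slowLog.record({ sanitizedQuery: 'SELECT * FROM users', durationMs: 1500, driver: 'pg' }),
  slowLog.record({ sanitizedQuery: 'SELECT * FROM audit_log', durationMs: 3000, driver: 'pg' }),
];
console.log(flags, slowLog.all());
```

Capping the log at topN keeps memory use constant no matter how many slow queries occur.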
.withTransactionMonitor({
maxOpenMs: 60_000, // evict open transactions after this duration (default: 60 000)
})

Detects BEGIN/COMMIT/ROLLBACK patterns in traced queries. Fires 'transaction' with duration, query count, and whether the transaction was aborted.
.withCacheMonitor({
windowMs: 60_000, // sliding window size (default: 60 000)
minSamples: 10, // minimum samples before an event can fire (default: 10)
minHitRate: 0.5, // fire when hit rate drops below this value 0–1 (default: 0.5)
})

Monitors cache hit/miss ratios for traced drivers (Redis, Memcached). Fires 'cache-degraded' when the hit rate falls below minHitRate within the window.
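To make the interaction of windowMs, minSamples, and minHitRate concrete, here is a small sliding-window sketch. It is an assumption-laden illustration rather than the agent's CacheMonitor: the class name and the explicit now argument (used so the example is deterministic) are not part of the documented API.

```typescript
// Illustrative sketch of a sliding-window hit-rate check; names are hypothetical.

class HitRateWindow {
  private samples: { at: number; hit: boolean }[] = [];
  private readonly windowMs: number;
  private readonly minSamples: number;
  private readonly minHitRate: number;

  constructor(windowMs = 60_000, minSamples = 10, minHitRate = 0.5) {
    this.windowMs = windowMs;
    this.minSamples = minSamples;
    this.minHitRate = minHitRate;
  }

  /**
   * Record one cache lookup at time `now` (ms).
   * Returns the current hit rate when it is degraded, otherwise undefined.
   */
  record(hit: boolean, now: number): number | undefined {
    this.samples.push({ at: now, hit });

    // Drop samples that have fallen out of the window.
    const cutoff = now - this.windowMs;
    this.samples = this.samples.filter((s) => s.at > cutoff);

    // Too few samples: stay silent to avoid noisy alerts on cold caches.
    if (this.samples.length < this.minSamples) return undefined;

    const hits = this.samples.filter((s) => s.hit).length;
    const rate = hits / this.samples.length;
    return rate < this.minHitRate ? rate : undefined;
  }
}

const hitWindow = new HitRateWindow(60_000, 4, 0.5);
const results = [
  hitWindow.record(true, 0),
  hitWindow.record(false, 1),
  hitWindow.record(false, 2),
  hitWindow.record(false, 3),
];
console.log(results);
```

The minSamples guard is what prevents a single early miss from being reported as a degraded cache.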
.withGcMonitor({
windowMs: 10_000, // sliding window for pressure calculation (default: 10 000)
pausePctThreshold: 10, // fire when GC consumes ≥ this % of the window (default: 10)
})

Observes GC performance entries via node:perf_hooks. Fires 'gc-pressure' with total pause time, pause percentage, and GC cycle count.
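The pause-percentage calculation can be sketched with the standard PerformanceObserver API from node:perf_hooks. The helper names and the GcPause shape are hypothetical, and this is not the agent's GcMonitor; it only shows one way the windowed percentage could be derived.

```typescript
// Illustrative sketch: GC pause percentage over a sliding window.
// Helper names are hypothetical; PerformanceObserver is the standard Node.js API.
import { PerformanceObserver } from 'node:perf_hooks';

interface GcPause {
  startTime: number; // ms, same clock as performance.now()
  durationMs: number;
}

/** Percentage of the last `windowMs` spent in GC pauses. */
function gcPausePct(pauses: GcPause[], now: number, windowMs: number): number {
  const cutoff = now - windowMs;
  let totalPauseMs = 0;
  for (const pause of pauses) {
    if (pause.startTime >= cutoff) totalPauseMs += pause.durationMs;
  }
  return (totalPauseMs * 100) / windowMs;
}

/** Subscribe to GC entries; returns a function that stops observing. */
function observeGc(onPause: (pause: GcPause) => void): () => void {
  const observer = new PerformanceObserver((list) => {
    for (const entry of list.getEntries()) {
      onPause({ startTime: entry.startTime, durationMs: entry.duration });
    }
  });
  observer.observe({ entryTypes: ['gc'] });
  return () => observer.disconnect();
}

// Deterministic example: 1000 ms of pauses inside a 10 000 ms window.
const samplePauses: GcPause[] = [
  { startTime: -500, durationMs: 900 }, // outside the window, ignored
  { startTime: 1000, durationMs: 600 },
  { startTime: 5000, durationMs: 400 },
];
const pausePct = gcPausePct(samplePauses, 10_000, 10_000);
console.log(pausePct >= 10 ? 'gc-pressure' : 'ok', pausePct);
```

In a live process, observeGc would feed pauses into a buffer that gcPausePct evaluates on each check.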
.withPoolMonitor({
maxWaitingCount: 3, // fire 'pool-exhaustion' when this many clients wait (default: 3)
maxWaitMs: 1000, // fire 'slow-acquire' when acquiring takes longer (default: 1000)
checkIntervalMs: 5000, // poll interval for pool statistics (default: 5000)
})

After calling .withPoolMonitor(), register each pool instance:
agent.watchPool(pgPool, 'pg');
agent.watchPool(mysql2Pool, 'mysql2');

Compatible with any pool that exposes totalCount / idleCount / waitingCount getters and/or emits an 'acquire' event (pg.Pool, mysql2 pool, generic-pool).
.withJobTracing({
providers?: ('bullmq' | 'bull' | 'pg-boss' | 'agenda')[], // default: auto-detect
slowJobThresholdMs?: number, // default: 5000
retryStormThreshold?: number, // default: 5 retries in 5 min
})

Wraps job processors so every execution runs inside an AsyncLocalStorage context — all DB queries inside a job automatically carry the job's traceId. Fires 'job' events on completion/failure and 'anomaly' events for: job-slow-query (query >50% of job time), job-retry-storm (retried above threshold in 5 min), job-stall-pattern (BullMQ/Bull only), and job-long-transaction (transaction >80% of job time).
Supported: BullMQ, Bull, pg-boss, Agenda. Auto-detected from installed packages. Production safe — context injection adds <1 ms per job.
.withMessagingTracing({
providers?: ('kafkajs' | 'amqplib')[], // default: auto-detect
slowConsumerThresholdMs?: number, // default: 1000
lagWarningMs?: number, // default: 30000
propagateTraceHeaders?: boolean, // default: true
})

Injects W3C traceparent headers into every produced message and extracts them on consume, creating end-to-end distributed traces across service boundaries. Fires 'message' events and 'anomaly' events for: consumer-lag-spike, message-processing-slow, producer-batch-too-small (Kafka rapid single sends), and missing-trace-header (uninstrumented producer detected).
Supported: KafkaJS, amqplib (RabbitMQ). Auto-detected from installed packages. Production safe — header injection is synchronous string concatenation.
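For readers unfamiliar with the header, a W3C traceparent value has the form version-traceId-parentId-flags (2, 32, 16, and 2 lowercase hex characters). The sketch below shows injection and extraction in that format. The function names and the plain header map are assumptions, not the agent's trace-headers utilities.

```typescript
// Illustrative sketch of W3C traceparent inject/extract; names are hypothetical.

interface TraceContext {
  traceId: string; // 32 lowercase hex chars
  spanId: string; // 16 lowercase hex chars
  sampled: boolean;
}

const TRACEPARENT_RE = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

function formatTraceparent(ctx: TraceContext): string {
  return `00-${ctx.traceId}-${ctx.spanId}-${ctx.sampled ? '01' : '00'}`;
}

function parseTraceparent(header: string): TraceContext | undefined {
  const match = TRACEPARENT_RE.exec(header.trim());
  if (!match) return undefined;
  const [, version, traceId, spanId, flags] = match;

  // Version ff and all-zero ids are invalid per the W3C Trace Context spec.
  if (version === 'ff') return undefined;
  if (/^0+$/.test(traceId) || /^0+$/.test(spanId)) return undefined;

  return { traceId, spanId, sampled: (parseInt(flags, 16) & 1) === 1 };
}

/** Producer side: copy the message headers and add traceparent. */
function injectTraceHeader(headers: Record<string, string>, ctx: TraceContext): Record<string, string> {
  return { ...headers, traceparent: formatTraceparent(ctx) };
}

const outgoing = injectTraceHeader(
  { 'content-type': 'application/json' },
  { traceId: '4bf92f3577b34da6a3ce929d0e0e4736', spanId: '00f067aa0ba902b7', sampled: true },
);
// Consumer side: recover the context from the received headers.
const incoming = parseTraceparent(outgoing.traceparent);
console.log(outgoing.traceparent, incoming);
```

A consumer that receives a message without a parseable header is the situation the missing-trace-header anomaly describes.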
.withDnsMonitor({
slowThresholdMs: 100, // fire 'slow-dns' above this duration (default: 100)
})

Wraps dns.lookup to track every resolution. Fires 'dns' for each lookup and 'slow-dns' for those exceeding the threshold.
.withAdaptiveSampler({
ratePerMs: 1 / 1000, // token refill rate — 1 token/sec by default
burst: 10, // max bucket depth / burst capacity (default: 10)
})

Token-bucket rate limiter applied per event category ('query', 'http'). Under sustained high throughput, events beyond the bucket capacity are silently dropped, capping agent overhead without disabling monitoring.
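A token bucket with the two options above can be sketched in a few lines. This is a generic illustration of the technique, not the agent's AdaptiveSampler; the class name, the tryTake method, and the injectable clock argument are assumptions.

```typescript
// Illustrative token-bucket sketch; class and method names are hypothetical.

class TokenBucket {
  private tokens: number;
  private lastRefill: number;
  private readonly ratePerMs: number;
  private readonly burst: number;

  constructor(ratePerMs = 1 / 1000, burst = 10, now = Date.now()) {
    this.ratePerMs = ratePerMs;
    this.burst = burst;
    this.tokens = burst; // start full so an initial burst is allowed
    this.lastRefill = now;
  }

  /** Returns true if the event may pass, false if it should be dropped. */
  tryTake(now = Date.now()): boolean {
    // Refill proportionally to elapsed time, never above the burst capacity.
    const elapsed = Math.max(0, now - this.lastRefill);
    this.tokens = Math.min(this.burst, this.tokens + elapsed * this.ratePerMs);
    this.lastRefill = now;

    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}

// One bucket per event category, as described above.
const buckets = new Map<string, TokenBucket>([
  ['query', new TokenBucket(0.5, 2, 0)],
  ['http', new TokenBucket(0.5, 2, 0)],
]);

const queryBucket = buckets.get('query')!;
const decisions = [
  queryBucket.tryTake(0), // burst token 1
  queryBucket.tryTake(0), // burst token 2
  queryBucket.tryTake(0), // bucket empty
  queryBucket.tryTake(1), // only half a token refilled
  queryBucket.tryTake(2), // one full token refilled
];
console.log(decisions);
```

Keeping a separate bucket per category means a flood of query events cannot starve HTTP events of their sampling budget.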
.withJobTracing({
  providers: ['bullmq', 'bull', 'pg-boss', 'agenda'], // optional; auto-detected from package.json if omitted
  retryStormThreshold: 5, // optional; fire 'job-retry-storm' after this many retries in 5 min (default: 5)
  slowJobThresholdMs: 5000, // optional; reserved, used by cross-signal rules (default: 5000)
})

Wraps job processors for BullMQ, Bull, pg-boss, and Agenda. Fires 'job' events with duration, attempt count, and retry flag. Automatically correlates in-job DB queries with the parent job via AsyncLocalStorage, enabling cross-signal rules like job-slow-query and job-long-transaction.
Console output (dev mode):
[ARGUS] JOB bullmq/send-email wait: 340ms process: 2,840ms ✓
[ARGUS] JOB agenda/nightly-report wait: 0ms process: 12,400ms ✓
.withMessagingTracing({
  providers: ['kafkajs', 'amqplib'], // optional; auto-detected from package.json if omitted
  slowConsumerThresholdMs: 1000, // optional; fire 'message-processing-slow' above this (default: 1000)
  lagWarningMs: 30_000, // optional; fire 'consumer-lag-spike' above this (default: 30 000)
  propagateTraceHeaders: true, // optional; inject W3C traceparent on produce (default: true)
})

Patches KafkaJS producers/consumers and amqplib channels. Injects W3C traceparent headers on produce and extracts them on consume, propagating trace context across service boundaries.
Console output (dev mode):
[ARGUS] MSG kafka→produce orders 2ms 128B
[ARGUS] MSG kafka←consume orders 340ms 128B lag: 420ms
[ARGUS] MSG rabbit→produce payments.queue 1ms 256B
For drivers that don't publish to diagnostics_channel, use traceQuery:
const rows = await agent.traceQuery(
'SELECT * FROM orders WHERE id = $1',
() => db.query('SELECT * FROM orders WHERE id = $1', [42])
);

After calling .start(), the agent exposes several utility methods:
agent.getSlowQueries(): SlowQueryRecord[] // top-N slowest queries, sorted slowest first
agent.getSlowestQuery(): SlowQueryRecord | undefined
agent.clearSlowQueries(): void // reset the log (useful between test cases)

Requires .withSlowQueryMonitor(). Returns an empty array / undefined if not enabled.

agent.watchPool(pool: PoolLike, driver: string): this

Register a connection pool for monitoring. Safe to call before or after .start(). Requires .withPoolMonitor().

app.use(agent.createMiddleware());

Connect-compatible middleware that reads the incoming traceparent W3C header and runs the request inside a RequestContext. All queries and HTTP calls within the same async chain automatically carry the same traceId and correlationId. Compatible with Express, Fastify (express-compat), Koa-connect, and raw Node HTTP.
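The general mechanism behind this kind of middleware is AsyncLocalStorage from node:async_hooks. The sketch below shows the idea under stated assumptions: the context shape, helper names, and id generation are hypothetical and do not describe the agent's internal RequestContext.

```typescript
// Illustrative sketch of request-scoped context via AsyncLocalStorage.
// Context shape and helper names are hypothetical.
import { AsyncLocalStorage } from 'node:async_hooks';
import { randomBytes } from 'node:crypto';

interface SketchContext {
  traceId: string;
  correlationId: string;
}

const storage = new AsyncLocalStorage<SketchContext>();

/** Run `fn` inside a context derived from an optional traceparent header. */
function runInContext<T>(traceparent: string | undefined, fn: () => T): T {
  const match = /^[0-9a-f]{2}-([0-9a-f]{32})-[0-9a-f]{16}-[0-9a-f]{2}$/.exec(traceparent ?? '');
  // Reuse the caller's traceId when present, otherwise start a new trace.
  const traceId = match ? match[1] : randomBytes(16).toString('hex');
  const correlationId = randomBytes(8).toString('hex');
  return storage.run({ traceId, correlationId }, fn);
}

/** Anything called within the same async chain can read the context. */
function currentTraceId(): string | undefined {
  return storage.getStore()?.traceId;
}

// Minimal structural types so the sketch does not depend on a web framework.
type ReqLike = { headers: Record<string, string | string[] | undefined> };
type Next = () => void;

/** Connect-style middleware built on the helpers above. */
function contextMiddleware(req: ReqLike, _res: unknown, next: Next): void {
  const raw = req.headers['traceparent'];
  runInContext(Array.isArray(raw) ? raw[0] : raw, next);
}

let seenInsideRequest: string | undefined;
contextMiddleware(
  { headers: { traceparent: '00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01' } },
  undefined,
  () => {
    seenInsideRequest = currentTraceId();
  },
);
console.log(seenInsideRequest, currentTraceId());
```

Because the store follows the async chain, code deep inside a request handler can read the traceId without it being passed as an argument.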
import { runWithContext } from 'argus-apm';
const ctx = agent.createContext('JOB', '/process-order');
runWithContext(ctx, async () => {
// all traced queries here carry ctx.traceId
await processOrder(orderId);
});

const original = await agent.resolvePosition('./dist/index.js', 42, 15);
// { source: 'src/handlers/order.ts', line: 10, column: 3 }

Requires .withSourceMaps().
The agent is an EventEmitter. All events are emitted on the ArgusAgent instance:
| Event | Payload | When |
|---|---|---|
| 'job' | JobEvent | Job completed, failed, retried, or stalled (BullMQ, Bull, pg-boss, Agenda) |
| 'message' | MessageEvent | Message produced or consumed (KafkaJS, amqplib) |
| 'anomaly' | ProfilerEvent | Memory leak, event loop lag, CPU spike, cross-signal compound anomaly, or job/message rule violation |
| 'query' | { sanitizedQuery, durationMs, driver?, traceId?, correlationId?, cacheHit?, suggestions? } | DB query completed |
| 'slow-query' | SlowQueryRecord | Query exceeded the per-driver threshold |
| 'transaction' | TransactionEvent | BEGIN/COMMIT/ROLLBACK pattern completed |
| 'cache-degraded' | CacheDegradedEvent | Cache hit rate dropped below minHitRate |
| 'gc-pressure' | GcPressureEvent | GC pause % exceeded threshold in the window |
| 'pool-exhaustion' | PoolExhaustionEvent | Waiting client count exceeded maxWaitingCount |
| 'slow-acquire' | SlowAcquireEvent | Connection acquire time exceeded maxWaitMs |
| 'http' | { method, url, statusCode, durationMs, suggestions } | HTTP request completed |
| 'dns' | DnsEvent | DNS lookup completed |
| 'slow-dns' | DnsEvent | DNS lookup exceeded slowThresholdMs |
| 'fs' | { operation, path, durationMs, suggestions } | File system operation completed (suggestions include sync-in-hot-path when called inside a request) |
| 'log' | { level, scrubbed, durationMs, suggestions? } | console.* call intercepted |
| 'crash' | CrashEvent | uncaughtException or unhandledRejection received |
| 'leak' | ResourceLeakEvent | Active OS handle count exceeded threshold |
| 'scan' | StaticScanResult[] | Background tsc/ESLint/connection-pool scan complete (dev/test only) |
| 'audit' | AuditResult | npm audit CVE scan complete (dev/test only) |
| 'info' | string | Advisory messages (e.g., auto-detection found nothing) |
| 'error' | Error | Non-fatal internal error (e.g., heap snapshot write failed) |

agent.on('anomaly', (event) => {
// runtime: 'memory-leak' | 'event-loop-lag' | 'cpu-spike'
// cross-signal: 'correlated-slow-endpoint' | 'pool-starvation-by-slow-query' | 'n-plus-one-in-transaction'
console.log(event.type);
console.log(event.heapSnapshotPath); // only set when a snapshot write succeeded
});
agent.on('crash', (event) => {
console.log(event.type); // 'uncaughtException' | 'unhandledRejection'
// NOTE: unhandledRejection does NOT call process.exit — your app keeps running
});
agent.on('query', (trace) => {
console.log(trace.sanitizedQuery); // bound values are NEVER here — AST-scrubbed
trace.suggestions?.forEach(s => console.log(s.rule, s.suggestedFix)); // present only with withQueryAnalysis()
});
agent.on('slow-query', (record) => {
console.log(record.sanitizedQuery, record.durationMs, record.driver);
// agent.getSlowQueries() returns the persisted top-N log at any time
});
agent.on('transaction', (event) => {
if (event.aborted) console.warn(`Rolled-back txn on ${event.driver} after ${event.durationMs}ms`);
});
agent.on('gc-pressure', (event) => {
console.warn(`GC consuming ${event.pausePct.toFixed(1)}% of CPU time`);
});
agent.on('pool-exhaustion', (event) => {
console.warn(`${event.driver} pool: ${event.waitingCount} clients queued`);
});

Note
ArgusAgent calls setMaxListeners(0) internally — you can attach as many listeners as needed without triggering Node's memory leak warning.
Argus is designed to be fully extensible. While the built-in console logger formats events beautifully during development, you can easily pipe critical alerts to Slack, PagerDuty, or your own custom logging stack:
// 1. Send resource-leak alerts to a Slack webhook (global fetch requires Node 18+)
agent.on('leak', async (event) => {
  await fetch(process.env.SLACK_WEBHOOK, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ text: `🚨 Resource leak detected: ${event.handlesCount} active handles` })
  });
});
// 2. Send slow queries to your own logging infrastructure
agent.on('slow-query', (record) => {
myCustomLogger.warn('Slow DB Query', {
query: record.sanitizedQuery,
ms: record.durationMs
});
});

Per-Event Opt-Out: If you are using the default console logger (e.g., in the dev profile), the agent detects your custom .on() listeners and steps aside: it skips printing its own default log for that specific event, avoiding duplicate output while still formatting every other event.
All thresholds can be overridden without code changes, making the agent CI/CD and container-friendly:
| Variable | Default | Controls |
|---|---|---|
| ARGUS_ENABLED | true | Set to false or 0 for a zero-CPU-overhead global kill-switch |
| ARGUS_DEBUG | true in dev profile, false otherwise | Enables the built-in console logger for all agent events. Explicitly set to false or 0 to suppress even in dev mode; set to true or 1 to force-enable in prod |
| ARGUS_EVENT_LOOP_THRESHOLD_MS | 50 | Minimum lag (ms) before an event-loop anomaly fires |
| ARGUS_MEMORY_GROWTH_BYTES | 10485760 (10 MB) | Minimum heap growth before a memory-leak anomaly fires |
| ARGUS_CPU_PROFILE_COOLDOWN_MS | 60000 | Minimum ms between back-to-back CPU profiles |
| ARGUS_MONITOR_CHECK_INTERVAL_MS | 1000 | How often thresholds are polled |
| ARGUS_CPU_PROFILE_DURATION_MS | 500 | Duration of each CPU profile capture |
| ARGUS_HEAP_USAGE_PCT_THRESHOLD | 90 | Heap usage % of heapTotal before a memory anomaly fires |
| ARGUS_SLOW_QUERY_THRESHOLD_MS | 1000 | Global slow query threshold used when no per-driver default applies |
| ARGUS_SLOW_QUERY_THRESHOLD_<DRIVER> | (per-driver) | Per-driver threshold override. Key is the driver name uppercased with non-alphanumeric runs replaced by _ — e.g. ARGUS_SLOW_QUERY_THRESHOLD_PG=500, ARGUS_SLOW_QUERY_THRESHOLD_REDIS=50, ARGUS_SLOW_QUERY_THRESHOLD_ELASTIC_ELASTICSEARCH=300 |

Tip
Malformed threshold values (non-numeric, 0, negative) are silently ignored and replaced with the default. This means misconfigured infrastructure cannot accidentally disable monitoring.
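The key-derivation rule and the fallback behaviour described above can be sketched as two small functions. The helper names are hypothetical, and trimming leading and trailing underscores is an assumption made so that a scoped package name yields a key like the ELASTIC_ELASTICSEARCH example; the agent's actual parsing may differ.

```typescript
// Illustrative sketch of env-var threshold handling; helper names are hypothetical.

/** Derive the per-driver env key: uppercase, non-alphanumeric runs become "_". */
function slowQueryEnvKey(driver: string): string {
  const suffix = driver
    .toUpperCase()
    .replace(/[^A-Z0-9]+/g, '_')
    .replace(/^_+|_+$/g, ''); // assumption: trim stray underscores at the edges
  return `ARGUS_SLOW_QUERY_THRESHOLD_${suffix}`;
}

/** Read a positive numeric threshold, falling back to the default when malformed. */
function readThreshold(
  env: Record<string, string | undefined>,
  key: string,
  fallback: number,
): number {
  const value = Number(env[key]); // undefined and non-numeric input become NaN
  return Number.isFinite(value) && value > 0 ? value : fallback;
}

const sampleEnv: Record<string, string | undefined> = {
  ARGUS_SLOW_QUERY_THRESHOLD_PG: '500',
  ARGUS_SLOW_QUERY_THRESHOLD_REDIS: 'fast', // malformed: ignored
  ARGUS_EVENT_LOOP_THRESHOLD_MS: '0', // zero: ignored
};

console.log(
  readThreshold(sampleEnv, slowQueryEnvKey('pg'), 1000),
  readThreshold(sampleEnv, slowQueryEnvKey('redis'), 1000),
  readThreshold(sampleEnv, 'ARGUS_EVENT_LOOP_THRESHOLD_MS', 50),
);
```

In an application, the first argument would typically be process.env.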
| Method | Prod Safe? | Resource Impact | Description |
|---|---|---|---|
| ArgusAgent.createProfile(config) | ✅ Yes | N/A | Pre-configured instance from env/app presets |
| ArgusAgent.create() | ✅ Yes | N/A | Unconfigured fluent builder |
| .withSourceMaps(dir?) | ✅ Yes | Very Low | Source-map resolution for minified stack traces |
| .withRuntimeMonitor(opts?) | ✅ Yes | Low | Event loop lag + memory leak detection |
| .withCrashGuard() | ✅ Yes | Very Low | Intercepts uncaughtException; emits event for unhandledRejection |
| .withResourceLeakMonitor(opts?) | ✅ Yes | Low | Tracks OS handles; rate-limited by alertCooldownMs |
| .withGracefulShutdown(opts?) | ✅ Yes | Very Low | Registers SIGTERM/SIGINT; awaits agent.stop() before process.exit |
| .withInstrumentation(opts?) | ✅ Yes | Low | DB/IO tracing via diagnostics_channel (17 drivers) |
| .withHttpTracing() | ✅ Yes | Low | HTTP request inspection & slow-request detection |
| .withLogTracing(opts?) | ✅ Yes | Low | console.* override with entropy-scrubbed payloads |
| .withFsTracing() | ❌ No | High | Patches fs. Detects *Sync blockers; escalates to sync-in-hot-path (critical) when called inside a live request. DEV ONLY. |
| .withQueryAnalysis() | ✅ Yes | Medium (AST) | N+1 detection + query fix suggestions |
| .withSlowQueryMonitor(opts?) | ✅ Yes | Very Low | Per-driver slow query detection + top-N log |
| .withTransactionMonitor(opts?) | ✅ Yes | Very Low | BEGIN/COMMIT/ROLLBACK duration tracking |
| .withCacheMonitor(opts?) | ✅ Yes | Very Low | Cache hit-rate degradation detection |
| .withGcMonitor(opts?) | ✅ Yes | Very Low | GC pause pressure detection via perf_hooks |
| .withPoolMonitor(opts?) | ✅ Yes | Low | Connection pool exhaustion & slow-acquire |
| .withJobTracing(opts?) | ✅ Yes | Very Low | Job lifecycle events + slow-query/retry-storm correlation across BullMQ, Bull, pg-boss, Agenda |
| .withMessagingTracing(opts?) | ✅ Yes | Very Low | W3C traceparent propagation + consumer lag/batch alerts for KafkaJS and amqplib (RabbitMQ) |
| .withDnsMonitor(opts?) | ✅ Yes | Low | DNS lookup latency tracking |
| .withAdaptiveSampler(opts?) | ✅ Yes | Very Low | Token-bucket rate limiter under high load |
| .withStaticScanner(dir) | ❌ No | High | Background tsc/ESLint scan + TypeScript AST walk for connection-pool misuse (missing-connection-pool). DEV ONLY. |
| .withAuditScanner(dir) | ❌ No | High | Spawns npm audit. DEV/startup ONLY. |
| .withExporter(config) | ✅ Yes | Very Low | OTLP JSON export over mTLS |
| .withAggregatorWindow(ms) | ✅ Yes | None | Override p99 sliding window (default: 60 s) |
| .withEntropyThreshold(n) | ✅ Yes | None | Override Shannon entropy threshold (default: 4.0) |
| .start() | — | — | Async — initialize all subsystems and begin monitoring |
| .stop() | — | — | Async — tear down and flush remaining telemetry |

┌──────────────────────────────────────────────────────────────────────────┐
│ ArgusAgent │ ← Fluent builder / event bus
├─────────────────┬──────────────────────────────┬────────────────────────┤
│ Profiling │ Instrumentation │ Analysis │
│ ─────────────── │ ────────────────────────── │ ────────────────── │
│ RuntimeMonitor │ InstrumentationEngine │ QueryAnalyzer │
│ CrashGuard │ 17 DB/RPC Drivers │ SlowQueryMonitor │
│ ResourceLeakMon │ HttpInstrumentation │ TransactionMonitor │
│ GcMonitor │ FsInstrumentation │ CacheMonitor │
│ PoolMonitor │ LoggerInstrumentation │ CircuitBreaker │
│ SourceMapResolver│ DnsMonitor │ StaticScanner │
│ WorkerThreadsMon│ AdaptiveSampler │ AuditScanner │
│ SlowRequireDet. │ Job Queue Drivers │ ExplainAnalyzer │
│ StreamLeakDet. │ ├─ BullMQ ├─ Bull │ JobAnalyzer │
│ │ ├─ pg-boss ├─ Agenda │ MessageAnalyzer │
│ │ Messaging Drivers │ │
│ │ ├─ KafkaJS │ │
│ │ └─ amqplib (RabbitMQ) │ │
├─────────────────┴──────────────────────────────┴────────────────────────┤
│ AstSanitizer + EntropyChecker │ ← Privacy firewall (always on)
├──────────────────────────────────────────────────────────────────────────┤
│ MetricsAggregator (p99 sliding window) │
├──────────────────────────────────────────────────────────────────────────┤
│ OTLPExporter (mTLS) / OTLPCompatibleExporter (API key) │
└──────────────────────────────────────────────────────────────────────────┘
Full source tree (contributors)
packages/agent/
src/
index.ts → Public API barrel export
diagnostic-agent.ts → Fluent builder, public API surface & lifecycle
internal/
profile-factory.ts → buildAgentProfile() — preset resolution for createProfile()
query-handler.ts → createQueryHandler() — per-query sampling/analysis/slow-log closure
console-logger.ts → installConsoleLogger() — ARGUS_DEBUG event formatting
profiling/
app-type-detector.ts → package.json fingerprint scanner
runtime-monitor.ts → Event loop lag & heap snapshot profiling
crash-guard.ts → uncaughtException / unhandledRejection handler
graceful-shutdown.ts → SIGTERM/SIGINT flush with configurable timeout
resource-leak-monitor.ts → OS handle / socket leak detection
slow-require-detector.ts → CJS module load-time tracking (Node 20+)
stream-leak-detector.ts → Readable/Writable stream leak detection
worker-threads-monitor.ts → Worker pool depth & anomaly tracking (Node 22+)
source-map-resolver.ts → .js.map scanning & lazy resolution
gc-monitor.ts → GC pressure detection via perf_hooks
pool-monitor.ts → Connection pool exhaustion & slow-acquire
instrumentation/
safe-channel.ts → Backward-compatible diagnostics_channel loader (Node 14.18+)
engine.ts → Core InstrumentationEngine
correlation.ts → AsyncLocalStorage request context & correlationId
http.ts → HTTP tracing (channel path Node 18+; monkey-patch Node 14–17)
fs.ts → File system operation tracing
logger.ts → console.* override with entropy scrubbing
dns-monitor.ts → DNS lookup latency tracking
adaptive-sampler.ts → Token-bucket adaptive sampler
drivers/
index.ts → Driver registry (apply / remove patches)
patch-utils.ts → Shared wrapping utilities & PATCHED_SYMBOL
pg.ts → PostgreSQL
mysql.ts → MySQL / Aurora (mysql2)
mongodb.ts → MongoDB
mssql.ts → MSSQL / tedious
sqlite.ts → better-sqlite3
prisma.ts → @prisma/client
redis.ts → ioredis + node-redis
dynamodb.ts → @aws-sdk/client-dynamodb
firestore.ts → @google-cloud/firestore
cassandra.ts → cassandra-driver
elasticsearch.ts → @elastic/elasticsearch
bigquery.ts → @google-cloud/bigquery
neo4j.ts → neo4j-driver
clickhouse.ts → @clickhouse/client
grpc.ts → @grpc/grpc-js (unary + streaming RPCs)
jobs/
types.ts → JobEvent, JobSuggestion, JobTracingOptions types
base-job-driver.ts → wrapJobProcessor() — shared processor wrapping + context
bullmq.ts → BullMQ Worker / Queue patching
bull.ts → Bull (legacy) queue.process() patching
pg-boss.ts → pg-boss boss.work() / boss.send() patching
agenda.ts → Agenda define() / schedule() patching
index.ts → Job drivers barrel export
messaging/
types.ts → MessageEvent, MessageSuggestion, MessagingTracingOptions types
trace-headers.ts → W3C traceparent inject / extract utilities
kafkajs.ts → KafkaJS producer.send() / consumer.run() patching
amqplib.ts → amqplib connect() / channel patching
index.ts → Messaging drivers barrel export
licensing/
public-key.ts → Bundled ECDSA public keys (keyed by kid)
license-validator.ts → JWT ES256 signature + expiry validation
clock-guard.ts → Monotonic clock-rollback detection (enterprise)
expiry-signal.ts → Writes expiry notice to cwd / tmpdir / homedir
sanitization/
ast-sanitizer.ts → SQL AST scrubbing (node-sql-parser)
entropy-checker.ts → Shannon entropy secret detection
analysis/
types.ts → Shared FixSuggestion & analysis types
query-analyzer.ts → AST-based query fix suggestions + N+1 detection
slow-query-monitor.ts → Per-driver slow query detection + top-N log
transaction-monitor.ts → BEGIN/COMMIT/ROLLBACK duration tracking
cache-monitor.ts → Cache hit-rate degradation detection
explain-analyzer.ts → EXPLAIN plan analysis for supported drivers
fs-analyzer.ts → Sync FS blocker & path traversal detection
http-analyzer.ts → Insecure URL & slow request detection
log-analyzer.ts → Log storm & payload size detection
circuit-breaker-detector.ts → Sustained error-rate detection across drivers
static-scanner.ts → Background tsc / ESLint / query-in-loop static analysis
audit-scanner.ts → npm audit CVE scanning
source-analyzer.ts → R.4: DB calls inside loops; R.8: unhandled DB calls
migration-scanner.ts → R.5: SQL & Prisma migration parser → index map
index-hint-analyzer.ts → R.5: Runtime high-frequency missing-index detection
route-tracker.ts → R.6: Endpoint-never-called detection after warmup
column-usage-analyzer.ts → R.7: SELECT * hot-path → specific column suggestions
job-analyzer.ts → JobAnalyzer — 5 rules: slow-query, retry-storm, stall-pattern, no-timeout, long-transaction
message-analyzer.ts → MessageAnalyzer — 4 rules: lag-spike, processing-slow, batch-too-small, missing-trace-header
export/
aggregator.ts → p99 sliding window metric aggregation
exporter.ts → OTLP JSON formatter + mTLS transport
otlp-compatible-exporter.ts → Simplified OTLP exporter (API key, no mTLS)
tests/ → Mirrors src/ structure (821 tests, 146 suites)
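The entropy-checker.ts entry above refers to Shannon entropy scoring. As a rough illustration of the idea (not the agent's actual implementation), the sketch below computes bits per character for each whitespace-delimited token and redacts long tokens that score high. The function names, the 20-character minimum, and the 4.0 threshold are assumptions made for this example only.

```typescript
// Illustrative sketch only: this is NOT the argus-apm EntropyChecker source.

/** Shannon entropy of a string, in bits per character. */
function shannonEntropy(text: string): number {
  if (text.length === 0) return 0;
  const counts = new Map<string, number>();
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    counts.set(ch, (counts.get(ch) ?? 0) + 1);
  }
  let entropy = 0;
  counts.forEach((count) => {
    const p = count / text.length; // relative frequency of this character
    entropy -= p * Math.log2(p);
  });
  return entropy;
}

// Hypothetical thresholds, chosen for the example rather than taken from the agent.
const MIN_TOKEN_LENGTH = 20;
const ENTROPY_THRESHOLD = 4.0;

/** Replace long, high-entropy whitespace-delimited tokens with a placeholder. */
function scrubHighEntropy(input: string): string {
  return input
    .split(/(\s+)/) // the capture group keeps separators, so spacing is preserved
    .map((token) =>
      token.length >= MIN_TOKEN_LENGTH && shannonEntropy(token) >= ENTROPY_THRESHOLD
        ? '[REDACTED]'
        : token,
    )
    .join('');
}
```

Ordinary words stay below both thresholds, while a long token drawn from many distinct characters (as API keys and JWT segments tend to be) is replaced. A production scrubber would likely need extra rules for long natural-language tokens and for secrets shorter than the length cut-off.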
Every subsystem is exported individually for advanced composition, and TypeScript autocomplete surfaces the full list. A few useful standalone examples:
// Scrub a string manually
import { EntropyChecker } from 'argus-apm';
const sanitized = new EntropyChecker().scrub('Bearer eyJhbGc...');
// → 'Bearer [REDACTED]'

// Detect connection-pool circuit-break conditions without the full agent
import { CircuitBreakerDetector } from 'argus-apm';
const suggestions = new CircuitBreakerDetector().analyze(recentQueryEvents);

// Ship metrics to Honeycomb / New Relic / Datadog without mTLS
import { OTLPCompatibleExporter } from 'argus-apm';
const exporter = new OTLPCompatibleExporter({
  endpointUrl: 'https://api.honeycomb.io/v1/metrics',
  apiKey: process.env.HONEYCOMB_API_KEY,
  serviceName: 'my-service',
});
await exporter.export(aggregatorEvents);

// Manual async-context propagation (background jobs, queue workers)
import { runWithContext } from 'argus-apm';
runWithContext(agent.createContext('WORKER', '/process-job'), async () => {
  // all traced queries here carry the same traceId
  await processJob();
});

Source mode (contributors): replace 'argus-apm' with './packages/agent/src/index.ts' and run with node --experimental-strip-types on Node 22.6+.
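For readers wondering how a helper like runWithContext can make every query inside the callback share one identifier, the sketch below shows the underlying Node.js mechanism, AsyncLocalStorage, in isolation. It is a simplified stand-in written for this README, not the agent's source: the RequestContext shape, the local createContext and currentContext helpers, and tracedQuery are all hypothetical.

```typescript
import { AsyncLocalStorage } from 'node:async_hooks';
import { randomUUID } from 'node:crypto';

// Hypothetical context shape for illustration; the agent's real context type may differ.
interface RequestContext {
  correlationId: string;
  kind: string;
  route: string;
}

const storage = new AsyncLocalStorage<RequestContext>();

/** Create a fresh context with a random correlation id. */
function createContext(kind: string, route: string): RequestContext {
  return { correlationId: randomUUID(), kind, route };
}

/** Run fn so that everything it awaits can see the given context. */
function runWithContext<T>(context: RequestContext, fn: () => T): T {
  return storage.run(context, fn);
}

/** Read the context of the current async call chain, if any. */
function currentContext(): RequestContext | undefined {
  return storage.getStore();
}

// Stand-in for an instrumented driver call: it never receives the context as
// an argument, yet it can still read it after an asynchronous hop.
async function tracedQuery(sql: string): Promise<{ sql: string; correlationId?: string }> {
  await new Promise((resolve) => setTimeout(resolve, 1)); // simulate async I/O
  return { sql, correlationId: currentContext()?.correlationId };
}
```

Calling runWithContext(createContext('WORKER', '/process-job'), () => tracedQuery('SELECT 1')) resolves with the same correlationId that was created for the job, whereas calling tracedQuery outside any runWithContext scope yields no id. This is why background jobs and queue workers, which do not start from an HTTP request, need an explicit wrapper.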
Important
OTLP export requires a paid Self-Hosted Pro or Enterprise license.
In free mode the agent emits events locally via EventEmitter only — .withExporter() has no effect without a valid ARGUS_LICENSE_KEY.
To get notified when Self-Hosted Pro licenses go on sale: open this GitHub issue or email sharon10vp614@gmail.com.
The Self-Hosted Pro tier exports standard OTLP JSON directly to your own collector — no data ever leaves your infrastructure. Any OTLP-compatible collector works. Below is the quickest local setup using Jaeger's all-in-one image.
# Set your license key (Self-Hosted Pro or Enterprise)
export ARGUS_LICENSE_KEY="your-license-key"

# docker-compose.jaeger.yml (save this alongside your project)
services:
  jaeger:
    image: jaegertracing/all-in-one:latest
    ports:
      - "4318:4318"   # OTLP HTTP receiver
      - "16686:16686" # Jaeger UI
    environment:
      - COLLECTOR_OTLP_ENABLED=true

docker compose -f docker-compose.jaeger.yml up -d

Then point the agent at it:
const agent = await ArgusAgent.createProfile({ environment: 'dev' })
  .withExporter({ endpointUrl: 'http://localhost:4318/v1/traces' }) // no TLS needed locally
  .start();

Open http://localhost:16686 to browse traces.
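If no traces show up, it can help to confirm that the collector itself accepts OTLP before debugging the agent. The sketch below builds a minimal OTLP/HTTP JSON payload with a single span and posts it to the same endpoint. It is independent of argus-apm; the function names, the smoke-test service name, and the 5 ms span duration are placeholders invented for this example, and the global fetch call assumes Node 18 or newer.

```typescript
import { randomBytes } from 'node:crypto';

/** Build a minimal OTLP/HTTP JSON trace payload containing one span. */
function buildTestTrace(serviceName: string, spanName: string, nowMs: number = Date.now()) {
  return {
    resourceSpans: [
      {
        resource: {
          attributes: [{ key: 'service.name', value: { stringValue: serviceName } }],
        },
        scopeSpans: [
          {
            scope: { name: 'collector-smoke-test' }, // placeholder scope name
            spans: [
              {
                traceId: randomBytes(16).toString('hex'), // 32 hex characters
                spanId: randomBytes(8).toString('hex'), // 16 hex characters
                name: spanName,
                kind: 1, // SPAN_KIND_INTERNAL
                // Nanosecond timestamps are sent as decimal strings.
                startTimeUnixNano: `${nowMs}000000`,
                endTimeUnixNano: `${nowMs + 5}000000`,
              },
            ],
          },
        ],
      },
    ],
  };
}

/** POST one test span to the collector and return the HTTP status code. */
async function sendTestTrace(endpointUrl = 'http://localhost:4318/v1/traces'): Promise<number> {
  const response = await fetch(endpointUrl, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(buildTestTrace('smoke-test', 'hello-collector')),
  });
  return response.status;
}
```

With the Jaeger container running, calling sendTestTrace() and getting a 2xx status back suggests the receiver on port 4318 is reachable, and a smoke-test service should then be selectable in the Jaeger UI. A connection error points at the Docker port mapping rather than at the agent configuration.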
| Destination | OTLP endpoint |
|---|---|
| Grafana Alloy | http://localhost:4318/v1/traces (default) |
| OpenTelemetry Collector | configure otlp receiver on port 4318 |
| Datadog, New Relic, Honeycomb | use their OTLP ingest URLs with an API key header |
Note
The key/cert/ca fields in withExporter are optional — omit them for plaintext local endpoints. mTLS is only needed for production remote collectors.
For cloud SaaS destinations (Honeycomb, New Relic, Datadog) that authenticate via an API key rather than mTLS, use OTLPCompatibleExporter from the Low-Level API instead — no license required.
- SaaS Dashboard — hosted dashboard with 30-day query history, AI-powered fix suggestions, and cross-service correlation
- Self-Hosted Platform — Docker image with embedded dashboard, ClickHouse storage, and AI suggestions (BYOK) for teams with strict data residency requirements
- IDE plugin — surface suggestions directly in VS Code as you write queries
→ Watch the repo or email sharon10vp614@gmail.com to be notified of major releases.
Using Argus in production? Open an issue or email sharon10vp614@gmail.com to be listed here.
If Argus saves you debugging time, consider sponsoring the project ❤
Apache-2.0
