Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.



0.4.0 — 2026-05-02

Added

  • gRPC instrumentation: @grpc/grpc-js driver patch (drivers/grpc.ts). Auto-patches all four call types on Client.prototype:
    • Unary (makeUnaryRequest) and client-streaming (makeClientStreamRequest): wraps the callback to capture wall-clock duration and forward errors to diagnostics_channel.
    • Server-streaming (makeServerStreamRequest) and bidi-streaming (makeBidiStreamRequest): listens for the stream's status (completion) and error events; a published flag prevents double-publishing when both events fire for the same call.

    The RPC method path (e.g. /package.Service/Method) is used as the query key, making gRPC calls visible in slow-query logs, the cache monitor, and OTLP exports. 17 drivers total (up from 16). Wired into applyDriverPatches() under a new // RPC section; silently skipped when @grpc/grpc-js is not installed.
  • CrossSignalRuleEngine — extracted class (R.3–R.7 rules) now exported directly; independently testable.
  • CROSS_SIGNAL_THRESHOLDS — exported const with all cross-signal rule threshold defaults.
  • WindowedMonitorBase — exported abstract class; GcMonitor and CacheMonitor now extend it.
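
The published-flag guard described for the streaming call types can be sketched roughly as follows. This is an illustration only, not the code in drivers/grpc.ts: publishOnce, RpcResult, and the injected publish callback are hypothetical names, and a plain EventEmitter stands in for a gRPC stream.

```typescript
import { EventEmitter } from "node:events";

// Hypothetical payload shape; the real driver's message format is not shown in this changelog.
interface RpcResult {
  method: string;
  durationMs: number;
  error: Error | null;
}

// Reports the outcome of a streaming call exactly once, even when both
// "error" and "status" fire for the same call.
function publishOnce(
  stream: EventEmitter,
  method: string,
  publish: (result: RpcResult) => void,
  now: () => number = Date.now,
): void {
  const startedAt = now();
  let published = false;

  const finish = (error: Error | null): void => {
    if (published) return; // a second event for the same call is ignored
    published = true;
    publish({ method, durationMs: now() - startedAt, error });
  };

  stream.on("error", (err: Error) => finish(err));
  stream.on("status", () => finish(null));
}

// Usage: both events fire, but only the first one is published.
const stream = new EventEmitter();
const seen: RpcResult[] = [];
publishOnce(stream, "/package.Service/Method", (r) => {
  seen.push(r);
});
stream.emit("error", new Error("boom"));
stream.emit("status", { code: 2 });
console.log(seen.length, seen[0]?.error?.message);
```

Whichever event arrives first decides the published outcome, so a status event that follows an error does not overwrite it.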

Changed

  • Architecture: CrossSignalRuleEngine extracted from ArgusAgent; GcMonitor / CacheMonitor refactored onto WindowedMonitorBase.
  • Safety: silent .catch() blocks replaced with this.emit("warn", err); routeTracker non-null assertion replaced with defensive check; global-regex lastIndex reset added in IndexHintAnalyzer, MigrationScanner, and ArgusAgent.
  • Tests: 627 → 644 passing — scenario tests (worker-only, cache-degradation, crash-recovery), OTLP edge cases (circular payload, ECONNREFUSED, timeout), licensing boundary tests (JWT expiry ±1 s, clock-skew at 60 s boundary), CrossSignalRuleEngine unit tests.
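
The global-regex lastIndex reset mentioned under Safety addresses a general JavaScript pitfall: a regex with the g flag remembers where its last match ended. The sketch below shows the failure mode and the fix; TABLE_NAME and both helper functions are made-up examples, not the patterns used in IndexHintAnalyzer, MigrationScanner, or ArgusAgent.

```typescript
// A shared regex with the g flag carries lastIndex from one exec() call to the next.
const TABLE_NAME = /from\s+(\w+)/gi;

// Buggy variant: the second call on the same input starts searching at the
// end of the previous match and finds nothing.
function firstTableUnsafe(sql: string): string | null {
  return TABLE_NAME.exec(sql)?.[1] ?? null;
}

// Fixed variant: reset lastIndex before every use of the shared regex.
function firstTableSafe(sql: string): string | null {
  TABLE_NAME.lastIndex = 0;
  return TABLE_NAME.exec(sql)?.[1] ?? null;
}

const sql = "select * from users";
console.log(firstTableUnsafe(sql), firstTableUnsafe(sql)); // second call misses
console.log(firstTableSafe(sql), firstTableSafe(sql)); // both calls match
```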

0.3.2 — 2026-04-29

Changed

  • Package metadata: added funding field pointing to GitHub Sponsors
  • Publish process: README is now copied from the repo root at publish time (prepublishOnly) and cleaned up afterwards (postpublish) — single source of truth, no drift
  • Removed src/licensing/public-key.ts from files (compiled output in dist/ is sufficient; raw source was never needed by consumers)
  • CI: restored individual named steps (Typecheck / Lint / Format check / Build / Test) for better failure visibility in GitHub Actions
  • README: fixed contributor npm commands → pnpm in "Building from source" section

0.3.0 — 2026-04-29

Added

  • Phase R Wave 3 — Complex codebase-intelligence rules: Two new rules that require both runtime observations and TypeScript AST access to produce targeted advice that generic APMs cannot generate.

    • hot-path-select-star (warning) — ColumnUsageAnalyzer tracks how many times a SELECT * call fires from the same sourceLine. After threshold hits (default 5), it parses the TypeScript source file, finds the variable the result is assigned to, and walks the containing function scope to collect all accessed field names. The emitted hot-path-select-star anomaly replaces the generic "use explicit columns" suggestion with e.g. "you only access id, email — replace with SELECT id, email". Supports .map() callbacks, rows[0].field element access, and const { a, b } = rows[0] destructuring. Falls back to null (and stays silent) for dynamic patterns or plain JS. Enabled via .withColumnUsageAnalysis(dir?, threshold?). Auto-wired for db app type in dev/test environments via profile-factory.
    • unhandled-db-call (info / critical) — SourceAnalyzer.scanForUnhandledDbCall() walks the TypeScript AST of every source file and flags DB calls (query, execute, findMany, find, etc.) that have no surrounding try/catch and no .catch() chain. Default severity is info. When CrashGuard feeds in a set of crashed source lines (lines that have actually thrown uncaughtException in production), any finding at a crashed location is escalated to critical with a message that mentions the crash history. Integrated into StaticScanner.scan() via the new runUnhandledDbCallScan() method. Wired into ArgusAgent startup scans; crash escalation feeds back from the CrashGuard event stream.
  • Phase R Wave 2 — Static + runtime intelligence rules: Three new rules that combine static codebase knowledge with runtime frequency data to surface issues no single monitor can detect alone.

    • query-in-loop (warning) — SourceAnalyzer walks the TypeScript AST of every source file at startup. When a known DB method call (query, execute, findMany, find, etc.) appears inside a loop (for, while, do, for…of, for…in) or an iteration callback (.map(), .forEach(), .filter(), .reduce(), …), it emits a query-in-loop suggestion with the exact file:line location. Wired into StaticScanner.scan() via the new runQueryInLoopScan() method.
    • missing-index-hint (warning) — MigrationScanner parses SQL migration files (CREATE INDEX / CREATE UNIQUE INDEX) and Prisma schema files (@@index([…])) at startup to build a table → Set<column> index map. IndexHintAnalyzer then tracks per-query execution frequency in a sliding window; when a query exceeds the threshold (default 100/min) against an un-indexed WHERE column, it emits a concrete CREATE INDEX suggestion. Only fires when migration files were found — no false positives when the index map is empty. Enabled via .withIndexHints(dir?).
    • endpoint-never-called (info): RouteTracker accepts a list of registered routes (extracted by a regex scanner from Express/Fastify source files) and records inbound request hits via createMiddleware(). After the warmup period (default 5 min) an 'anomaly' event is emitted listing routes that never received a request. Enabled via .withRouteTracking(dir?, warmupMs?). Wired automatically in profile-factory.ts for web app type in dev/test environments.
  • Demo app — Phase R scenarios (quotes-demo-app/):

    • W3C trace context is now propagated into every request via agent.createMiddleware() — wired in app.js so all cross-signal rules can correlate DB calls with their originating HTTP request.
    • GET /debug/correlated-slow — runs 6 N+1 queries + 1 100 ms deliberate delay within one request, triggering correlated-slow-endpoint (critical).
    • GET /debug/n-plus-one-in-txn: BEGIN + 6 identical-template SELECTs + COMMIT, triggering n-plus-one-in-transaction (critical).
    • GET /debug/sync-read already demonstrates sync-in-hot-path now that the middleware is active (both synchronous-fs and sync-in-hot-path fire simultaneously).
    • The anomaly event handler in diagnostic.js now prints the full suggestions array (rule name, severity, message, suggestedFix) for all Phase R compound events.
    • traffic.js updated with three new Phase R traffic scenarios.
  • Phase R Wave 1 — Cross-signal diagnostic rules: Five new rules that correlate events from multiple subsystems to produce high-signal compound anomalies that no single monitor can produce alone.

    • sync-in-hot-path (critical) — FsAnalyzer now accepts an insideRequest flag. When a *Sync FS call fires inside an active request context (AsyncLocalStorage), a second, more specific suggestion is emitted alongside synchronous-fs. Wired automatically by FsInstrumentation via getCurrentContext().
    • missing-connection-pool (warning) — StaticScanner.runConnectionPoolScan() walks the TypeScript AST at startup to detect new Client(), new Connection(), createConnection(), etc. called inside function bodies instead of at module scope. Results are surfaced as tool: "argus-static" ScanResult entries.
    • correlated-slow-endpoint (critical) — ArgusAgent cross-references the active N+1 traceId index against incoming HTTP events. When an outbound HTTP call exceeds 1 s and the same W3C traceId has an active N+1 pattern, a compound anomaly is emitted.
    • pool-starvation-by-slow-query (critical) — When a pool-exhaustion event fires within 10 s of a slow query on the same driver, the slow query is surfaced as the likely culprit holding connections.
    • n-plus-one-in-transaction (critical) — When N+1 is detected inside an open transaction (matched by traceId / correlationId), severity is escalated to critical because repeated queries inside a transaction also delay COMMIT and hold the connection.
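
As a rough illustration of the missing-index-hint rule from Wave 2, the sketch below counts executions per table and column in a sliding window and suggests an index once the threshold is exceeded. It assumes the table and WHERE column have already been extracted from the query. IndexHintSketch, its record() method, and the idx_<table>_<column> naming are hypothetical and do not reflect the real IndexHintAnalyzer API.

```typescript
type IndexMap = Map<string, Set<string>>; // table -> indexed columns

interface IndexHint {
  table: string;
  column: string;
  suggestion: string;
}

class IndexHintSketch {
  private readonly hits = new Map<string, number[]>();
  private readonly indexes: IndexMap;
  private readonly threshold: number;
  private readonly windowMs: number;

  constructor(indexes: IndexMap, threshold = 100, windowMs = 60_000) {
    this.indexes = indexes;
    this.threshold = threshold;
    this.windowMs = windowMs;
  }

  // Records one execution filtering on table.column at time nowMs.
  record(table: string, column: string, nowMs: number): IndexHint | null {
    if (this.indexes.size === 0) return null; // no migrations found: stay silent
    if (this.indexes.get(table)?.has(column)) return null; // already indexed

    const key = `${table}.${column}`;
    const recent = (this.hits.get(key) ?? []).filter((t) => nowMs - t < this.windowMs);
    recent.push(nowMs);
    this.hits.set(key, recent);

    if (recent.length <= this.threshold) return null;
    return {
      table,
      column,
      // Index name format is an assumption made for this example.
      suggestion: `CREATE INDEX idx_${table}_${column} ON ${table} (${column});`,
    };
  }
}

// Usage with a low threshold so the hint fires quickly.
const analyzer = new IndexHintSketch(new Map([["orders", new Set(["id"])]]), 3);
for (let i = 0; i < 4; i++) {
  console.log(analyzer.record("orders", "customer_id", i * 10));
}
```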

Fixed

  • SlowQueryMonitor.check() type contract — parameter changed from driver: string to driver: string | undefined. When no driver is known (e.g. manual traceQuery() calls or raw diagnostics_channel publishes without a driver field), check() now returns null immediately instead of falling back to the synthetic string "unknown", which previously triggered a spurious ARGUS_MISSING_DRIVER_THRESHOLD process warning in CI.
  • Test isolation — the missing-driver warning describe block in slow-query-monitor.test.ts previously monkey-patched process.emitWarning, which leaked across parallel test files and caused a flaky failure in CI. Replaced with the additive process.on('warning') / process.off('warning') API, which is fully parallel-safe. Tests are now async and yield one nextTick before asserting, matching the asynchronous dispatch path of process.emitWarning.
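
The additive listener pattern described in the test-isolation fix might look like the sketch below. captureWarnings is a hypothetical helper, not the code in slow-query-monitor.test.ts; the process.on / process.off calls and the next-tick dispatch of process.emitWarning are standard Node.js behaviour.

```typescript
// Collects warnings emitted while `action` runs, without replacing process.emitWarning.
async function captureWarnings(action: () => void): Promise<Error[]> {
  const captured: Error[] = [];
  const listener = (warning: Error): void => {
    captured.push(warning);
  };
  process.on("warning", listener);
  try {
    action();
    // process.emitWarning() dispatches on the next tick, so yield once before reading.
    await new Promise<void>((resolve) => process.nextTick(resolve));
  } finally {
    process.off("warning", listener); // always detach so nothing leaks into other tests
  }
  return captured;
}

// Usage: the warning code is the one named in this changelog; the message is made up.
const pending = captureWarnings(() => {
  process.emitWarning("no slow-query threshold registered", {
    code: "ARGUS_MISSING_DRIVER_THRESHOLD",
  });
});
pending.then((warnings) => console.log(warnings.map((w) => w.message)));
```

Because the helper only adds and removes its own listener, it never changes process.emitWarning itself, which is what made the earlier monkey-patch unsafe.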

Changed

  • Architecture — God object split: Extracted three cohesive modules from the 1 109-line diagnostic-agent.ts:
    • src/internal/profile-factory.ts: buildAgentProfile() contains all preset-resolution and builder-wiring logic for ArgusAgent.createProfile().
    • src/internal/query-handler.ts: createQueryHandler() factory produces the per-query processing closure (adaptive sampling → query analysis → slow-query check → aggregation).
    • src/internal/console-logger.ts: installConsoleLogger() registers formatted console output for all agent events and returns the listener pairs for clean removal on stop().
    • diagnostic-agent.ts reduced from 1 109 → ~960 lines; each new module has a single responsibility and is independently testable.
  • SlowQueryMonitor.check() call site — the && traced.driver guard added in a previous hotfix is removed; the type change makes it redundant and the intent is now expressed in the contract rather than the caller.
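
The SlowQueryMonitor.check() contract change can be pictured with a small stand-alone function. This is a sketch under assumptions: the threshold values, the SlowQueryHit shape, and the use of a plain function instead of the real class are all invented for illustration.

```typescript
interface SlowQueryHit {
  driver: string;
  durationMs: number;
  thresholdMs: number;
}

// Hypothetical thresholds; the real registry's defaults are not listed in this changelog.
const thresholdsMs = new Map<string, number>([
  ["pg", 100],
  ["redis", 10],
]);

// The parameter type admits `undefined`, so callers no longer need a guard.
function check(driver: string | undefined, durationMs: number): SlowQueryHit | null {
  if (driver === undefined) return null; // no synthetic "unknown" fallback
  const thresholdMs = thresholdsMs.get(driver);
  if (thresholdMs === undefined || durationMs < thresholdMs) return null;
  return { driver, durationMs, thresholdMs };
}

console.log(check(undefined, 5_000)); // null: driver not known
console.log(check("pg", 250)); // a slow-query hit
console.log(check("pg", 20)); // null: under threshold
```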

0.1.0 — 2026-04-11

Added

Core agent

  • ArgusAgent fluent builder with two entry points: create() (manual) and createProfile() (preset-based).
  • Zero-overhead global kill-switch via ARGUS_ENABLED=false: start() becomes a no-op with no timer, subscription, or memory overhead.
  • ARGUS_DEBUG=true built-in console logger for all agent events.

Preset system

  • Three environment presets: prod, dev, test.
  • Three app-type presets: web, db, worker (composable as an array).
  • 'auto' mode — scans package.json dependencies and infers the correct preset.
  • ArgusAgent.detectAppTypes() standalone detector.

Instrumentation

  • node:diagnostics_channel-based query tracing for 16 DB drivers: pg, mysql2, mssql, tedious, better-sqlite3, redis, ioredis, mongodb, @google-cloud/firestore, @aws-sdk/client-dynamodb, neo4j-driver, @elastic/elasticsearch, @clickhouse/client, @google-cloud/bigquery, cassandra-driver, @prisma/client.
  • Zero prototype-pollution — no monkey-patching of driver prototypes in the default path.
  • HTTP outbound tracing (node:diagnostics_channel on Node ≥ 18; monkey-patch fallback on Node 14–17).
  • File-system tracing (fs.*Sync blocker detection) — dev/test only.
  • Console log tracing with Shannon entropy scrubbing.
  • DNS lookup latency tracking.
  • W3C traceparent propagation via AsyncLocalStorage (createMiddleware() / runWithContext()).
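
W3C traceparent propagation via AsyncLocalStorage could be sketched as below. parseTraceparent, runWithTrace, and currentTraceId are hypothetical helpers and are not the agent's createMiddleware() / runWithContext() implementation. The parser accepts only the four-field layout defined for version 00 of the W3C header.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

interface TraceContext {
  traceId: string;
  parentId: string;
  sampled: boolean;
}

const storage = new AsyncLocalStorage<TraceContext>();

// Layout: version-traceid-parentid-flags, all lowercase hex.
const TRACEPARENT = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

function parseTraceparent(header: string): TraceContext | null {
  const match = TRACEPARENT.exec(header.trim());
  if (!match) return null;
  const [, version, traceId, parentId, flags] = match;
  if (!version || !traceId || !parentId || !flags) return null;
  if (version === "ff") return null; // version ff is invalid per the spec
  if (/^0+$/.test(traceId) || /^0+$/.test(parentId)) return null; // all-zero ids are invalid
  return { traceId, parentId, sampled: (parseInt(flags, 16) & 1) === 1 };
}

// Runs fn with the parsed context attached to the current async call chain.
function runWithTrace<T>(header: string | undefined, fn: () => T): T {
  const ctx = header ? parseTraceparent(header) : null;
  return ctx ? storage.run(ctx, fn) : fn();
}

function currentTraceId(): string | undefined {
  return storage.getStore()?.traceId;
}

// Usage: code running inside the callback can read the trace id without it being passed down.
const header = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01";
runWithTrace(header, () => {
  console.log(currentTraceId());
});
```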

Analysis

  • SlowQueryMonitor — per-driver threshold registry (16 built-in defaults), top-N log, ARGUS_SLOW_QUERY_THRESHOLD_<DRIVER> env var overrides, once-per-driver dev warning for unregistered drivers.
  • QueryAnalyzer — AST-based N+1 and query fix suggestions.
  • TransactionMonitor — BEGIN/COMMIT/ROLLBACK duration tracking.
  • CacheMonitor — sliding-window hit-rate degradation detection.
  • CircuitBreakerDetector — sustained error-rate detection across drivers.
  • ExplainAnalyzer — EXPLAIN plan parsing for supported drivers.
  • StaticScanner — background tsc / ESLint scan (dev/test only).
  • AuditScanner: npm audit CVE scan (dev/test only).

Profiling

  • RuntimeMonitor — event loop lag, heap growth, CPU profiling.
  • CrashGuard: uncaughtException / unhandledRejection telemetry and flush.
  • ResourceLeakMonitor — OS handle / socket exhaustion detection.
  • GcMonitor — GC pause pressure via node:perf_hooks.
  • PoolMonitor — connection pool exhaustion and slow-acquire events.
  • SourceMapResolver: .js.map scanning and lazy position resolution.
  • GracefulShutdown — SIGTERM/SIGINT handler with configurable flush timeout.
  • AdaptiveSampler — token-bucket rate limiter per event category.
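
A token-bucket limiter per event category, as named for AdaptiveSampler, might be structured like this. The class names, the capacity and refill parameters, and the explicit nowMs argument (used to keep the example deterministic) are assumptions rather than the real API.

```typescript
class TokenBucket {
  private tokens: number;
  private lastRefillMs: number;
  private readonly capacity: number;
  private readonly refillPerSec: number;

  constructor(capacity: number, refillPerSec: number, nowMs: number) {
    this.capacity = capacity;
    this.refillPerSec = refillPerSec;
    this.tokens = capacity; // start full
    this.lastRefillMs = nowMs;
  }

  // Refills based on elapsed time, then tries to spend one token.
  tryTake(nowMs: number): boolean {
    const elapsedSec = Math.max(0, nowMs - this.lastRefillMs) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec);
    this.lastRefillMs = Math.max(this.lastRefillMs, nowMs);
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}

// One independent bucket per event category, created on first use.
class CategorySampler {
  private readonly buckets = new Map<string, TokenBucket>();
  private readonly capacity: number;
  private readonly refillPerSec: number;

  constructor(capacity: number, refillPerSec: number) {
    this.capacity = capacity;
    this.refillPerSec = refillPerSec;
  }

  shouldSample(category: string, nowMs: number): boolean {
    let bucket = this.buckets.get(category);
    if (!bucket) {
      bucket = new TokenBucket(this.capacity, this.refillPerSec, nowMs);
      this.buckets.set(category, bucket);
    }
    return bucket.tryTake(nowMs);
  }
}

// Usage: a burst of 2 is allowed, then one more event per second.
const sampler = new CategorySampler(2, 1);
console.log([0, 0, 0, 1000].map((t) => sampler.shouldSample("query", t)));
```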

Privacy

  • AstSanitizer — SQL/NoSQL query values shredded at the AST layer (via node-sql-parser).
  • EntropyChecker — Shannon entropy scanner strips JWTs, API keys, and secrets from logs. Configurable threshold (default 4.0 bits/char).
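
Shannon entropy in bits per character, the measure EntropyChecker is described as using, can be computed as in the sketch below. The scrub() helper, its minimum token length of 20, and the [REDACTED] marker are assumptions made for the example; only the 4.0 bits/char default comes from this changelog.

```typescript
// Shannon entropy of the character distribution, in bits per character.
function shannonEntropy(text: string): number {
  const chars = Array.from(text);
  if (chars.length === 0) return 0;
  const counts = new Map<string, number>();
  chars.forEach((ch) => counts.set(ch, (counts.get(ch) ?? 0) + 1));
  let bits = 0;
  counts.forEach((n) => {
    const p = n / chars.length;
    bits -= p * Math.log2(p);
  });
  return bits;
}

// Replaces long, high-entropy tokens with a marker.
// minLength and the marker text are assumptions for this sketch.
function scrub(line: string, thresholdBits = 4.0, minLength = 20): string {
  return line
    .split(/(\s+)/) // the capture group keeps whitespace so spacing survives
    .map((part) =>
      part.length >= minLength && shannonEntropy(part) > thresholdBits ? "[REDACTED]" : part,
    )
    .join("");
}

console.log(shannonEntropy("aaaa"), shannonEntropy("abcd"));
console.log(scrub("token abcdefghijklmnopqrstuvwxyz012345 ok"));
```

A token can only exceed 4.0 bits/char if it contains more than 16 distinct characters, which is why ordinary words pass through untouched.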

Export

  • MetricsAggregator — p99 sliding-window aggregation.
  • OTLPExporter — OTLP JSON over mTLS (requires paid license).
  • OTLPCompatibleExporter — simplified OTLP exporter with API key auth.
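
A p99 over a sliding window, as attributed to MetricsAggregator, can be approximated with the nearest-rank method shown below. The changelog does not say which percentile method or storage the real aggregator uses, so SlidingPercentile and its API are purely illustrative.

```typescript
class SlidingPercentile {
  private samples: Array<{ atMs: number; value: number }> = [];
  private readonly windowMs: number;

  constructor(windowMs: number) {
    this.windowMs = windowMs;
  }

  add(value: number, nowMs: number): void {
    this.samples.push({ atMs: nowMs, value });
  }

  // Nearest-rank percentile over the samples still inside the window.
  percentile(p: number, nowMs: number): number | null {
    this.samples = this.samples.filter((s) => nowMs - s.atMs < this.windowMs);
    if (this.samples.length === 0) return null;
    const sorted = this.samples.map((s) => s.value).sort((a, b) => a - b);
    const rank = Math.ceil((p * sorted.length) / 100);
    return sorted[Math.max(0, rank - 1)] ?? null;
  }
}

// Usage: 100 latencies of 1..100 ms recorded at t = 0.
const latencies = new SlidingPercentile(60_000);
for (let ms = 1; ms <= 100; ms++) latencies.add(ms, 0);
console.log(latencies.percentile(99, 0));
```

Sorting on every read keeps the sketch short; a production aggregator would more likely use a histogram or a sorted structure to avoid the repeated sort.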

Licensing

  • ECDSA ES256 JWT license validation with offline verification.
  • Clock-integrity guard (monotonic rollback detection).
  • Expiry signal file written to cwd / tmpdir / homedir on license expiry.

Developer experience

  • Dual ESM + CommonJS build (dist/esm/ and dist/cjs/).
  • Native TypeScript source execution via --experimental-strip-types (Node ≥ 22.6, dev only).
  • Docker demo app (quotes-demo-app/) with docker compose one-liner.
  • 485 tests across 102 suites mirroring the source tree.