docs: hype LLM tracing in README

sharon77242 · sharon77242 · commit b5601955c930 · 2026-05-18T11:08:37.000+03:00
- Tagline updated to mention LLM observability
- New 'LLM Observability' section after Quick Start with real demo
  console output, per-field breakdown table, and four unique selling
  points (cost-per-call, PII exposure, injection detection,
  llm-dominates-request latency correlation)
- App Type Presets table: add 'llm' row and common combo examples
- Events Reference: add 'llm' event row; expand 'anomaly' to list all
  four LLM anomaly types including llm-dominates-request
- Production Safety Reference: add .withLLMTracing() row
- withLLMTracing section: fix console output format to match real
  output; add llm-dominates-request to anomaly events table
diff --git a/README.md b/README.md
@@ -1,6 +1,6 @@
 # Argus
 
-> **Privacy-first APM and performance diagnostics for Node.js — zero sidecar, zero raw data exported.**
+> **Privacy-first APM for Node.js — runtime diagnostics, LLM observability, zero sidecar, zero raw data exported.**
 
 [![CI](https://github.com/sharon77242/Argus/actions/workflows/ci.yml/badge.svg)](https://github.com/sharon77242/Argus/actions/workflows/ci.yml)
 [![Sponsor](https://img.shields.io/badge/Sponsor-%E2%9D%A4-pink?logo=github)](https://github.com/sponsors/sharon77242)
@@ -21,11 +21,12 @@ Named after **Argus Panoptes**, the hundred-eyed watchman of Greek mythology. A
 
 1. [Why This Exists](#why-this-exists)
 2. [Quick Start](#quick-start)
-3. [Privacy Guarantees](#privacy-guarantees)
-4. [Requirements](#requirements)
-5. [Build from Source](#build-from-source)
-6. [Demo App](#demo-app)
-7. [Profile API (recommended)](#profile-api-recommended)
+3. [LLM Observability](#llm-observability)
+4. [Privacy Guarantees](#privacy-guarantees)
+5. [Requirements](#requirements)
+6. [Build from Source](#build-from-source)
+7. [Demo App](#demo-app)
+8. [Profile API (recommended)](#profile-api-recommended)
    - [Environment Presets](#environment-presets)
    - [App Type Presets](#app-type-presets)
    - [Auto-Detection](#auto-detection)
@@ -39,16 +40,16 @@ Named after **Argus Panoptes**, the hundred-eyed watchman of Greek mythology. A
    - [Adaptive Sampler](#adaptive-sampler)
    - [Job Queue Tracing](#job-queue-tracing)
    - [Messaging Tracing](#messaging-tracing)
-9. [Instance Methods](#instance-methods)
-10. [Events Reference](#events-reference)
-11. [Environment Variables](#environment-variables)
-12. [Production Safety Reference](#production-safety-reference)
-13. [Architecture Layers](#architecture-layers)
-14. [Project Structure](#project-structure)
-15. [Low-Level API](#low-level-api)
-16. [Self-Host Your OTLP Endpoint](#self-host-your-otlp-endpoint)
-17. [Roadmap](#roadmap)
-18. [License](#license)
+10. [Instance Methods](#instance-methods)
+11. [Events Reference](#events-reference)
+12. [Environment Variables](#environment-variables)
+13. [Production Safety Reference](#production-safety-reference)
+14. [Architecture Layers](#architecture-layers)
+15. [Project Structure](#project-structure)
+16. [Low-Level API](#low-level-api)
+17. [Self-Host Your OTLP Endpoint](#self-host-your-otlp-endpoint)
+18. [Roadmap](#roadmap)
+19. [License](#license)
 
 ---
 
@@ -60,6 +61,7 @@ Standard APM products either require heavy agents, compile steps, or sacrifice d
 - **AST-first privacy** — SQL/NoSQL query values are shredded at the AST layer before they ever touch a metric
 - **Entropy-checked logs** — Shannon entropy scanning strips JWT tokens, API keys, and any other high-entropy string from `console` payloads automatically
 - **Zero prototype pollution** — all DB interception goes through `node:diagnostics_channel`, the official Node.js observability primitive
+- **LLM-aware** — intercepts OpenAI and Anthropic SDK calls to surface cost, token usage, PII exposure, and prompt injection attempts with zero code changes
 
 ---
 
@@ -85,6 +87,66 @@ const agent = await ArgusAgent.createProfile({
 
 ---
 
+## LLM Observability
+
+Add `appType: 'llm'` and Argus intercepts every OpenAI and Anthropic call — cost per request, token counts, PII exposure, and prompt injection attempts, all in a single console line with zero code changes:
+
+```
+19:51:02.160 [LLM] openai/gpt-4o  /api/chat  1240ms  $0.0043  in:342 out:89  ⚠ PII: [EMAIL×1] — sanitized  ⚠ INJECTION ATTEMPT
+```
+
+**What each field means:**
+
+| Field | Example | Description |
+|---|---|---|
+| Provider / model | `openai/gpt-4o` | SDK and model used |
+| Endpoint | `/api/chat` | HTTP route that triggered the call |
+| Latency | `1240ms` | Wall-clock time for the full LLM round-trip |
+| Cost | `$0.0043` | Calculated from token counts × per-model pricing |
+| Tokens | `in:342 out:89` | Prompt and completion tokens |
+| PII warning | `⚠ PII: [EMAIL×1] — sanitized` | Detected and redacted before telemetry export |
+| Injection warning | `⚠ INJECTION ATTEMPT` | Prompt injection pattern detected |
+
+**Four things no other Node.js APM shows you:**
+
+1. **Your real LLM bill, per request.** Not an estimate — computed from the actual token counts the model reports. Cost spike detection fires automatically when a single call runs 10× over your rolling average.
+
+2. **Your users' emails are in those prompts.** Argus redacts PII (emails, phone numbers, SSNs, card numbers, IPs) from the telemetry record before export. The raw prompt reaches the model unchanged — your observability data never sees it.
+
+3. **Prompt injection attempts, logged before damage is done.** Six regex patterns covering `ignore previous instructions`, role-override, and data-exfil attempts. Wire one listener to your security log.
+
+4. **When your LLM owns your latency budget.** The `llm-dominates-request` rule fires an `'anomaly'` event when LLM time exceeds 80% of the HTTP request duration — the exact signal you need to decide whether to cache, stream, or move the call off the hot path.
+
+```typescript
+const agent = await ArgusAgent.createProfile({
+  environment: "prod",
+  appType: ["web", "llm"], // or just "llm"
+}).start();
+
+// That's it. All OpenAI / Anthropic calls are traced from this point.
+
+// Optional: react to anomalies
+agent.on("anomaly", (event) => {
+  if (event.type === "llm-dominates-request") {
+    // LLM took >80% of the HTTP request budget — consider caching or streaming
+  }
+  if (event.type === "llm-cost-spike") {
+    // Single call cost spiked 10× — worth investigating
+  }
+});
+
+// Optional: react to security events
+agent.on("llm", (event) => {
+  if (event.injectionAttemptDetected) {
+    securityLog.warn("prompt injection attempt", { endpoint: event.endpoint, traceId: event.traceId });
+  }
+});
+```
+
+→ Full API reference: [`withLLMTracing(options?)`](#withllmtracingoptions)
+
+---
+
 ## Privacy Guarantees
 
 ### What this agent collects
@@ -249,13 +311,15 @@ const agent = await ArgusAgent.createProfile({
 
 ### App Type Presets
 
-| `appType`               | Modules Enabled                                                                                        | Optimization Target                                                  |
-| ----------------------- | ------------------------------------------------------------------------------------------------------ | -------------------------------------------------------------------- |
-| `'web'`                 | HttpTracing, ResourceLeakMonitor, Auto-Patching                                                        | **Latency** — request/response & socket tracking                     |
-| `'db'`                  | QueryAnalysis, SlowQueryMonitor, ResourceLeakMonitor, Auto-Patching                                    | **Data Access** — query patterns & connection safety                 |
-| `'worker'`              | RuntimeMonitor (CPU/Mem), GcMonitor, ResourceLeakMonitor, Auto-Patching, **JobTracing, MessagingTracing** | **Throughput** — long-running safety, loop health & queue visibility |
-| `['web','db']`          | Union of `web` + `db`                                                                                  | **Hybrid** — full HTTP + query coverage                              |
-| `['web','db','worker']` | All modules                                                                                            | **Full-Stack** — maximum observability                               |
+| `appType`                      | Modules Enabled                                                                                          | Optimization Target                                                  |
+| ------------------------------ | -------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------- |
+| `'web'`                        | HttpTracing, ResourceLeakMonitor, Auto-Patching                                                          | **Latency** — request/response & socket tracking                     |
+| `'db'`                         | QueryAnalysis, SlowQueryMonitor, ResourceLeakMonitor, Auto-Patching                                      | **Data Access** — query patterns & connection safety                 |
+| `'worker'`                     | RuntimeMonitor (CPU/Mem), GcMonitor, ResourceLeakMonitor, Auto-Patching, **JobTracing, MessagingTracing** | **Throughput** — long-running safety, loop health & queue visibility |
+| `'llm'`                        | LLMTracing (OpenAI + Anthropic), HttpTracing                                                             | **AI** — cost, tokens, PII, injection, latency correlation           |
+| `['web','db']`                 | Union of `web` + `db`                                                                                    | **Hybrid** — full HTTP + query coverage                              |
+| `['web','db','llm']`           | Union of `web` + `db` + `llm`                                                                            | **AI App** — full-stack + LLM observability                          |
+| `['web','db','worker','llm']`  | All modules                                                                                              | **Full-Stack** — maximum observability                               |
 
 Each `.with*()` call is **idempotent** — combining types never double-registers a module.
 
@@ -326,12 +390,11 @@ const response = await openai.chat.completions.create({
 });
 ```
 
-**What you see in dev mode:**
+**What you see in dev mode** (real output from the demo app):
 
 ```
-[ARGUS] LLM  openai/gpt-4o  POST /api/chat   1,240ms  $0.0043  in:342 out:89
-[ARGUS] ⚠    PII: [EMAIL×1] — sanitized before export
-[ARGUS] LLM  anthropic/claude-3-5-sonnet  POST /api/summarize  890ms  $0.0012
+19:51:02.160 [LLM] openai/gpt-4o  /api/chat  1240ms  $0.0043  in:342 out:89  ⚠ PII: [EMAIL×1] — sanitized  ⚠ INJECTION ATTEMPT
+19:51:05.302 [LLM] anthropic/claude-3-5-sonnet  /api/summarize  890ms  $0.0012  in:150 out:62
 ```
 
 **Options:**
@@ -343,10 +406,10 @@ const response = await openai.chat.completions.create({
 
 **Events emitted:**
 
-| Event       | When                                                     |
-| ----------- | -------------------------------------------------------- |
-| `'llm'`     | Every completed LLM call                                 |
-| `'anomaly'` | `n-llm-calls`, `llm-cost-spike`, `context-window-growth` |
+| Event       | When                                                                                              |
+| ----------- | ------------------------------------------------------------------------------------------------- |
+| `'llm'`     | Every completed LLM call                                                                          |
+| `'anomaly'` | `n-llm-calls` · `llm-cost-spike` · `context-window-growth` · `llm-dominates-request`             |
 
 **What is sanitized:**
 
@@ -656,9 +719,10 @@ The agent is an `EventEmitter`. All events are emitted on the `ArgusAgent` insta
 
 | Event               | Payload                                                                                      | When                                                                                                  |
 | ------------------- | -------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------- |
-| `'job'`             | `JobEvent`                                                                                   | Job completed, failed, retried, or stalled (BullMQ, Bull, pg-boss, Agenda)                           |
-| `'message'`         | `MessageEvent`                                                                               | Message produced or consumed (KafkaJS, amqplib)                                                      |
-| `'anomaly'`         | `ProfilerEvent`                                                                              | Memory leak, event loop lag, CPU spike, cross-signal compound anomaly, or job/message rule violation  |
+| `'job'`             | `JobEvent`                                                                                   | Job completed, failed, retried, or stalled (BullMQ, Bull, pg-boss, Agenda)                                                                                      |
+| `'message'`         | `MessageEvent`                                                                               | Message produced or consumed (KafkaJS, amqplib)                                                                                                                  |
+| `'llm'`             | `LLMEvent`                                                                                   | LLM call completed — provider, model, endpoint, durationMs, costUsd, tokens, piiDetected, injectionAttemptDetected, suggestions                                  |
+| `'anomaly'`         | `ProfilerEvent`                                                                              | Memory leak, event loop lag, CPU spike, cross-signal anomaly, job/message rule violation, or LLM anomaly (`n-llm-calls`, `llm-cost-spike`, `context-window-growth`, `llm-dominates-request`) |
 | `'query'`           | `{ sanitizedQuery, durationMs, driver?, traceId?, correlationId?, cacheHit?, suggestions? }` | DB query completed                                                                                    |
 | `'slow-query'`      | `SlowQueryRecord`                                                                            | Query exceeded the per-driver threshold                                                               |
 | `'transaction'`     | `TransactionEvent`                                                                           | BEGIN/COMMIT/ROLLBACK pattern completed                                                               |
@@ -680,8 +744,9 @@ The agent is an `EventEmitter`. All events are emitted on the `ArgusAgent` insta
 
 ```typescript
 agent.on("anomaly", (event) => {
-  // runtime:    'memory-leak' | 'event-loop-lag' | 'cpu-spike'
+  // runtime:      'memory-leak' | 'event-loop-lag' | 'cpu-spike'
   // cross-signal: 'correlated-slow-endpoint' | 'pool-starvation-by-slow-query' | 'n-plus-one-in-transaction'
+  // llm:          'n-llm-calls' | 'llm-cost-spike' | 'context-window-growth' | 'llm-dominates-request'
   console.log(event.type);
   console.log(event.heapSnapshotPath); // only set when a snapshot write succeeded
 });
@@ -776,6 +841,7 @@ All thresholds can be overridden without code changes, making the agent CI/CD an
 | `.withCrashGuard()`                | ✅ Yes     | Very Low        | Intercepts `uncaughtException`; emits event for `unhandledRejection`                                                                |
 | `.withResourceLeakMonitor(opts?)`  | ✅ Yes     | Low             | Tracks OS handles; rate-limited by `alertCooldownMs`                                                                                |
 | `.withGracefulShutdown(opts?)`     | ✅ Yes     | Very Low        | Registers SIGTERM/SIGINT; awaits `agent.stop()` before `process.exit`                                                               |
+| `.withLLMTracing(opts?)`           | ✅ Yes     | Very Low        | OpenAI + Anthropic call interception — cost, tokens, PII redaction, injection detection, anomaly rules                              |
 | `.withInstrumentation(opts?)`      | ✅ Yes     | Low             | DB/IO tracing via `diagnostics_channel` (17 drivers)                                                                                |
 | `.withHttpTracing()`               | ✅ Yes     | Low             | HTTP request inspection & slow-request detection                                                                                    |
 | `.withLogTracing(opts?)`           | ✅ Yes     | Low             | `console.*` override with entropy-scrubbed payloads                                                                                 |