
[FEATURE] add a token bloat detector #45

@Aditya26189

Description


🧠 Feature: Token Bloat Detector

Problem

We’re losing 10–30% of LLM spend to silent token inefficiencies — verbose generations, over-sized context windows, retry storms, and model misuse (e.g. GPT-4 where GPT-3.5 would suffice).
Existing observability tools show token counts; they don’t explain waste or enforce prevention. CrashLens needs to detect, attribute, and enforce token efficiency as policy.

Goals

  • Identify and quantify “token bloat” across requests, routes, and models.
  • Attribute waste to specific prompt templates, model versions, or user flows.
  • Auto-suggest fixes and integrate with CI/CD or runtime policy enforcement.
  • Output audit-grade financial reports showing cost impact per % of bloat.

Detection Logic (initial heuristics)

  • Compare actual tokens used vs. expected tokens (based on prompt length, model efficiency benchmarks).

  • Flag overage thresholds:

    • 25% deviation = warning

    • 50% deviation = violation

  • Classify root causes:

    1. Prompt verbosity – unnecessary or repetitive phrasing.
    2. Context overreach – excessive retrieval/context size.
    3. Model misuse – high-cost model where smaller model suffices.
    4. Retry storms – repeated completions for same intent.
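The threshold heuristic above can be sketched as a small classifier. This is an illustrative sketch only — `classify_bloat` and the constant names are hypothetical, not existing CrashLens code; the baseline for `expected_tokens` is assumed to come from the benchmarks mentioned above.

```python
# Hypothetical sketch of the deviation heuristic described above.
# classify_bloat and the threshold constants are illustrative names,
# not part of any existing CrashLens API.

WARNING_THRESHOLD = 0.25    # 25% deviation = warning
VIOLATION_THRESHOLD = 0.50  # 50% deviation = violation

def classify_bloat(actual_tokens: int, expected_tokens: int) -> str:
    """Classify one request by how far actual token use exceeds the baseline."""
    if expected_tokens <= 0:
        raise ValueError("expected_tokens must be positive")
    deviation = (actual_tokens - expected_tokens) / expected_tokens
    if deviation >= VIOLATION_THRESHOLD:
        return "violation"
    if deviation >= WARNING_THRESHOLD:
        return "warning"
    return "ok"
```

For example, 130 actual tokens against an expected 100 is a 30% deviation and would be flagged as a warning; 160 would be a violation.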

Output & Integration

  • CLI & API endpoints:

    • crashlens detect --bloat → outputs JSON report per route.
    • crashlens enforce --policy token_bloat.yaml → blocks CI/CD merges if threshold exceeded.
  • Dashboard view: cost leakage chart (% of spend wasted).

  • Optional webhook → send alerts to Slack or Grafana.
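One possible shape for the per-route JSON report emitted by `crashlens detect --bloat` — purely a proposal sketch, since the output format does not exist yet; all field names are placeholders:

```json
{
  "route": "/api/chat",
  "model": "gpt-4",
  "actual_tokens": 48200,
  "expected_tokens": 36500,
  "deviation_pct": 32.1,
  "severity": "warning",
  "root_cause": "prompt_verbosity",
  "estimated_waste_usd": 4.12
}
```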

Example Policy

```yaml
policies:
  - id: token_bloat_limit
    threshold: 25
    action: block
    message: "Token usage exceeds expected by >25%."
```
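The policy above could be applied in CI roughly as follows. This is a minimal sketch under stated assumptions: the `policy` dict mirrors `token_bloat.yaml` after parsing, and `evaluate_policy` is an illustrative function, not an existing CrashLens interface.

```python
# Hypothetical enforcement sketch for `crashlens enforce --policy token_bloat.yaml`.
# evaluate_policy and the result format are illustrative, not existing API.

def evaluate_policy(policy: dict, observed_deviation_pct: float) -> dict:
    """Return the action to take for one route's observed token deviation."""
    if observed_deviation_pct > policy["threshold"]:
        # Threshold exceeded: surface the policy's action (e.g. block the merge).
        return {"action": policy["action"], "message": policy["message"]}
    return {"action": "pass", "message": ""}

# Parsed form of the YAML policy above.
policy = {
    "id": "token_bloat_limit",
    "threshold": 25,
    "action": "block",
    "message": "Token usage exceeds expected by >25%.",
}
```

In a CI context the caller would exit non-zero when the returned action is `block`, which is what would fail the merge.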

Metrics of Success

  • Detect 90%+ of token inefficiencies with <5% false positives.
  • Reduce token spend by 15–20% in pilot customers.
  • Median detection latency <2s for 1M+ log entries.

Open Questions

  1. Which model metadata (usage, retries, latency) does each vendor expose?
  2. How do we baseline “expected token use” per model family?
  3. Should detection run inline (real-time) or as a nightly batch job?
  4. How do we integrate fixes seamlessly with the OPA policy layer?
