loq currently enforces file size via line counts. For LLM-assisted workflows, token count is often the more meaningful metric — a 300-line file of dense code can burn far more context than a 500-line file of simple declarations.
Proposal
Add token-based limits alongside line-based limits. Metrics can be set per-rule — some files are human-facing (lines make sense), others are agent-facing (tokens matter more).
Config surface
Three mutually exclusive limit fields — self-describing, no ambiguity:
```toml
# Top-level defaults (pick one)
default_max_lines = 500     # today's config, still works
# default_max_tokens = 4000 # or this
# default_max = 4000        # or this (requires default_metric)
# default_metric = "tokens"

[[rules]]
path = "**/*.rs"
max_lines = 500 # implies lines, no metric field needed

[[rules]]
path = "prompts/**/*.md"
max_tokens = 8000 # implies tokens, no metric field needed

[[rules]]
path = "scripts/**"
max = 4000 # generic form, inherits default_metric
```
Rules:
- `max_lines` and `max_tokens` are self-describing — no separate `metric` field needed
- `max` is the generic form and inherits its metric from `default_metric` (errors if none is set)
- Using more than one of `max`/`max_lines`/`max_tokens` on the same rule or at the top level is a validation error
- A `metric` field on a rule is only valid alongside `max`, not alongside `max_lines`/`max_tokens`
- Existing configs with `max_lines`/`default_max_lines` work unchanged — fully backwards compatible
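The exclusivity rules above can be checked in one pass during config parsing. A minimal sketch, assuming a hypothetical `RawRule` struct as deserialized from TOML (the real loq config types will differ):

```rust
// Hypothetical deserialized rule; field names mirror the config surface.
#[derive(Default)]
struct RawRule {
    max: Option<usize>,
    max_lines: Option<usize>,
    max_tokens: Option<usize>,
    metric: Option<String>,
}

fn validate(rule: &RawRule) -> Result<(), String> {
    // At most one limit field may be set on a rule.
    let set = [rule.max, rule.max_lines, rule.max_tokens]
        .iter()
        .filter(|f| f.is_some())
        .count();
    if set > 1 {
        return Err("use only one of max / max_lines / max_tokens".into());
    }
    // `metric` is only meaningful next to the generic `max`.
    if rule.metric.is_some() && rule.max.is_none() {
        return Err("`metric` is only valid alongside `max`".into());
    }
    Ok(())
}
```

The same check applies to the top-level `default_*` fields.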
What needs to change
loq_core
- Add a `Metric` enum (`Lines` | `Tokens`)
- Add `max` and `max_tokens` as limit fields alongside the existing `max_lines`
- Validate mutual exclusivity during config parsing
- Carry `Metric` through report types so output formatters know the unit
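A rough sketch of that enum and how a report entry might carry it (names and derives are illustrative, not the actual loq_core types):

```rust
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum Metric {
    Lines,
    Tokens,
}

impl Metric {
    // Unit string for output formatters.
    fn unit(self) -> &'static str {
        match self {
            Metric::Lines => "lines",
            Metric::Tokens => "tokens",
        }
    }
}

// A report entry carries the metric so formatters know the unit.
#[allow(dead_code)]
struct Violation {
    path: String,
    count: usize,
    limit: usize,
    metric: Metric,
}
```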
loq_fs
- New `count_tokens.rs` module — approximate token counting via `bytes / 4`
- Branch on metric in `check_file` to call line counting or token counting
- Include the metric in the cache config hash so the cache invalidates on metric change
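The branch itself is small. A sketch of the shape it could take inside `check_file`, with hypothetical helper names (the real loq_fs API differs):

```rust
#[derive(Clone, Copy)]
enum Metric {
    Lines,
    Tokens,
}

fn count_lines(contents: &str) -> usize {
    contents.lines().count()
}

fn approx_tokens(contents: &str) -> usize {
    // bytes / 4 approximation from the proposal
    contents.len() / 4
}

// Hypothetical measurement step: pick the counter based on the rule's metric.
fn measure(contents: &str, metric: Metric) -> usize {
    match metric {
        Metric::Lines => count_lines(contents),
        Metric::Tokens => approx_tokens(contents),
    }
}
```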
loq_cli
- Dynamic unit in output messages ("lines" vs "tokens")
- Optional `--metric` CLI flag override (stomps everything — the default and all rules)
`decide.rs` is already metric-agnostic — it just returns a `limit: usize`. It needs to carry the `Metric` alongside the limit so the measurement layer knows what to count.
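In other words, the decision step widens from `usize` to a pair. A sketch under assumed names (the actual `decide.rs` signatures may look different):

```rust
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum Metric {
    Lines,
    Tokens,
}

// Hypothetical matched-rule shape after the change.
struct Rule {
    limit: usize,
    metric: Metric,
}

// Today this would return only `usize`; carrying the metric lets the
// measurement layer know what to count.
fn decide(rule: &Rule) -> (usize, Metric) {
    (rule.limit, rule.metric)
}
```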
Token counting strategy
Start with a fast approximation: bytes / 4. This is roughly correct for English-heavy code, fast enough to not need special caching treatment, and avoids pulling in a tokenizer dependency. A future iteration can add real tokenizer support (tiktoken-rs with cl100k_base, o200k_base, etc.) behind a config flag.
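The approximation is a one-liner — roughly four bytes per token is a common rule of thumb for English-heavy text under BPE tokenizers:

```rust
// bytes / 4 heuristic: no tokenizer dependency, fast enough to skip
// any special caching treatment.
fn approx_tokens(bytes: &[u8]) -> usize {
    bytes.len() / 4
}
```

Under this heuristic a `max_tokens = 4000` limit corresponds to a file of roughly 16 KB.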
Other considerations
- Output phrasing: token counts from `bytes / 4` are approximate — decide whether to mark them with `~` or just say "tokens" without hedging
- Baseline files: may need to store the metric so baselines created under one metric don't silently apply to another
- Binary files: existing null-byte detection still applies before token counting; no change needed
Decisions
- Naming: three mutually exclusive fields (`max`/`max_lines`/`max_tokens`) — self-describing, backwards compatible
- Per-rule metrics: supported — rules can independently use lines or tokens
- Tokenizer: deferred. `bytes / 4` approximation first, real tokenizer later
- Default metric: `lines` — existing configs unchanged
- CLI override: `--metric` stomps everything (default + all rules)