- Dedupe build_large_oom_csv into tests/workdir.rs so test_stats and
test_frequency share one source of truth (Low #1).
- Document the pre-indexed + OOM → sketch fallback path in --memcheck
USAGE text, CHANGELOG, and docs/STATS_DEFINITIONS.md (Low #2).
- Drop the dead flag_sketch_method='frequent_items' assignment before
run_frequent_items — confirmed run_frequent_items does not consult
flag_sketch_method (Low #3).
- Tighten the stats and frequency OOM wwarn messages to "Re-run with
explicit ... exact to disable the auto-enable" — matches the
established frequency wording and removes the misleading "override"
phrasing (Low #4).

Verified Low #5 separately: which_stats() already gates mad on
!approx_quantiles regardless of flag_everything/flag_mad, so the
auto-disable promised by the wwarn is honored.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
.claude/skills/qsv/qsv-frequency.json
+1 -1 (1 addition & 1 deletion)
@@ -77,7 +77,7 @@
 {
 "flag": "--memcheck",
 "type": "flag",
-"description": "Check if there is enough memory to load the entire CSV into memory using CONSERVATIVE heuristics. On OOM, qsv auto-creates an index when possible and also switches to the Frequent Items sketch (Apache DataSketches Misra-Gries, equivalent to sketch-method frequent_items) where compatible, before failing. A wwarn is emitted when the sketch fallback engages."
+"description": "Check if there is enough memory to load the entire CSV into memory using CONSERVATIVE heuristics. On OOM, qsv auto-creates an index when no index exists (skipped for stdin) and ALSO switches to the Frequent Items sketch (Apache DataSketches Misra-Gries, equivalent to sketch-method frequent_items) where compatible. The sketch fallback can also fire when an index is already present and the OOM still trips (e.g., when jobs is pinned to 1 on a pre-indexed file). A wwarn is emitted when the sketch fallback engages."
.claude/skills/qsv/qsv-stats.json
+1 -1 (1 addition & 1 deletion)
@@ -81,7 +81,7 @@
 {
 "flag": "--memcheck",
 "type": "flag",
-"description": "Check if there is enough memory to load the entire CSV into memory using CONSERVATIVE heuristics. This option is ignored when computing default, streaming statistics, as it is not needed. On OOM, qsv auto-creates an index when possible and also switches to approx quantile + approx cardinality methods (DataSketches t-digest and HyperLogLog) where compatible, before failing. A wwarn is emitted listing the auto-enabled estimators."
+"description": "Check if there is enough memory to load the entire CSV into memory using CONSERVATIVE heuristics. This option is ignored when computing default, streaming statistics, as it is not needed. On OOM, qsv auto-creates an index when no index exists (skipped for stdin) and ALSO switches to approx quantile + approx cardinality methods (DataSketches t-digest and HyperLogLog) where compatible. The sketch fallback can also fire when an index is already present and the OOM still trips (e.g., when jobs is pinned to 1 on a pre-indexed file). A wwarn is emitted listing the auto-enabled estimators."
CHANGELOG.md
+1 -1 (1 addition & 1 deletion)
@@ -39,7 +39,7 @@ Detailed MCP Server and Cowork Plugin changes are documented in the MCP Server/C
 - `exclude`: add stdin support and memcheck [#3749](https://github.com/dathere/qsv/pull/3749)
 
 ### Changed
-- `stats` / `frequency`: when `--memcheck` is set and `util::mem_file_check` returns OOM, qsv now auto-enables the Apache DataSketches-backed estimators (t-digest + HyperLogLog for `stats`; Misra-Gries Frequent Items for `frequency`) in addition to the existing auto-index fallback, where flag conflicts allow. The OOM error is only propagated when neither fallback engages. A `wwarn!` is emitted listing the auto-enabled estimators. Explicit `--quantile-method exact` / `--cardinality-method exact` / `--sketch-method exact` still suppresses the auto-enable for that method.
+- `stats` / `frequency`: when `--memcheck` is set and `util::mem_file_check` returns OOM, qsv now auto-enables the Apache DataSketches-backed estimators (t-digest + HyperLogLog for `stats`; Misra-Gries Frequent Items for `frequency`) in addition to the existing auto-index fallback, where flag conflicts allow. The OOM error is only propagated when neither fallback engages. A `wwarn!` is emitted listing the auto-enabled estimators. The sketch fallback can fire even when an index is already present (e.g., with `--jobs 1` on a pre-indexed file) — this is a behavior change from the previous "error out" path in that narrow case. Explicit `--quantile-method exact` / `--cardinality-method exact` / `--sketch-method exact` still suppresses the auto-enable for that method.
 - **BREAKING** `excel`: `--metadata csv` column ordering for `type`, `visible`, and `headers` is corrected. Previously the CSV header row declared `type, visible, headers` but the data rows pushed values in the order `headers, typ, visible`, so under each named column the wrong values appeared (the `type` column held the headers list, `visible` held the type, and `headers` held the visibility). The CSV output now matches the `--metadata json` (`SheetMetadata` struct) field order: `index, sheet_name, type, visible, headers, column_count, …`. Pipelines that consumed `qsv excel --metadata csv` and indexed by column position must shift those three columns; consumers that indexed by header name see corrected values automatically.
 - **BREAKING** `enum`: `--hash` digest values change. The hashed input now carries a `u64` length prefix per field (to fix the multi-column collision bug above), so every `--hash` digest differs from earlier qsv versions — single-column hashes change identity values too, and stored hashes from earlier qsv versions will not match. Same input still hashes deterministically across rows and runs in ≥ this version.
 - **BREAKING** `luau`: `qsv_loadcsv` now returns the headers table 1-indexed (per Lua convention). Scripts that accessed `headers[0]` or iterated `for i = 0, #headers - 1` must shift to `headers[1]` and `for i = 1, #headers` (or `ipairs(headers)`). Previously `headers[1]` returned the *second* header.
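The `enum --hash` context line above attributes the digest change to a `u64` length prefix per field. A minimal Rust sketch of why such a prefix removes multi-column collisions is given here. It is not qsv's code: the use of `DefaultHasher`, the little-endian byte order, and the helper names `encode_naive`, `encode_length_prefixed`, and `digest` are all assumptions made for illustration.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Naive encoding (hypothetical): concatenate fields with no boundary marker.
fn encode_naive(fields: &[&str]) -> Vec<u8> {
    let mut buf = Vec::new();
    for f in fields {
        buf.extend_from_slice(f.as_bytes());
    }
    buf
}

/// Length-prefixed encoding (hypothetical): each field is preceded by its
/// byte length as a u64. Little-endian is an assumption, not qsv's layout.
fn encode_length_prefixed(fields: &[&str]) -> Vec<u8> {
    let mut buf = Vec::new();
    for f in fields {
        buf.extend_from_slice(&(f.len() as u64).to_le_bytes());
        buf.extend_from_slice(f.as_bytes());
    }
    buf
}

/// Hash a byte buffer. DefaultHasher stands in for whatever hash qsv uses.
fn digest(bytes: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    h.write(bytes);
    h.finish()
}

fn main() {
    let row_a = ["ab", "c"];
    let row_b = ["a", "bc"];

    // Without boundaries both rows encode to the bytes "abc", so they collide.
    assert_eq!(digest(&encode_naive(&row_a)), digest(&encode_naive(&row_b)));

    // With a length prefix the encoded inputs differ before hashing.
    assert_ne!(encode_length_prefixed(&row_a), encode_length_prefixed(&row_b));

    println!("naive encodings collide; length-prefixed encodings do not");
}
```

The same sketch shows why single-column digests change as well: even a one-field row gains eight prefix bytes, so its hashed input is no longer the bare field value.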
docs/STATS_DEFINITIONS.md
+1 -1 (1 addition & 1 deletion)
@@ -363,7 +363,7 @@ By default, `stats` produces **exact, deterministic** results. Three opt-in flag
 - `--quantile-method` auto-enables unless `--weight` is set; if `--mad` or `--everything` is also set, MAD is auto-disabled (mirroring the existing `--quantile-method approx` guard).
 - `--cardinality-method` auto-enables unless `--infer-boolean` is set.
 
-A `wwarn!` is emitted listing each auto-enabled estimator. The original OOM error is only propagated when **neither** fallback engages. Users can override by passing `--quantile-method exact` or `--cardinality-method exact` explicitly (the auto-enable only flips fields that were left at their `exact` default).
+A `wwarn!` is emitted listing each auto-enabled estimator. The original OOM error is only propagated when **neither** fallback engages. The sketch fallback can fire even when an index is already present and the OOM check still trips (e.g., with `--jobs 1` on a pre-indexed file) — that is a behavior change from the previous "error out" path in this narrow case. Users can disable the auto-enable by passing `--quantile-method exact` or `--cardinality-method exact` explicitly (the auto-enable only flips fields that were left at their `exact` default).
 
 **See also:** [t-digest paper (Dunning, 2019)](https://arxiv.org/abs/1902.04023), [HyperLogLog (Flajolet et al., 2007)](https://en.wikipedia.org/wiki/HyperLogLog), [Apache DataSketches](https://datasketches.apache.org/).
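The fallback rules described in this hunk can be summarized as a small decision function. The Rust sketch below is illustrative only and is not taken from qsv's source: `StatsArgs`, `Fallback`, and `plan_oom_fallback` are hypothetical names, and modelling "left at the `exact` default" as `None` is an assumption about how an explicit `exact` is told apart from the default. It also does not model the case where `--memcheck` is ignored for default, streaming statistics.

```rust
/// Hypothetical stand-in for the quantile/cardinality method flags.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy)]
enum Method {
    Exact,
    Approx,
}

/// Hypothetical subset of the `stats` arguments relevant to the OOM path.
#[derive(Debug, Default)]
struct StatsArgs {
    // None = flag not passed, so the `exact` default applies (assumption).
    quantile_method: Option<Method>,
    cardinality_method: Option<Method>,
    weight: bool,
    infer_boolean: bool,
    mad: bool,
    everything: bool,
    has_index: bool,
    is_stdin: bool,
}

/// What the sketch decides to do after the memory check reports OOM.
#[derive(Debug, PartialEq)]
struct Fallback {
    create_index: bool,
    auto_quantiles: bool,
    auto_cardinality: bool,
    mad_disabled: bool,
}

fn plan_oom_fallback(args: &StatsArgs) -> Result<Fallback, String> {
    // Index fallback: only when no index exists, and skipped for stdin.
    let create_index = !args.has_index && !args.is_stdin;
    // Sketch fallback: only flips methods the user did not set explicitly.
    let auto_quantiles = args.quantile_method.is_none() && !args.weight;
    let auto_cardinality = args.cardinality_method.is_none() && !args.infer_boolean;

    // The OOM error is propagated only when neither fallback engages.
    if !create_index && !auto_quantiles && !auto_cardinality {
        return Err("not enough memory and no fallback available".to_string());
    }

    Ok(Fallback {
        create_index,
        auto_quantiles,
        auto_cardinality,
        // MAD is dropped whenever approximate quantiles were auto-enabled.
        mad_disabled: auto_quantiles && (args.mad || args.everything),
    })
}

fn main() {
    // Pre-indexed file: no new index, but the sketch fallback still engages.
    let pre_indexed = StatsArgs { has_index: true, everything: true, ..Default::default() };
    match plan_oom_fallback(&pre_indexed) {
        Ok(plan) => println!("fallback plan: {:?}", plan),
        Err(e) => eprintln!("error: {}", e),
    }
}
```

In this model the pre-indexed case no longer errors out, which corresponds to the behavior change the added paragraph calls out; passing both methods as explicit `exact` on a pre-indexed file is the remaining path that returns the error.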
docs/help/frequency.md
+1 -1 (1 addition & 1 deletion)
@@ -155,7 +155,7 @@ qsv frequency --help
 | `‑o,`<br>`‑‑output` | string | Write output to <file> instead of stdout. ||
 | `‑n,`<br>`‑‑no‑headers` | flag | When set, the first row will NOT be included in the frequency table. Additionally, the 'field' column will be 1-based indices instead of header names. ||
 | `‑d,`<br>`‑‑delimiter` | string | The field delimiter for reading CSV data. Must be a single character. (default: ,) ||
-| `‑‑memcheck` | flag | Check if there is enough memory to load the entire CSV into memory using CONSERVATIVE heuristics. On OOM, qsv auto-creates an index when possible and also switches to the Frequent Items sketch (Apache DataSketches Misra-Gries, equivalent to sketch-method frequent_items) where compatible, before failing. A wwarn is emitted when the sketch fallback engages. ||
+| `‑‑memcheck` | flag | Check if there is enough memory to load the entire CSV into memory using CONSERVATIVE heuristics. On OOM, qsv auto-creates an index when no index exists (skipped for stdin) and ALSO switches to the Frequent Items sketch (Apache DataSketches Misra-Gries, equivalent to sketch-method frequent_items) where compatible. The sketch fallback can also fire when an index is already present and the OOM still trips (e.g., when jobs is pinned to 1 on a pre-indexed file). A wwarn is emitted when the sketch fallback engages. ||
docs/help/stats.md
+1 -1 (1 addition & 1 deletion)
@@ -282,7 +282,7 @@ qsv stats --help
 | `‑o,`<br>`‑‑output` | string | Write output to <file> instead of stdout. ||
 | `‑n,`<br>`‑‑no‑headers` | flag | When set, the first row will NOT be interpreted as column names. i.e., They will be included in statistics. ||
 | `‑d,`<br>`‑‑delimiter` | string | The field delimiter for READING CSV data. Must be a single character. (default: ,) ||
-| `‑‑memcheck` | flag | Check if there is enough memory to load the entire CSV into memory using CONSERVATIVE heuristics. This option is ignored when computing default, streaming statistics, as it is not needed. On OOM, qsv auto-creates an index when possible and also switches to approx quantile + approx cardinality methods (DataSketches t-digest and HyperLogLog) where compatible, before failing. A wwarn is emitted listing the auto-enabled estimators. ||
+| `‑‑memcheck` | flag | Check if there is enough memory to load the entire CSV into memory using CONSERVATIVE heuristics. This option is ignored when computing default, streaming statistics, as it is not needed. On OOM, qsv auto-creates an index when no index exists (skipped for stdin) and ALSO switches to approx quantile + approx cardinality methods (DataSketches t-digest and HyperLogLog) where compatible. The sketch fallback can also fire when an index is already present and the OOM still trips (e.g., when jobs is pinned to 1 on a pre-indexed file). A wwarn is emitted listing the auto-enabled estimators. ||