
fix: remove record_id from condensed CSV output #1099

Merged

florath merged 1 commit into main from fix/condense-remove-record-id on Mar 17, 2026




@florath commented Mar 17, 2026

Problem

The record_id column in the condensed CSV (produced by condense_mass_eval_jsonl.py) is a long composite key of the form file_path::entry_key::venue_raw. It exists for idempotency and deduplication of the JSONL output, but it adds noise to the human-readable CSV that users rely on for analysis.

The entry_key column already uniquely identifies each row within a single run.
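To illustrate why the composite key matters for the JSONL but not for the CSV, here is a minimal sketch of composite-key deduplication. It assumes the key is built by joining the three components with `::` as described above; `make_record_id` and `dedupe_records` are hypothetical helpers, not functions from condense_mass_eval_jsonl.py.

```python
def make_record_id(file_path: str, entry_key: str, venue_raw: str) -> str:
    """Hypothetical helper: build the composite key file_path::entry_key::venue_raw."""
    return f"{file_path}::{entry_key}::{venue_raw}"


def dedupe_records(records):
    """Keep the first record seen for each record_id, dropping later repeats."""
    seen = set()
    unique = []
    for rec in records:
        rid = rec["record_id"]
        if rid in seen:
            continue  # same file, entry and venue already recorded
        seen.add(rid)
        unique.append(rec)
    return unique


if __name__ == "__main__":
    rid = make_record_id("refs/a.bib", "smith2020", "NeurIPS")
    records = [
        {"record_id": rid, "entry_key": "smith2020"},
        {"record_id": rid, "entry_key": "smith2020"},  # e.g. appended again by a rerun
        {"record_id": make_record_id("refs/a.bib", "jones2021", "ICML"), "entry_key": "jones2021"},
    ]
    print([r["entry_key"] for r in dedupe_records(records)])  # ['smith2020', 'jones2021']
```

A record appended twice (for example by a rerun) produces an identical record_id and is skipped, while a different entry is kept. Once the data is condensed for a single run, entry_key alone already tells rows apart.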

Change

Remove record_id from _build_base_row() and from the base_fields column list in condense_jsonl_to_csv().

The field remains in the full JSONL output; only the condensed CSV is affected.
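A minimal sketch of the resulting shape, assuming records are read line by line from the JSONL and written with Python's `csv.DictWriter`. `build_base_row` and `condense_jsonl_to_csv_text` are hypothetical stand-ins for `_build_base_row()` and `condense_jsonl_to_csv()`; only entry_key as the first column comes from this PR, and the other column names are placeholders.

```python
import csv
import io
import json

# Placeholder column list: only "entry_key" as the first column is confirmed
# by the PR. "record_id" is intentionally absent.
BASE_FIELDS = ["entry_key", "venue_raw", "verdict"]


def build_base_row(record: dict) -> dict:
    """Hypothetical stand-in for _build_base_row(): copy only the base columns."""
    return {field: record.get(field, "") for field in BASE_FIELDS}


def condense_jsonl_to_csv_text(jsonl_lines) -> str:
    """Hypothetical stand-in for condense_jsonl_to_csv(): return CSV text."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=BASE_FIELDS)
    writer.writeheader()
    for line in jsonl_lines:
        line = line.strip()
        if not line:  # skip blank lines in the JSONL input
            continue
        record = json.loads(line)  # the JSONL record may still carry record_id
        writer.writerow(build_base_row(record))
    return out.getvalue()


if __name__ == "__main__":
    sample = json.dumps({
        "record_id": "refs/a.bib::smith2020::NeurIPS",
        "entry_key": "smith2020",
        "venue_raw": "NeurIPS",
        "verdict": "ok",
    })
    print(condense_jsonl_to_csv_text([sample]), end="")
```

Running it prints the header entry_key,venue_raw,verdict followed by one data row. Because the writer only receives the listed base fields, record_id stays in each JSONL record but never reaches the CSV.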

Test plan

  • Run condense_mass_eval_jsonl.py on an existing JSONL and verify that the record_id column is absent from the output CSV
  • Verify that entry_key is still the first column and that all other columns are unchanged
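The test plan can be automated with a small check such as the one below. This is a sketch, not part of the PR: `check_condensed_csv` is a hypothetical helper, `old_header` is assumed to be the header of a CSV produced before this change, and the duplicate-entry_key check is an extra safeguard for the claim that entry_key alone identifies each row within a run.

```python
import csv
import io
from typing import List, Optional


def check_condensed_csv(csv_text: str, old_header: Optional[List[str]] = None) -> List[str]:
    """Return a list of problems found in a condensed CSV (empty list means OK).

    old_header, if given, is the header of a CSV produced before this change;
    the new header must equal it with only record_id removed.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])
    problems = []

    # Test plan item 1: record_id must be gone.
    if "record_id" in header:
        problems.append("record_id column is still present")

    # Test plan item 2: entry_key first, all other columns unchanged.
    if not header or header[0] != "entry_key":
        problems.append("entry_key is not the first column")
    if old_header is not None:
        expected = [col for col in old_header if col != "record_id"]
        if header != expected:
            problems.append(f"columns changed: expected {expected}, got {header}")

    # Extra check (not in the original plan): entry_key values are unique.
    if "entry_key" in header:
        idx = header.index("entry_key")
        seen = set()
        for row in reader:
            if idx < len(row):
                key = row[idx]
                if key in seen:
                    problems.append(f"duplicate entry_key: {key}")
                seen.add(key)
    return problems
```

For a real run, read the output file and pass its contents, e.g. `with open("condensed.csv", newline="") as f: print(check_condensed_csv(f.read()))`, where the path is a placeholder.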

record_id is a long internal composite key (file_path::entry_key::venue)
useful for JSONL idempotency checks but adds noise in the human-readable
condensed CSV. The entry_key column already uniquely identifies each row
within a single run. Remove record_id from both the row builder and the
field list in condense_mass_eval_jsonl.py.
@florath merged commit 174ea6c into main on Mar 17, 2026
21 checks passed
@florath deleted the fix/condense-remove-record-id branch on March 17, 2026 07:44