fix: remove record_id from condensed CSV output #1099
Merged
record_id is a long internal composite key (file_path::entry_key::venue) useful for JSONL idempotency checks but adds noise in the human-readable condensed CSV. The entry_key column already uniquely identifies each row within a single run. Remove record_id from both the row builder and the field list in condense_mass_eval_jsonl.py.
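To show why record_id is worth keeping in the JSONL, here is a minimal sketch of the kind of idempotency check a composite key supports. The helper names (make_record_id, load_seen_ids, skip_seen) are hypothetical and are not taken from condense_mass_eval_jsonl.py. The key layout follows the file_path::entry_key::venue form described above, and the sample values are illustrative only.

```python
import json
from collections.abc import Iterable, Iterator


# Hypothetical helper: mirrors the file_path::entry_key::venue layout
# described in this PR; the real key builder is not shown here.
def make_record_id(file_path: str, entry_key: str, venue: str) -> str:
    """Join the three parts into one '::'-separated composite key."""
    return "::".join((file_path, entry_key, venue))


def load_seen_ids(jsonl_lines: Iterable[str]) -> set[str]:
    """Collect record_id values from lines already written to a JSONL file."""
    seen: set[str] = set()
    for line in jsonl_lines:
        line = line.strip()
        if line:  # skip blank lines, e.g. a trailing newline
            seen.add(json.loads(line)["record_id"])
    return seen


def skip_seen(records: Iterable[dict], seen_ids: set[str]) -> Iterator[dict]:
    """Yield only records whose record_id has not been written yet.

    seen_ids is updated in place, so a record repeated within the same
    batch is also written only once.
    """
    for rec in records:
        rid = rec["record_id"]
        if rid in seen_ids:
            continue
        seen_ids.add(rid)
        yield rec
```

Re-running over the same inputs then appends no duplicate lines to the JSONL. The condensed CSV is never used for this check, which is why the key can be dropped there.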
Problem
The record_id column in the condensed CSV (produced by condense_mass_eval_jsonl.py) is a long composite key of the form file_path::entry_key::venue_raw. It exists for JSONL idempotency/deduplication purposes but is noisy in the human-readable CSV, where users are doing analysis.
The entry_key column already uniquely identifies each row within a single run.
Change
Remove record_id from _build_base_row() and from the base_fields column list in condense_jsonl_to_csv().
The field remains present in the full JSONL output; this only affects the condensed CSV.
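A minimal sketch of the condensed-CSV side after this change, assuming the script writes rows with csv.DictWriter (this PR does not confirm that). build_base_row, condense_to_csv, and BASE_FIELDS are hypothetical stand-ins for _build_base_row(), condense_jsonl_to_csv(), and base_fields. Every column other than entry_key is invented for illustration.

```python
import csv
import io

# Hypothetical column list: the PR only confirms that entry_key comes first;
# "title" and "venue_raw" stand in for the real remaining columns.
BASE_FIELDS = ["entry_key", "title", "venue_raw"]


def build_base_row(record: dict) -> dict:
    """Copy the condensed columns out of one JSONL record.

    record_id is intentionally not copied; it stays in the JSONL only.
    Missing fields become empty strings.
    """
    return {field: record.get(field, "") for field in BASE_FIELDS}


def condense_to_csv(records: list[dict]) -> str:
    """Write one condensed row per record and return the CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=BASE_FIELDS)
    writer.writeheader()
    for rec in records:
        writer.writerow(build_base_row(rec))
    return buf.getvalue()
```

If the writer is csv.DictWriter, the two removals have to go together. With the default extrasaction="raise", a row dict that still carries record_id while the field list no longer names it makes writerow() raise ValueError. Dropping it only from the row builder would instead leave an empty record_id column, because missing keys are filled with restval (an empty string by default).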
Test plan
- Run condense_mass_eval_jsonl.py on an existing JSONL and verify the record_id column is absent from the output CSV
- Verify entry_key is still the first column and all other columns are unchanged
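The test plan above could also be scripted. This is a minimal sketch that compares the new header against one saved from a CSV produced before the change. check_condensed_header is a hypothetical helper, not part of the repository, and it inspects only the header row, not the data rows.

```python
import csv
import io


def check_condensed_header(csv_text: str, old_header: list[str]) -> list[str]:
    """Return a list of problems found in the condensed CSV header.

    old_header is the header of a CSV produced before this change; the new
    header should equal it with only record_id removed.
    """
    # An empty file yields no rows; fall back to an empty header.
    header = next(csv.reader(io.StringIO(csv_text)), [])
    problems = []
    if "record_id" in header:
        problems.append("record_id column still present")
    if not header or header[0] != "entry_key":
        problems.append("entry_key is not the first column")
    expected = [col for col in old_header if col != "record_id"]
    if header != expected:
        problems.append(f"columns changed: expected {expected}, got {header}")
    return problems
```

An empty return value means the header satisfies both items in the test plan.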