Priority Level
Medium
Task Summary
Duplicate column names in output dataframe. This issue is created from item #14 of the bugbash.
When the input CSV already contains a column named text_replaced (which is also the Anonymizer's output column name), the output DataFrame has duplicate column names:
Output columns: ['text', 'text_replaced', 'text_replaced', 'text_with_spans',
'final_entities', 'text_replaced', 'text_replaced']
Impact: df['text_replaced'] returns ambiguous results. Users who have pre-existing columns with names matching Anonymizer's output columns will get corrupted DataFrames. This could also cause silent data loss when saving to CSV.
Recommendation: Either prefix/suffix Anonymizer output columns to avoid collisions, or raise an error/warning if input columns collide with output column names.
Technical Details & Implementation Plan
No response
Dependencies
No response
Priority Level
Medium
Task Summary
Duplicate column names in output dataframe. This issue is created from item #14 of the bugbash.
When the input CSV already contains a column named
text_replaced(which is also the Anonymizer's output column name), the output DataFrame has duplicate column names:Impact:
df['text_replaced']returns ambiguous results. Users who have pre-existing columns with names matching Anonymizer's output columns will get corrupted DataFrames. This could also cause silent data loss when saving to CSV.Recommendation: Either prefix/suffix Anonymizer output columns to avoid collisions, or raise an error/warning if input columns collide with output column names.
Technical Details & Implementation Plan
No response
Dependencies
No response