Open
Description
Summary
Some files in the three-sixty directory are corrupted: they contain large sections of null bytes and malformed arrays, causing standard JSON parsers to fail.
Affected Files
- data/three-sixty/3835338.json
  - Line 181321: 16 KB of null bytes in "location" array.
  - Structure: "location": [ [nulls] 0.0, 80.0 ],
  - Error: Unparseable by json.load(), missing closing bracket.
- data/three-sixty/3835342.json
  - Line 171856: Corrupted "visible_area" array with null bytes.
  - Structure: "visible_area": [ numbers, 83.8r" : false,
  - Error: json.JSONDecodeError on standard parse.
- data/three-sixty/3845506.json
  - Line 92794: Truncated text in JSON structure.
  - Structure: lse, (appears to be missing beginning of line)
  - Error: Expecting ',' delimiter on parse.
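To check whether other files are affected in the same way, a scan for runs of null bytes may help. The following is a minimal sketch, not a tested tool for this repository: the helper names `find_null_runs` and `scan_directory` are hypothetical, and it assumes the files sit directly in one directory as `*.json`.

```python
import re
from pathlib import Path

# Matches one or more consecutive NUL bytes.
NUL_RUN = re.compile(rb"\x00+")


def find_null_runs(data: bytes, min_len: int = 1):
    """Return (byte_offset, length) for each run of NUL bytes of at least min_len."""
    return [
        (m.start(), m.end() - m.start())
        for m in NUL_RUN.finditer(data)
        if m.end() - m.start() >= min_len
    ]


def scan_directory(directory: str, min_len: int = 1):
    """Map file name -> NUL runs, for *.json files that contain at least one run."""
    report = {}
    for path in sorted(Path(directory).glob("*.json")):
        runs = find_null_runs(path.read_bytes(), min_len)
        if runs:
            report[path.name] = runs
    return report
```

Calling `scan_directory("data/three-sixty")` would list the files that contain null bytes along with byte offsets. It would not catch purely truncated text such as the third file above, which needs an actual parse attempt.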
Reproduction
This issue can be reproduced by loading any of the affected files with pandas or Python's built-in json module. For example:
# PANDAS
import pandas as pd

try:
    df = pd.read_json("data/three-sixty/3835338.json")
except Exception as e:
    print(f"FAILED: {e}")

# JSON
import json

with open("data/three-sixty/3835338.json") as f:
    json.load(f)  # Raises JSONDecodeError
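To pin down where a file breaks, the `lineno`, `colno`, `pos`, and `msg` attributes of `json.JSONDecodeError` can be printed together with the surrounding text. This is a rough sketch: `describe_json_error` is a hypothetical helper name, and the 40-character context window is an arbitrary choice.

```python
import json


def describe_json_error(text: str, context: int = 40):
    """Return None if text parses as JSON, else a dict describing the first error."""
    try:
        json.loads(text)
    except json.JSONDecodeError as e:
        start = max(e.pos - context, 0)
        return {
            "msg": e.msg,        # parser message, e.g. "Expecting ',' delimiter"
            "lineno": e.lineno,  # 1-based line of the failure
            "colno": e.colno,    # 1-based column of the failure
            "pos": e.pos,        # 0-based character offset into text
            "context": text[start:e.pos + context],  # text around the failure
        }
    return None
```

One possible use is to read a file with `open(path, encoding="utf-8", errors="replace")` and print `repr()` of the returned `"context"` value, so that null bytes show up as `\x00`.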
Impact
- Standard tools (Python, Pandas, Polars) cannot load these files.
- Data ingestion pipelines have to skip the files.
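Until the files are fixed, a loader can skip unparseable files while recording which ones were dropped. This is only a sketch of that workaround under the assumption that each file is a standalone JSON document; `load_json_dir` is a hypothetical name, not part of any existing tooling for this dataset.

```python
import json
from pathlib import Path


def load_json_dir(directory: str):
    """Load every *.json file in directory; return (loaded, skipped).

    loaded maps file name -> parsed object.
    skipped maps file name -> error message for files that failed to parse.
    """
    loaded, skipped = {}, {}
    for path in sorted(Path(directory).glob("*.json")):
        try:
            with path.open(encoding="utf-8") as f:
                loaded[path.name] = json.load(f)
        except (json.JSONDecodeError, UnicodeDecodeError) as e:
            # Keep the reason so skipped files can be reported, not silently lost.
            skipped[path.name] = str(e)
    return loaded, skipped
```

Returning the skipped files explicitly makes it easy to log them or to fail the pipeline when the set of skipped files changes.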
Next Steps
I'm happy to help look into this further and contribute fixes if that would be useful.
Thank you for maintaining this dataset :)