Corrupted JSON files in 360 events data with null bytes and malformed arrays #50

@archit-manek

Summary

Some files in the data/three-sixty directory are corrupted. They contain large runs of null bytes, malformed arrays, or truncated lines, so standard JSON parsers fail to load them.

Affected Files

  • data/three-sixty/3835338.json

    • Line 181321: 16KB of null bytes in "location" array.
    • Structure: "location": [ [nulls] 0.0, 80.0 ],
    • Error: json.load() cannot parse the file; the array is missing its closing bracket.
  • data/three-sixty/3835342.json

    • Line 171856: Corrupted "visible_area" array with null bytes.
    • Structure: "visible_area": [ numbers, 83.8r" : false,
    • Error: json.load() raises json.JSONDecodeError.
  • data/three-sixty/3845506.json

    • Line 92794: Truncated text in JSON structure.
    • Structure: lse, (the beginning of the line appears to be missing)
    • Error: json.load() raises "Expecting ',' delimiter".
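To check whether other files in the directory are affected in the same way, the line numbers of NUL bytes can be listed with a short scan. This is a minimal sketch, not part of the dataset's tooling: `lines_with_nul_bytes` is a hypothetical helper, and it assumes the files use "\n" line endings so its 1-based line numbers line up with the ones reported above.

```python
from pathlib import Path


def lines_with_nul_bytes(path: Path) -> list[tuple[int, int]]:
    """Return (line_number, nul_count) for every line of `path` that contains NUL bytes.

    Line numbers are 1-based, the same convention json.JSONDecodeError uses.
    """
    hits = []
    # Binary mode keeps NUL bytes intact and avoids any text decoding.
    with path.open("rb") as f:
        for lineno, line in enumerate(f, start=1):
            nul_count = line.count(b"\x00")
            if nul_count:
                hits.append((lineno, nul_count))
    return hits


if __name__ == "__main__":
    # Directory taken from this issue; nothing is printed if it does not exist.
    for path in sorted(Path("data/three-sixty").glob("*.json")):
        for lineno, nul_count in lines_with_nul_bytes(path):
            print(f"{path}: line {lineno}: {nul_count} NUL bytes")
```

This only finds NUL-byte damage; a truncated line such as the one in 3845506.json contains no NUL bytes and would not show up here.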

Reproduction

The failure can be reproduced by loading any of the affected files with Pandas or Python's built-in json module. For example:

```python
import json

import pandas as pd

PATH = "data/three-sixty/3835338.json"

# Pandas: read_json fails on the corrupted file
try:
    df = pd.read_json(PATH)
except Exception as e:
    print(f"FAILED: {e}")

# Standard-library json: raises json.JSONDecodeError
with open(PATH, encoding="utf-8") as f:
    json.load(f)
```
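The exact position of each failure can be read from the exception itself: json.JSONDecodeError exposes `msg`, `lineno`, `colno`, and `pos`. The sketch below uses those attributes to print the error location with some surrounding context. `locate_decode_error` and its `context` parameter are hypothetical names chosen for illustration.

```python
import json
from pathlib import Path
from typing import Optional


def locate_decode_error(path: Path, context: int = 20) -> Optional[str]:
    """Parse `path`; return None if it is valid JSON, else a description of the first error."""
    # Invalid UTF-8 is replaced with U+FFFD so the JSON parser, not the decoder, reports it.
    text = path.read_text(encoding="utf-8", errors="replace")
    try:
        json.loads(text)
    except json.JSONDecodeError as e:
        # e.lineno and e.colno are 1-based; e.pos is the 0-based offset into `text`.
        snippet = text[max(e.pos - context, 0): e.pos + context]
        return f"{path}: {e.msg} at line {e.lineno}, column {e.colno}: {snippet!r}"
    return None
```

For example, `print(locate_decode_error(Path("data/three-sixty/3845506.json")))` should report the same "Expecting ',' delimiter" error described above, together with the line number where the parser stopped.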

Impact

  • Standard tools (Python's json module, Pandas, Polars) cannot load these files.
  • Data ingestion pipelines must detect and skip these files instead of loading them.
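Until the files are fixed, a pipeline can load what it can and record what it skipped, rather than failing on the first bad file. This is a minimal sketch using only the standard library; `load_valid_files` is a hypothetical helper, and keying results by file stem is an illustrative choice, not a convention of this dataset.

```python
import json
import logging
from pathlib import Path
from typing import Any

logger = logging.getLogger(__name__)


def load_valid_files(directory: Path) -> tuple[dict[str, Any], list[str]]:
    """Load every *.json file in `directory`, skipping files that fail to parse.

    Returns (loaded, skipped): `loaded` maps each file stem (e.g. "3835338") to its
    parsed content, and `skipped` lists the stems of files that could not be parsed.
    """
    loaded: dict[str, Any] = {}
    skipped: list[str] = []
    for path in sorted(directory.glob("*.json")):
        try:
            with path.open(encoding="utf-8") as f:
                loaded[path.stem] = json.load(f)
        except (json.JSONDecodeError, UnicodeDecodeError) as e:
            # Record the failure instead of aborting the whole ingestion run.
            logger.warning("Skipping corrupted file %s: %s", path, e)
            skipped.append(path.stem)
    return loaded, skipped
```

Returning the skipped list explicitly, instead of only logging it, lets a caller decide whether a missing file should stop a downstream job.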

Next Steps

I'm happy to help look into this further and contribute fixes if that would be useful.
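One possible first triage step toward a fix is to check whether removing the NUL bytes alone makes a file parseable. The sketch below assumes that is a useful signal; `parses_after_stripping_nuls` is a hypothetical name. It does not recover anything the NUL bytes overwrote, and it cannot help with lines that lost text, such as the truncated line in 3845506.json, which would likely need to be regenerated from the source data.

```python
import json
from pathlib import Path


def parses_after_stripping_nuls(path: Path) -> bool:
    """Return True if the file becomes valid JSON once all NUL bytes are removed.

    Triage only: this does not restore any data the NUL bytes replaced.
    """
    cleaned = path.read_bytes().replace(b"\x00", b"")
    try:
        json.loads(cleaned.decode("utf-8"))
    except (json.JSONDecodeError, UnicodeDecodeError):
        return False
    return True
```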

Thank you for maintaining this dataset :)
