"\\" in row breaks PyDruid - JSONDecodeError('Unterminated string...') #242

Open
@dklei

Hi,

I'm using pydruid.db.connector to run a query that returns a row whose value ends in "...\\". This appears to break pydruid: it either drops rows from the result or fails with a JSONDecodeError.

e.g. "SELECT x FROM y" -> [{"x": "some row"},{"x": "...\\"},{"x": "another row"},{"x": "more rows"}]

2020-11-27 10:44:23: [CRITICAL] JSONDecodeError('Unterminated string starting at: line 1 column 85919 (char 85918)')
2020-11-27 10:44:23: [CRITICAL] Traceback (most recent call last):
  File "xxxxx", line 291, in main
    data_paths = pull_data(tracker.last_data_dt, tracker.next_data_dt)
  File "xxxxx", line 162, in pull_data
    data_path = collector.execute_and_save()
  File "xxxxx", line 226, in execute_and_save
    for i, row in enumerate(cursor):
  File "xxxxx", line 181, in _get_cursor
    raise err
  File "xxxxx", line 164, in _get_cursor
    raise err
  File "xxxxx", line 161, in _get_cursor
    r = next(cursor)
  File "/xxxx/venv/lib64/python3.8/site-packages/pydruid/db/api.py", line 62, in g
    return f(self, *args, **kwargs)
  File "/xxxx/venv/lib64/python3.8/site-packages/pydruid/db/api.py", line 320, in next
    return next(self._results)
  File "/xxxx/venv/lib64/python3.8/site-packages/pydruid/db/api.py", line 370, in _stream_query
    for row in rows_from_chunks(chunks):
  File "/xxxx/venv/lib64/python3.8/site-packages/pydruid/db/api.py", line 420, in rows_from_chunks
    for row in json.loads(
  File "/usr/lib64/python3.8/json/__init__.py", line 370, in loads
    return cls(**kw).decode(s)
  File "/usr/lib64/python3.8/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib64/python3.8/json/decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Unterminated string starting at: line 1 column 85919 (char 85918)

Any rows following the {"x": "...\\"} row either return no data or raise a JSONDecodeError. I'm guessing this is because pydruid.db.api.rows_from_chunks tries to parse the JSON itself and mishandles the trailing "\\" when looking for the end of a string?
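
To illustrate the guess, here is a minimal sketch of how a hand-rolled scanner could go wrong on this input. It is not pydruid's code: the two helper functions are hypothetical, and the assumption is only that the parser decides whether a quote is escaped by looking at the single preceding character.

```python
import json

def string_end_naive(body: str, start: int) -> int:
    """Hypothetical scanner: a quote closes the string unless the
    character right before it is a backslash (the suspected mistake)."""
    for i in range(start + 1, len(body)):
        if body[i] == '"' and body[i - 1] != "\\":
            return i
    return -1  # never found a closing quote

def string_end_correct(body: str, start: int) -> int:
    """Hypothetical scanner: consume escape sequences as pairs, so an
    escaped backslash cannot be mistaken for an escape of the quote."""
    i = start + 1
    while i < len(body):
        if body[i] == "\\":
            i += 2  # skip the backslash and the character it escapes
            continue
        if body[i] == '"':
            return i
        i += 1
    return -1

# JSON text for a value that ends in one literal backslash: {"x": "...\\"}
raw = json.dumps({"x": "...\\"})
start = raw.index('"...')  # opening quote of the value

print(string_end_naive(raw, start))    # -1: the closing quote is missed
print(string_end_correct(raw, start))  # 12: the closing quote is found
print(json.loads(raw))                 # the stdlib decoder handles it fine
```

If the scanner never sees the end of the string, everything after it (including the following rows) is treated as string content, which would match both the dropped rows and the "Unterminated string" error.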

I have attached a script and a dummy JSON file (scratch.zip) that show the function dropping rows. They do not trigger the JSONDecodeError, though; that only seems to occur when I read this row and the surrounding rows from the database.
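
For comparison (this is not the attached script), a minimal sketch of splitting rows out of a complete buffer with the standard library's json.JSONDecoder.raw_decode, which handles escapes itself. The function name iter_rows is hypothetical, and the sketch assumes the whole response is already in memory, so it does not cover partial chunks.

```python
import json

def iter_rows(body: str):
    """Yield each object from a JSON array text such as '[{"x": 1}, {"x": 2}]'.
    Assumes `body` is a complete array of objects, not a partial chunk."""
    decoder = json.JSONDecoder()
    i = 0
    while i < len(body):
        # Skip array punctuation and whitespace between rows.
        if body[i] in "[], \t\r\n":
            i += 1
            continue
        # raw_decode returns the parsed value and the index just past it.
        row, i = decoder.raw_decode(body, i)
        yield row

# Same shape as the example result above, including the row ending in a backslash.
payload = json.dumps(
    [{"x": "some row"}, {"x": "...\\"}, {"x": "another row"}, {"x": "more rows"}]
)
rows = list(iter_rows(payload))
print(len(rows))  # 4: no rows are dropped
```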

Many thanks in advance
