[s3/azure] readinto() raises ValueError: invalid literal for int() #716

@Joe-Heffer-Shef

Problem description

I am trying to stream a binary file from Azure Blob Storage.

I expect to be able to iterate over chunks of the data set, but I get an error to do with smart_open's Azure readinto() method.

I'm using the npTDMS library to read a LabVIEW data file in TDMS format (a binary format for quantitative data).

Steps/code to reproduce the problem

The code is something like this:

import azure.storage.blob
import smart_open
import nptdms

CONN_STR = '******************'
BLOB_URI = 'azure://test/my_data_file.tdms'

transport_params = dict(
    client=azure.storage.blob.BlobServiceClient.from_connection_string(conn_str=CONN_STR),
)

with smart_open.open(BLOB_URI, mode='rb', transport_params=transport_params) as file:

    with nptdms.TdmsFile.open(file) as tdms_file:
        for group in tdms_file.groups():
            for channel in group.channels():
                for chunk in channel.data_chunks():
                    pass

and the error I get is:

Traceback (most recent call last):
  File "C:\Users\my_username\my_project\scripts\blob-tdms\smart.py", line 35, in <module>
    main()
  File "C:\Users\my_username\my_project\scripts\blob-tdms\smart.py", line 28, in main
    for chunk in channel.data_chunks():
  File "C:\Users\my_username\Miniconda3\envs\my_project\lib\site-packages\nptdms\tdms.py", line 564, in data_chunks
    for raw_data_chunk in self._read_channel_data_chunks():
  File "C:\Users\my_username\Miniconda3\envs\my_project\lib\site-packages\nptdms\tdms.py", line 758, in _read_channel_data_chunks
    for chunk in self._reader.read_raw_data_for_channel(self.path):
  File "C:\Users\my_username\Miniconda3\envs\my_project\lib\site-packages\nptdms\reader.py", line 191, in read_raw_data_for_channel
    for i, chunk in enumerate(
  File "C:\Users\my_username\Miniconda3\envs\my_project\lib\site-packages\nptdms\tdms_segment.py", line 269, in read_raw_data_for_channel
    for chunk in self._read_channel_data_chunks(f, data_objects, channel_path, chunk_offset, stop_chunk):
  File "C:\Users\my_username\Miniconda3\envs\my_project\lib\site-packages\nptdms\tdms_segment.py", line 367, in _read_channel_data_chunks
    for chunk in reader.read_channel_data_chunks(file, data_objects, channel_path, chunk_offset, stop_chunk):
  File "C:\Users\my_username\Miniconda3\envs\my_project\lib\site-packages\nptdms\base_segment.py", line 64, in read_channel_data_chunks
    yield self._read_channel_data_chunk(file, data_objects, chunk_index, channel_path)
  File "C:\Users\my_username\Miniconda3\envs\my_project\lib\site-packages\nptdms\base_segment.py", line 72, in _read_channel_data_chunk
    data_chunk = self._read_data_chunk(file, data_objects, chunk_index)
  File "C:\Users\my_username\Miniconda3\envs\my_project\lib\site-packages\nptdms\daqmx.py", line 39, in _read_data_chunk
    combined_data = read_interleaved_segment_bytes(file, raw_data_width, chunk_size)
  File "C:\Users\my_username\Miniconda3\envs\my_project\lib\site-packages\nptdms\base_segment.py", line 159, in read_interleaved_segment_bytes
    combined_data = fromfile(f, dtype=np.uint8, count=number_bytes)
  File "C:\Users\my_username\Miniconda3\envs\my_project\lib\site-packages\nptdms\base_segment.py", line 147, in fromfile
    bytes_read = file.readinto(buffer[offset:])
  File "C:\Users\my_username\Miniconda3\envs\my_project\lib\site-packages\smart_open\azure.py", line 322, in readinto
    b[:len(data)] = data
ValueError: invalid literal for int() with base 10: b'\x93\xad\x03\x00k\xf0\xff\xff\xfe\xee\xff\xffm\xfd\xff\xffd\xc1E\x00<\xad\x03\x00O\xf0\xff\xffI\xee\xff\xff\xd1\xfd\xff\xff\xbe\xc2E\x00\xe8\xac\x03\x00\xa6\xef\xff\xff\xe5\xed\xff\xff\x92\xfd\xff\x

It seems like it's expecting a text file? Or it's not calculating the data index correctly to page through the data set?
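
A possible explanation, not confirmed here: the traceback shows npTDMS passing a slice of a buffer (buffer[offset:]) to readinto(), and the failing line is b[:len(data)] = data. If that buffer is a NumPy uint8 array (an assumption based on the dtype=np.uint8 call in the traceback), NumPy treats the bytes object as a single string scalar and tries to parse it as an integer, which would give exactly this ValueError. The sketch below reproduces the pattern without Azure. naive_readinto and buffer_readinto are hypothetical names for illustration, not smart_open functions.

```python
import io

import numpy as np


def naive_readinto(stream, b):
    """Mimics the assignment pattern shown in the traceback."""
    data = stream.read(len(b))
    if not data:
        return 0
    # With a NumPy array as `b`, the bytes object is treated as one scalar.
    b[:len(data)] = data
    return len(data)


def buffer_readinto(stream, b):
    """Copies through the buffer protocol, so any writable buffer works."""
    data = stream.read(len(b))
    if not data:
        return 0
    # memoryview copies raw bytes without NumPy's value conversion.
    memoryview(b).cast("B")[:len(data)] = data
    return len(data)


if __name__ == "__main__":
    payload = b"\x93\xad\x03\x00k\xf0\xff\xff"

    # A bytearray target works with the naive version.
    target = bytearray(len(payload))
    naive_readinto(io.BytesIO(payload), target)

    # A NumPy target is expected to fail with the naive version.
    array = np.zeros(len(payload), dtype=np.uint8)
    try:
        naive_readinto(io.BytesIO(payload), array)
    except (ValueError, TypeError) as exc:
        print("naive_readinto failed:", exc)

    # The buffer-protocol version fills the NumPy array byte for byte.
    buffer_readinto(io.BytesIO(payload), array)
    print(array.tobytes() == payload)
```

If this is the cause, the file is not being treated as text and the offsets are not wrong. The problem would be limited to how readinto() copies bytes into a non-bytearray buffer.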

Versions

>>> import platform, sys, smart_open
>>> print(platform.platform())
Windows-10-10.0.19042-SP0
>>> print("Python", sys.version)
Python 3.9.7 | packaged by conda-forge | (default, Sep 29 2021, 19:15:42) [MSC v.1916 64 bit (AMD64)]
>>> print("smart_open", smart_open.__version__)
smart_open 6.1.0

From pip list:

azure-core          1.23.0
azure-storage-blob  12.10.0
npTDMS              1.4.0
smart-open          6.1.0
