
BUG: pd.read_csv() does not work on macOS with a file, but does with StringIO #62660

@thmsklngr

Description


Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
import io

names = [f"head{x:02}" for x in range(1, 11)]

# NOT WORKING: reading the CSV file from disk
with pd.read_csv(
    "_data/random_data.csv",
    sep=";",
    names=names,
    encoding="utf-8",
    index_col=False,
    on_bad_lines="skip",
    chunksize=2,
    engine="python",
) as reader:
    for chunk in reader:
        print(chunk)

#   head01      head02      head03      head04      head05      head06      head07      head08      head09      head10
# 0      99  Wg1H2ivHFZ  BDpXoeOhKH  VohJYsnCV8  BtTuebP0nT  fmBHwP4IFV  TOg9YJp1h6  ooVM44HkzP  DSZqukVH3K  hU3NZQBuri
# 1      99  gzxEh5ieKn  HCAIPudvKj  YqTUuDKH8O  5383zSS6E6  7Nr9Ckatuo  tqfCuCh52l  JFK0cfq9mz  yyQsQGC6t3  Xc44lIK4BQ
#    head01      head02      head03      head04      head05      head06      head07      head08      head09      head10
# 2      99  M1XNbLOYG9  px78EDlwlW  gHdirv59k9  VRJgi4m1H0  vSFkaCbImk  IM9V0UCLBa  vjnpAidejp  chcpZKpn48  UlAzuehJo5
# 3      99  diWUN45qqP  16HJxD3wdU  0WvoDOwKBx  XHO9L6qVWX  94DhLCUEA7  vdQ0wFx2u3  ZeF0SOPSsc  gJfA44ZSdQ  y7rHFlT77G
#    head01      head02      head03      head04      head05      head06      head07      head08      head09      head10
# 4      99  XsqKrPi1eO  AouPwLJ8cx  qERFA7G6oE  2xcUukUfKQ  TWXUS2GNWQ  wEJ5Xz6Bzf  8G5eEJDsEo  84Gm40s4nh  wvZixCSZ5X
# 5      99  ul1YLwdMLJ  9zE2XgrLmV  LVccZLrNGl  dE6PWSqbYB  3ltSdpDsTf  5QfymfMUM7  KkxipJLtLE  hoWZps7wS6  oCrfsk9CsV
# /Users/thomas/Projekte/Entwicklung/Python/pandas_verifier/pandas_reader.py:52: ParserWarning: Length of header or names does not match length of data. This leads to a loss of data with index_col=False.
#   for chunk in reader:
...

# WORKING: the data simulated in memory with io.StringIO
sim_csv = io.StringIO(
    """99;Wg1H2ivHFZ;BDpXoeOhKH;VohJYsnCV8;BtTuebP0nT;fmBHwP4IFV;TOg9YJp1h6;ooVM44HkzP;DSZqukVH3K;hU3NZQBuri
99;gzxEh5ieKn;HCAIPudvKj;YqTUuDKH8O;5383zSS6E6;7Nr9Ckatuo;tqfCuCh52l;JFK0cfq9mz;yyQsQGC6t3;Xc44lIK4BQ
99;M1XNbLOYG9;px78EDlwlW;gHdirv59k9;VRJgi4m1H0;vSFkaCbImk;IM9V0UCLBa;vjnpAidejp;chcpZKpn48;UlAzuehJo5
99;diWUN45qqP;16HJxD3wdU;0WvoDOwKBx;XHO9L6qVWX;94DhLCUEA7;vdQ0wFx2u3;ZeF0SOPSsc;gJfA44ZSdQ;y7rHFlT77G
99;XsqKrPi1eO;AouPwLJ8cx;qERFA7G6oE;2xcUukUfKQ;TWXUS2GNWQ;wEJ5Xz6Bzf;8G5eEJDsEo;84Gm40s4nh;wvZixCSZ5X
99;ul1YLwdMLJ;9zE2XgrLmV;LVccZLrNGl;dE6PWSqbYB;3ltSdpDsTf;5QfymfMUM7;KkxipJLtLE;hoWZps7wS6;oCrfsk9CsV;
99;cQuqcgc9az;XyE3OYqhRw;HPELcHKBtt;PRR5qLpw1H;FZrXAWdRSZ;gJPL5W6C0Z;uFKnbdtpvS;4j1qBslPc0;imCvulSmhS
99;NzVF74lO9E;M28U9jb3oA;oAAlFQVUVt;6fkOztILHW;MZm20agksL;O0Yik187u6;ZgZMQMkjZc;yHMeT4HPEe;dppbphuT4b;;
99;bnNrfhGWri;HUxtRlvdKU;gyEjO0V1a3;xHh4SgJIfC;lawQnZfiAP;6FiB0bfmh2;shxKCWvV4Z;LmA6ZOidGv;rS8ZGBXQsx;;NF4cRa7bVJ
99;nuziDImo99;arsFldtXRS;DQpoylF0mE;qCh4S3O8hG;PdUexdXCwW;C9GUnzSXi0;ygMAcHTUCp;vH03yILzGm;1m3pSV7Eg0"""
)

names = [f"head{x:02}" for x in range(1, 11)]

with pd.read_csv(
    sim_csv,
    names=names,
    chunksize=2,
    on_bad_lines="warn",
    engine="python",
    delimiter=";",
) as reader:
    for chunk in reader:
        print(chunk)

#   head01      head02      head03      head04      head05      head06      head07      head08      head09      head10
# 0      99  Wg1H2ivHFZ  BDpXoeOhKH  VohJYsnCV8  BtTuebP0nT  fmBHwP4IFV  TOg9YJp1h6  ooVM44HkzP  DSZqukVH3K  hU3NZQBuri
# 1      99  gzxEh5ieKn  HCAIPudvKj  YqTUuDKH8O  5383zSS6E6  7Nr9Ckatuo  tqfCuCh52l  JFK0cfq9mz  yyQsQGC6t3  Xc44lIK4BQ
#    head01      head02      head03      head04      head05      head06      head07      head08      head09      head10
# 2      99  M1XNbLOYG9  px78EDlwlW  gHdirv59k9  VRJgi4m1H0  vSFkaCbImk  IM9V0UCLBa  vjnpAidejp  chcpZKpn48  UlAzuehJo5
# 3      99  diWUN45qqP  16HJxD3wdU  0WvoDOwKBx  XHO9L6qVWX  94DhLCUEA7  vdQ0wFx2u3  ZeF0SOPSsc  gJfA44ZSdQ  y7rHFlT77G
# /Users/thomas/Projekte/Entwicklung/Python/pandas_verifier/pandas_reader_2.py:36: ParserWarning: Skipping line 6: Expected 10 fields in line 6, saw 11

#   for chunk in reader:
...
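The input source is not the only difference between the two calls above. The file call passes `encoding="utf-8"`, `index_col=False` and `on_bad_lines="skip"`. The StringIO call passes neither of the first two and uses `on_bad_lines="warn"`. Below is a minimal sketch that holds the keyword arguments fixed and varies only the source. It writes a small made-up sample (not the attached random_data.csv) to a temporary file, reads it back both from disk and from `io.StringIO`, and compares the concatenated chunks. `read_all`, `compare_file_and_buffer` and `SAMPLE_CSV` are hypothetical names used only for this illustration.

```python
import io
import tempfile
from pathlib import Path

import pandas as pd

NAMES = [f"head{x:02}" for x in range(1, 11)]

# Hypothetical sample shaped like random_data.csv: 10 fields per row,
# except line 2 (11 fields) and line 4 (12 fields).
SAMPLE_CSV = """99;a1;b1;c1;d1;e1;f1;g1;h1;i1
99;a2;b2;c2;d2;e2;f2;g2;h2;i2;
99;a3;b3;c3;d3;e3;f3;g3;h3;i3
99;a4;b4;c4;d4;e4;f4;g4;h4;i4;;x
99;a5;b5;c5;d5;e5;f5;g5;h5;i5"""


def read_all(source, **kwargs) -> pd.DataFrame:
    """Read `source` in 2-row chunks with the python engine and concatenate them."""
    with pd.read_csv(
        source, sep=";", names=NAMES, engine="python", chunksize=2, **kwargs
    ) as reader:
        return pd.concat(list(reader))


def compare_file_and_buffer(csv_text: str, **kwargs):
    """Parse the same text from a file on disk and from io.StringIO,
    passing identical keyword arguments to both reads."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "random_data.csv"
        path.write_text(csv_text, encoding="utf-8")
        # encoding only matters for the on-disk file; StringIO already holds str
        from_file = read_all(path, encoding="utf-8", **kwargs)
    from_buffer = read_all(io.StringIO(csv_text), **kwargs)
    return from_file, from_buffer


from_file, from_buffer = compare_file_and_buffer(SAMPLE_CSV, on_bad_lines="skip")
print(from_file.equals(from_buffer))
print(from_file)
```

You could also add `index_col=False` to the shared arguments, for example `compare_file_and_buffer(SAMPLE_CSV, on_bad_lines="skip", index_col=False)`. That applies it to both sources at once, so any remaining difference in the output would point to the source type rather than to the arguments.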

Issue Description

pandas_reader_2.py
pandas_reader.py
random_data.csv

While trying to reproduce a fault from one of my work projects at home on macOS 15.6, I stumbled across a really strange effect. For the test I created a CSV file (random_data.csv) with 10 rows of 10 columns each, separated by ';' (semicolon). To simulate faulty data, I modified 3 lines by adding extra fields to them.
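For readers without the attachment, here is a minimal sketch of a generator for a file with the same shape. It is an assumption about how random_data.csv might have been built, not the reporter's script. Every row starts with `99` followed by nine random 10-character tokens. The 0-based rows listed in `bad_rows` get extra empty trailing fields, roughly mirroring the trailing `;` and `;;` visible in the StringIO data. `make_random_csv` is a hypothetical helper.

```python
import random
import string
from typing import Dict, Optional


def make_random_csv(
    n_rows: int = 10,
    n_cols: int = 10,
    bad_rows: Optional[Dict[int, int]] = None,
    seed: int = 0,
) -> str:
    """Hypothetical helper: return ';'-separated text with n_rows rows of n_cols fields.

    `bad_rows` maps a 0-based row index to the number of extra empty fields
    appended to that row; by default rows 6, 8 and 9 (1-based) get 1, 2 and 2.
    """
    if bad_rows is None:
        bad_rows = {5: 1, 7: 2, 8: 2}
    rng = random.Random(seed)  # seeded so the output is reproducible
    alphabet = string.ascii_letters + string.digits
    lines = []
    for i in range(n_rows):
        fields = ["99"] + [
            "".join(rng.choices(alphabet, k=10)) for _ in range(n_cols - 1)
        ]
        fields += [""] * bad_rows.get(i, 0)  # extra fields turn this row into a "bad line"
        lines.append(";".join(fields))
    return "\n".join(lines) + "\n"


text = make_random_csv()
print(text)
```

To create a file at the path used in the reproducible example, run `pathlib.Path("_data/random_data.csv").write_text(make_random_csv(), encoding="utf-8")`. The `_data` directory must already exist.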

When importing from the CSV file (pandas_reader.py), I only get a generic ParserWarning ("Length of header or names does not match length of data..."), regardless of the value I pass to on_bad_lines; even a callable is ignored. With the equivalent example using io.StringIO (pandas_reader_2.py), the behavior is as expected: depending on the option's value, warnings are displayed or bad lines are skipped. Providing a callable that cuts a bad line down to the expected size also works normally.
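Below is a minimal sketch of the callable form of `on_bad_lines`, for comparison. With `engine="python"`, pandas calls the function with the offending line already split into a list of strings. Returning a list keeps the line, and returning `None` drops it. The sketch truncates each bad line to the ten expected fields. `truncate_bad_line` and the short `SAMPLE_CSV` are hypothetical stand-ins, not the reporter's code.

```python
import io

import pandas as pd

NAMES = [f"head{x:02}" for x in range(1, 11)]

# Hypothetical sample: line 2 has 11 fields, line 4 has 12.
SAMPLE_CSV = """99;a1;b1;c1;d1;e1;f1;g1;h1;i1
99;a2;b2;c2;d2;e2;f2;g2;h2;i2;
99;a3;b3;c3;d3;e3;f3;g3;h3;i3
99;a4;b4;c4;d4;e4;f4;g4;h4;i4;;x
99;a5;b5;c5;d5;e5;f5;g5;h5;i5"""


def truncate_bad_line(bad_line: list[str]) -> list[str]:
    """Keep only as many fields as there are column names."""
    return bad_line[: len(NAMES)]


df = pd.read_csv(
    io.StringIO(SAMPLE_CSV),
    sep=";",
    names=NAMES,
    engine="python",  # the python engine accepts a callable taking a list of strings
    on_bad_lines=truncate_bad_line,
)
print(df)
```

Because the callable keeps every line, no rows are lost. Only the surplus trailing fields of lines 2 and 4 are discarded.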

On Ubuntu 22.04 on my work laptop, reading the CSV file behaves as expected, so I have only encountered this phenomenon on macOS.

Expected Behavior

   head01      head02      head03      head04      head05      head06      head07      head08      head09      head10
0      99  Wg1H2ivHFZ  BDpXoeOhKH  VohJYsnCV8  BtTuebP0nT  fmBHwP4IFV  TOg9YJp1h6  ooVM44HkzP  DSZqukVH3K  hU3NZQBuri
1      99  gzxEh5ieKn  HCAIPudvKj  YqTUuDKH8O  5383zSS6E6  7Nr9Ckatuo  tqfCuCh52l  JFK0cfq9mz  yyQsQGC6t3  Xc44lIK4BQ
   head01      head02      head03      head04      head05      head06      head07      head08      head09      head10
2      99  M1XNbLOYG9  px78EDlwlW  gHdirv59k9  VRJgi4m1H0  vSFkaCbImk  IM9V0UCLBa  vjnpAidejp  chcpZKpn48  UlAzuehJo5
3      99  diWUN45qqP  16HJxD3wdU  0WvoDOwKBx  XHO9L6qVWX  94DhLCUEA7  vdQ0wFx2u3  ZeF0SOPSsc  gJfA44ZSdQ  y7rHFlT77G
/Users/thomas/Projekte/Entwicklung/Python/pandas_verifier/pandas_reader_2.py:36: ParserWarning: Skipping line 6: Expected 10 fields in line 6, saw 11

  for chunk in reader:
   head01      head02      head03      head04      head05      head06      head07      head08      head09      head10
4      99  XsqKrPi1eO  AouPwLJ8cx  qERFA7G6oE  2xcUukUfKQ  TWXUS2GNWQ  wEJ5Xz6Bzf  8G5eEJDsEo  84Gm40s4nh  wvZixCSZ5X
/Users/thomas/Projekte/Entwicklung/Python/pandas_verifier/pandas_reader_2.py:36: ParserWarning: Skipping line 8: Expected 10 fields in line 8, saw 12

  for chunk in reader:
   head01      head02      head03      head04      head05      head06      head07      head08      head09      head10
5      99  cQuqcgc9az  XyE3OYqhRw  HPELcHKBtt  PRR5qLpw1H  FZrXAWdRSZ  gJPL5W6C0Z  uFKnbdtpvS  4j1qBslPc0  imCvulSmhS
/Users/thomas/Projekte/Entwicklung/Python/pandas_verifier/pandas_reader_2.py:36: ParserWarning: Skipping line 9: Expected 10 fields in line 9, saw 12

  for chunk in reader:
   head01      head02      head03      head04      head05      head06      head07      head08      head09      head10
6      99  nuziDImo99  arsFldtXRS  DQpoylF0mE  qCh4S3O8hG  PdUexdXCwW  C9GUnzSXi0  ygMAcHTUCp  vH03yILzGm  1m3pSV7Eg0

Installed Versions

```
INSTALLED VERSIONS
------------------
commit : 9c8bc3e
python : 3.10.19
python-bits : 64
OS : Darwin
OS-release : 24.6.0
Version : Darwin Kernel Version 24.6.0: Mon Jul 14 11:30:29 PDT 2025; root:xnu-11417.140.69~1/RELEASE_ARM64_T6000
machine : arm64
processor : arm
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 2.3.3
numpy : 2.2.6
pytz : 2025.2
dateutil : 2.9.0.post0
pip : None
Cython : None
sphinx : None
IPython : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : None
blosc : None
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : None
html5lib : None
hypothesis : None
gcsfs : None
jinja2 : None
lxml.etree : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
psycopg2 : None
pymysql : None
pyarrow : 21.0.0
pyreadstat : None
pytest : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlsxwriter : None
zstandard : None
tzdata : 2025.2
qtpy : None
pyqt5 : None
```

Metadata

Assignees: No one assigned
Labels: Bug, Needs Triage (Issue that has not been reviewed by a pandas team member)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
