Commit 140c708

add claude.md
1 parent 1d57167 commit 140c708

4 files changed: +95 -29 lines changed

CLAUDE.md

Lines changed: 65 additions & 0 deletions
@@ -0,0 +1,65 @@
1+
# CLAUDE.md
2+
3+
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
4+
5+
## Project Overview
6+
7+
xlsx2csv is a Python utility that converts XLSX files to CSV format. It's designed to handle large XLSX files efficiently and supports multiple Python versions (2.4 to 3.14). The entire converter is implemented in a single Python file `xlsx2csv.py` (~54KB).
8+
9+
## Development Commands
10+
11+
### Testing
12+
```bash
13+
python3 test/run
14+
```
15+
This runs the comprehensive test suite that compares output from various test XLSX files against expected CSV files. Tests cover edge cases like datetime formatting, empty rows, hyperlinks, various delimiters, and multi-sheet files.
16+
17+
### Building
18+
```bash
19+
python3 -m build
20+
```
21+
Uses the modern Python build system defined in `pyproject.toml`.
22+
23+
### Installation for Development
24+
```bash
25+
pip install -e .
26+
```
27+
Install in editable mode for local development.
28+
29+
## Code Architecture
30+
31+
### Core Classes
32+
- **Xlsx2csv**: Main converter class that orchestrates the conversion process
33+
- **Workbook**: Represents the XLSX workbook structure and handles ZIP file extraction
34+
- **Sheet**: Handles individual worksheet conversion with SAX parsing for memory efficiency
35+
- **SharedStrings**: Manages the shared strings table
36+
- **Styles**: Handles cell formatting and number formats
37+
- **ContentTypes**: Manages XLSX content type definitions
38+
- **Relationships**: Handles XLSX relationship mappings
39+
40+
### Key Design Patterns
41+
- **SAX Parsing**: Uses `xml.parsers.expat` for memory-efficient XML parsing of large files
42+
- **Streaming Processing**: Processes XLSX files without loading entire content into memory
43+
- **Format Detection**: Comprehensive format mapping system (`FORMATS` and `STANDARD_FORMATS` dicts) for proper type conversion
44+
- **Command-line Interface**: Uses argparse when it is available (Python 2.7 and 3.2+) and falls back to optparse on older versions such as Python 2.4
45+
46+
### File Structure
47+
- `xlsx2csv.py`: Single-file implementation containing all classes and logic
48+
- `test/`: Contains test XLSX/CSV file pairs and the test runner
49+
- `test/run`: Python script that compares converter output against expected results
50+
51+
## Testing Strategy
52+
The test suite uses a comparison-based approach:
53+
1. Converts test XLSX files using the converter
54+
2. Compares output with pre-generated expected CSV files
55+
3. Tests both file input and STDIN input modes
56+
4. Covers various edge cases: datetime formatting, hyperlinks, empty cells, multi-sheet files, different encodings
57+
58+
## Key Features Supported
59+
- Multiple output formats and delimiters
60+
- Date/time formatting with custom patterns
61+
- Hyperlink extraction
62+
- Multi-sheet processing
63+
- Large file handling via streaming
64+
- Cross-platform compatibility (Windows/Linux/macOS)
65+
- Python 2.4 to 3.14 compatibility
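
The SAX-style parsing that the new CLAUDE.md describes can be illustrated with a minimal sketch. It is not taken from `xlsx2csv.py`: the function name `collect_cell_values` and the simplified, namespace-free sample XML are assumptions made for illustration. It only shows how `xml.parsers.expat` callbacks let a parser react to elements as they arrive instead of building a full tree.

```python
import xml.parsers.expat


def collect_cell_values(xml_bytes):
    """Return the text of every <v> element (hypothetical helper, not from xlsx2csv)."""
    values = []
    state = {"in_v": False, "buf": []}

    def start(name, attrs):
        # Begin buffering when a value element opens.
        if name == "v":
            state["in_v"] = True
            state["buf"] = []

    def chars(data):
        # expat may deliver text in several chunks, so accumulate them.
        if state["in_v"]:
            state["buf"].append(data)

    def end(name):
        # Emit the buffered text when the value element closes.
        if name == "v":
            values.append("".join(state["buf"]))
            state["in_v"] = False

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.CharacterDataHandler = chars
    parser.EndElementHandler = end
    parser.Parse(xml_bytes, True)
    return values


# Simplified stand-in for worksheet XML (assumption: no namespaces, numeric cells only).
sample = (b'<sheetData><row r="1"><c r="A1"><v>1</v></c>'
          b'<c r="B1"><v>2.5</v></c></row></sheetData>')
print(collect_cell_values(sample))
```

A real converter would write each row out as soon as its closing tag is seen rather than collecting values in a list; the list here only keeps the sketch easy to check.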

README.md

Lines changed: 16 additions & 16 deletions
@@ -6,10 +6,10 @@
66
Converts xlsx files to csv format.
77
Handles large XLSX files. Fast and easy to use.
88

9-
## Tested(supported) python versions:
9+
## Tested (supported) Python versions:
1010
- 2.4
1111
- 2.7
12-
- 3.4 to 3.13
12+
- 3.4 to 3.14
1313

1414
## Installation:
1515

@@ -23,7 +23,7 @@ pip install xlsx2csv
2323
```
2424

2525

26-
Also, works standalone with only the *xlsx2csv.py* script
26+
Also works standalone with only the *xlsx2csv.py* script.
2727

2828
**Usage:**
2929
```
@@ -47,53 +47,53 @@ pip install xlsx2csv
4747
-v, --version show program's version number and exit
4848
-a, --all export all sheets
4949
-c OUTPUTENCODING, --outputencoding OUTPUTENCODING
50-
encoding of output csv ** Python 3 only ** (default: utf-8)
50+
encoding of output CSV **Python 3 only** (default: utf-8)
5151
-s SHEETID, --sheet SHEETID
5252
sheet number to convert, 0 for all
5353
-n SHEETNAME, --sheetname SHEETNAME
5454
sheet name to convert
5555
-d DELIMITER, --delimiter DELIMITER
56-
delimiter - columns delimiter in csv, 'tab' or 'x09'
56+
delimiter - column delimiter in CSV, 'tab' or 'x09'
5757
for a tab (default: comma ',')
5858
-l LINETERMINATOR, --lineterminator LINETERMINATOR
59-
line terminator - lines terminator in csv, '\n' '\r\n'
59+
line terminator - line terminator in CSV, '\n' '\r\n'
6060
or '\r' (default: os.linesep)
6161
-f DATEFORMAT, --dateformat DATEFORMAT
6262
override date/time format (ex. %Y/%m/%d)
6363
--floatformat FLOATFORMAT
6464
override float format (ex. %.15f)
6565
-i, --ignoreempty skip empty lines
66-
-e, --escape Escape \r\n\t characters
66+
-e, --escape escape \r\n\t characters
6767
-p SHEETDELIMITER, --sheetdelimiter SHEETDELIMITER
6868
sheet delimiter used to separate sheets, pass '' if
69-
you do not need delimiter, or 'x07' or '\\f' for form
69+
you do not need a delimiter, or 'x07' or '\\f' for form
7070
feed (default: '--------')
7171
-q QUOTING, --quoting QUOTING
7272
field quoting, 'none' 'minimal' 'nonnumeric' or 'all' (default: 'minimal')
7373
--hyperlinks, --hyperlinks
7474
include hyperlinks
7575
-I INCLUDE_SHEET_PATTERN [INCLUDE_SHEET_PATTERN ...], --include_sheet_pattern INCLUDE_SHEET_PATTERN [INCLUDE_SHEET_PATTERN ...]
76-
only include sheets named matching given pattern, only
77-
effects when -a option is enabled.
76+
only include sheets with names matching the given pattern, only
77+
affects when -a option is enabled.
7878
-E EXCLUDE_SHEET_PATTERN [EXCLUDE_SHEET_PATTERN ...], --exclude_sheet_pattern EXCLUDE_SHEET_PATTERN [EXCLUDE_SHEET_PATTERN ...]
79-
exclude sheets named matching given pattern, only
80-
effects when -a option is enabled.
79+
exclude sheets with names matching the given pattern, only
80+
affects when -a option is enabled.
8181
-m, --merge-cells merge cells
8282
```
8383

84-
Usage with folder containing multiple `xlxs` files:
84+
Usage with a folder containing multiple `xlsx` files:
8585
```
8686
python xlsx2csv.py /path/to/input/dir /path/to/output/dir
8787
```
88-
will output each file in the input dir converted to `.csv` in the output dir. If omitting the output dir it will output the converted files in the input dir
88+
will output each file in the input directory converted to `.csv` in the output directory. If omitting the output directory, it will output the converted files in the input directory.
8989

9090
Usage from within Python:
9191
```
9292
from xlsx2csv import Xlsx2csv
9393
Xlsx2csv("myfile.xlsx", outputencoding="utf-8").convert("myfile.csv")
9494
```
9595

96-
Expat SAX parser used for xml parsing.
96+
Expat SAX parser is used for XML parsing.
9797

9898
See alternatives:
9999

@@ -119,6 +119,6 @@ http://poi.apache.org/
119119

120120
Dilshod Temirkhdojaev – [email protected]
121121

122-
Distributed under the MIT LICENSE. See ``LICENSE`` for more information.
122+
Distributed under the MIT license. See ``LICENSE`` for more information.
123123

124124
[https://github.com/dilshod](https://github.com/dilshod)
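
The directory mode described in the README (each input file converted to `.csv` in the output directory, or in the input directory when none is given) can be sketched as a path-planning step. This is an illustration under stated assumptions, not the tool's implementation: `plan_outputs` is a hypothetical helper, and treating only files with an `.xlsx` extension as inputs is a guess.

```python
import os
import tempfile


def plan_outputs(input_dir, output_dir=None):
    """Pair each .xlsx file in input_dir with a target .csv path (hypothetical helper)."""
    if output_dir is None:
        # Per the README: with no output directory, write next to the inputs.
        output_dir = input_dir
    pairs = []
    for name in sorted(os.listdir(input_dir)):
        base, ext = os.path.splitext(name)
        if ext.lower() == ".xlsx":  # assumption: selection is by file extension
            pairs.append((os.path.join(input_dir, name),
                          os.path.join(output_dir, base + ".csv")))
    return pairs


# Demo on a throwaway directory containing empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.xlsx", "notes.txt", "b.XLSX"):
        open(os.path.join(tmp, name), "w").close()
    for src, dst in plan_outputs(tmp):
        print(os.path.basename(src), "->", os.path.basename(dst))
```

Each planned pair could then be passed to the call the README documents, `Xlsx2csv(src, outputencoding="utf-8").convert(dst)`.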

pyproject.toml

Lines changed: 1 addition & 0 deletions
@@ -35,6 +35,7 @@ classifiers = [
3535
"Programming Language :: Python :: 3.11",
3636
"Programming Language :: Python :: 3.12",
3737
"Programming Language :: Python :: 3.13",
38+
"Programming Language :: Python :: 3.14",
3839
"Topic :: Office/Business",
3940
"Topic :: Utilities",
4041
]

xlsx2csv.py

Lines changed: 13 additions & 13 deletions
@@ -313,7 +313,7 @@ def _convert(self, sheet_index, outfile):
313313
elif sys.version_info[0] == 3:
314314
outfile = open(outfile, 'w+', encoding=self.options['outputencoding'], newline="")
315315
else:
316-
raise XlsxException("error: version of your python is not supported: " + str(sys.version_info) + "\n")
316+
raise XlsxException("error: version of your Python is not supported: " + str(sys.version_info) + "\n")
317317
closefile = True
318318
elif hasattr(outfile, "open"):
319319
outfile = outfile.open("w+", encoding=self.options['outputencoding'], newline="")
@@ -1112,7 +1112,7 @@ def main():
11121112
if "ArgumentParser" in globals():
11131113
parser = ArgumentParser(description="xlsx to csv converter")
11141114
parser.add_argument('infile', metavar='xlsxfile', help="xlsx file path, use '-' to read from STDIN")
1115-
parser.add_argument('outfile', metavar='outfile', nargs='?', help="output csv file path")
1115+
parser.add_argument('outfile', metavar='outfile', nargs='?', help="output CSV file path")
11161116
parser.add_argument('-v', '--version', action='version', version=__version__)
11171117
nargs_plus = "+"
11181118
argparser = True
@@ -1129,17 +1129,17 @@ def main():
11291129
parser.add_argument("-a", "--all", dest="all", default=False, action="store_true",
11301130
help="export all sheets")
11311131
parser.add_argument("-c", "--outputencoding", dest="outputencoding", default="utf-8", action="store",
1132-
help="encoding of output csv ** Python 3 only ** (default: utf-8)")
1132+
help="encoding of output CSV **Python 3 only** (default: utf-8)")
11331133
parser.add_argument("-d", "--delimiter", dest="delimiter", default=",",
1134-
help="delimiter - columns delimiter in csv, 'tab' or 'x09' for a tab (default: comma ',')")
1134+
help="delimiter - column delimiter in CSV, 'tab' or 'x09' for a tab (default: comma ',')")
11351135
parser.add_argument("--hyperlinks", "--hyperlinks", dest="hyperlinks", action="store_true", default=False,
11361136
help="include hyperlinks")
11371137
parser.add_argument("-e", "--escape", dest='escape_strings', default=False, action="store_true",
1138-
help="Escape \\r\\n\\t characters")
1138+
help="escape \\r\\n\\t characters")
11391139
parser.add_argument("--no-line-breaks", "--no-line-breaks", dest='no_line_breaks', default=False, action="store_true",
1140-
help="Replace \\r\\n\\t with space")
1140+
help="replace \\r\\n\\t with space")
11411141
parser.add_argument("-E", "--exclude_sheet_pattern", nargs=nargs_plus, dest="exclude_sheet_pattern", default="",
1142-
help="exclude sheets named matching given pattern, only effects when -a option is enabled.")
1142+
help="exclude sheets with names matching the given pattern, only affects when -a option is enabled.")
11431143
parser.add_argument("-f", "--dateformat", dest="dateformat",
11441144
help="override date/time format (ex. %%Y/%%m/%%d)")
11451145
parser.add_argument("-t", "--timeformat", dest="timeformat",
@@ -1149,13 +1149,13 @@ def main():
11491149
parser.add_argument("--sci-float", dest="scifloat", default=False, action="store_true",
11501150
help="force scientific notation to float")
11511151
parser.add_argument("-I", "--include_sheet_pattern", nargs=nargs_plus, dest="include_sheet_pattern", default="^.*$",
1152-
help="only include sheets named matching given pattern, only effects when -a option is enabled.")
1152+
help="only include sheets with names matching the given pattern, only affects when -a option is enabled.")
11531153
parser.add_argument("--exclude_hidden_sheets", default=False, action="store_true",
1154-
help="Exclude hidden sheets from the output, only effects when -a option is enabled.")
1154+
help="exclude hidden sheets from the output, only affects when -a option is enabled.")
11551155
parser.add_argument("--ignore-formats", nargs=nargs_plus, type=str, dest="ignore_formats", default=[''],
1156-
help="Ignores format for specific data types.")
1156+
help="ignore format for specific data types")
11571157
parser.add_argument("-l", "--lineterminator", dest="lineterminator", default="\n",
1158-
help="line terminator - lines terminator in csv, '\\n' '\\r\\n' or '\\r' (default: \\n)")
1158+
help="line terminator - line terminator in CSV, '\\n' '\\r\\n' or '\\r' (default: \\n)")
11591159
parser.add_argument("-m", "--merge-cells", dest="merge_cells", default=False, action="store_true",
11601160
help="merge cells")
11611161
parser.add_argument("-n", "--sheetname", dest="sheetname", default=None,
@@ -1165,10 +1165,10 @@ def main():
11651165
parser.add_argument("--skipemptycolumns", dest="skip_trailing_columns", default=False, action="store_true",
11661166
help="skip trailing empty columns")
11671167
parser.add_argument("-p", "--sheetdelimiter", dest="sheetdelimiter", default="--------",
1168-
help="sheet delimiter used to separate sheets, pass '' if you do not need delimiter, or 'x07' "
1168+
help="sheet delimiter used to separate sheets, pass '' if you do not need a delimiter, or 'x07' "
11691169
"or '\\f' for form feed (default: '--------')")
11701170
parser.add_argument("-q", "--quoting", dest="quoting", default="minimal",
1171-
help="quoting - fields quoting in csv, 'none' 'minimal' 'nonnumeric' or 'all' (default: minimal)")
1171+
help="quoting - field quoting in CSV, 'none' 'minimal' 'nonnumeric' or 'all' (default: minimal)")
11721172
parser.add_argument("-s", "--sheet", dest="sheetid", default=1, type=inttype,
11731173
help="sheet number to convert")
11741174
parser.add_argument("--include-hidden-rows", dest="include_hidden_rows", default=False, action="store_true",
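
The `if "ArgumentParser" in globals()` check visible in `main()` suggests a parser-selection pattern along the following lines. This is a sketch under assumptions: the guarded import, the `build_parser` and `parse` helper names, and the single `-d` option are illustrative and not copied from `xlsx2csv.py`.

```python
try:
    from argparse import ArgumentParser
except ImportError:
    # Assumption: very old interpreters without argparse use optparse instead.
    from optparse import OptionParser


def build_parser():
    """Return (parser, is_argparse) using whichever module was importable."""
    if "ArgumentParser" in globals():
        parser = ArgumentParser(description="demo parser")
        parser.add_argument("-d", "--delimiter", dest="delimiter", default=",")
        return parser, True
    parser = OptionParser()
    parser.add_option("-d", "--delimiter", dest="delimiter", default=",")
    return parser, False


def parse(argv):
    """Parse argv and return an object exposing a .delimiter attribute."""
    parser, is_argparse = build_parser()
    if is_argparse:
        return parser.parse_args(argv)
    # optparse returns an (options, positional_args) pair.
    options, _positional = parser.parse_args(argv)
    return options


print(parse(["-d", "tab"]).delimiter)
```

Returning an object with the same attribute names from both branches keeps the rest of the program independent of which parser module was used.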
