CSV Gen is a Python project that provides a command-line interface (CLI) to generate large CSV files using two different algorithms:
- ✅ NumPy based
- ✅ Faker based
- ❌ Pure Python (deprecated)
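The project's internals are not shown in this document, so the following is only a rough, stdlib-only illustration of the general idea of generating CSV data up to a target size. It is not the NumPy or Faker implementation; the function name `generate_csv_text` and the columns `id`, `value`, and `score` are hypothetical.

```python
import random

# Hypothetical header; the real tool's columns are not documented here.
HEADER = "id,value,score\n"


def generate_csv_text(target_bytes: int, seed: int = 0) -> str:
    """Append random rows until one more row would exceed target_bytes.

    All output is ASCII, so the character count equals the byte count.
    The header is always written, even if it alone exceeds target_bytes.
    """
    rng = random.Random(seed)
    parts = [HEADER]
    size = len(HEADER)
    row_id = 0
    while True:
        row = f"{row_id},{rng.randint(0, 999_999)},{rng.random():.6f}\n"
        if size + len(row) > target_bytes:
            break
        parts.append(row)
        size += len(row)
        row_id += 1
    return "".join(parts)


if __name__ == "__main__":
    text = generate_csv_text(1024)
    print(len(text.encode("utf-8")), "bytes generated")
```

A vectorised or multi-process generator would presumably produce rows in large batches rather than one at a time, but the stopping condition on the byte count stays the same.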
To install the project, you can use uv:
uv sync --managed-python --all-groups --compile-bytecode
You can use the following command to generate a large CSV file:
csv-gen generate -s SIZE_BYTES -w NUM_CPUS -a ALGORITHM [FILENAME]
Where:
- SIZE_BYTES: The size of the generated file in bytes (default is a gigabyte: 1 * 1024**3)
- NUM_CPUS: The number of CPU cores to use for generation (uses all cores when not provided)
- ALGORITHM: The algorithm to use, either faker or numpy (default is numpy)
- [FILENAME]: The name of the file to generate (default is generated.csv)
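Because SIZE_BYTES is given in raw bytes, it can help to compute the value rather than type it by hand. The snippet below is a small convenience sketch, not part of the CLI; the helper name `size_in_bytes` and the `UNITS` table are hypothetical, and the units are binary (1 GB = 1024**3 bytes), matching the default above.

```python
# Binary units, consistent with the 1 * 1024**3 default for SIZE_BYTES.
UNITS = {"KB": 1024, "MB": 1024**2, "GB": 1024**3}


def size_in_bytes(amount: int, unit: str) -> int:
    """Convert a size such as (25, "GB") into a byte count."""
    return amount * UNITS[unit.upper()]


if __name__ == "__main__":
    print(size_in_bytes(25, "GB"))  # 26843545600
    print(size_in_bytes(50, "MB"))  # 52428800
```

The printed number can be passed directly to the -s option.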
Or simply run the following to display the help menu:
csv-gen generate --help
For example, to generate a 25 GB CSV file (25 * 1024**3 = 26843545600) called output.csv, using 8 CPU cores with the numpy algorithm, you can use:
# The verbose version
csv-gen generate --file-size-gb 26843545600 --cpus 8 output.csv
# Or the short version
csv-gen generate -s 26843545600 -w 8 output.csv
To generate a 1 GB CSV file, called generated.csv, utilising all available CPU cores (and the numpy algorithm), you can simply call the default command:
csv-gen generate
To generate a 50 MB CSV file (50 * 1024**2 = 52428800) called data_faker.csv, using 6 CPU cores with the faker algorithm, you can use:
# The verbose version
csv-gen generate --file-size-gb 52428800 --cpus 6 --algorithm faker data_faker.csv
# Or the short version
csv-gen generate -s 52428800 -w 6 -a faker data_faker.csv
This project is licensed under the terms of the MIT license.

