A high-throughput CSV parser optimized for numeric data, built with SWAR (SIMD Within A Register) techniques for maximum performance.
- **Test Environment:** Apple M1, macOS, single-threaded
- **Test Data:** 1M rows, 48 MB (7 columns: timestamp, bid, ask, bidqty, askqty, last, volume)
| Method | Time (s) | Rows | MB/s | Checksum |
|---|---|---|---|---|
| cpp-fread+scalar | 0.057 | 1,000,000 | 848 | 6966795218671251328 |
| rust-mmap | 0.058 | 1,000,000 | 823 | 6966795218671251328 |
| cpp-swar+fread | 0.060 | 1,000,000 | 803 | 6966795218671251328 |
| cpp-batch-all | 0.061 | 1,000,000 | 789 | 6966795218671251328 |
| cpp-stream-1MB | 0.064 | 1,000,000 | 756 | 6966795218671251328 |
| cpp-mmap-batch | 0.067 | 1,000,000 | 721 | 6966795218671251328 |
| cpp-stream-64KB | 0.067 | 1,000,000 | 715 | 6966795218671251328 |
| cpp-stream-4KB | 0.068 | 1,000,000 | 708 | 6966795218671251328 |
| java-mmap | 0.070 | 1,000,000 | 686 | 6966795218671251328 |
| cpp-stream-1.5KB | 0.078 | 1,000,000 | 617 | 6966795218671251328 |
| cpp-swar+mmap | 0.100 | 1,000,000 | 482 | 6966795218671251328 |
| rust-csv | 0.216 | 1,000,000 | 223 | 6966795218671251328 |
| rust-manual | 0.259 | 1,000,000 | 186 | 6966795218671251328 |
| python-optimized | 0.318 | 1,000,000 | 151 | 6966795218671251328 |
| python-simple | 0.339 | 1,000,000 | 142 | 6966795218671251328 |
| java-buffered | 0.523 | 1,000,000 | 92 | 6966795218671251328 |
| cpp-ifstream+getline | 0.600 | 1,000,000 | 80 | 6966795218671251328 |
All checksums match, verifying that every implementation produces identical parsed output.
Per-row parse latency (from latency_bench):

| Statistic | Latency |
|---|---|
| Mean | 77 ns |
| P50 | 83 ns |
| P99 | 166 ns |
| P99.9 | 916 ns |
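Distributions like this are produced by timing each row individually and reading percentiles off the sorted samples. A minimal sketch of the summarization step (the real harness is latency_bench.cpp; the names here are illustrative):

```cpp
#include <algorithm>
#include <cstdint>
#include <numeric>
#include <vector>

struct LatencyStats { double mean_ns; uint64_t p50, p99, p999; };

// Sort the per-row samples once, then index into them for each percentile.
LatencyStats summarize(std::vector<uint64_t> samples_ns) {
    std::sort(samples_ns.begin(), samples_ns.end());
    auto pct = [&](double p) {
        size_t i = static_cast<size_t>(p * (samples_ns.size() - 1));
        return samples_ns[i];
    };
    double mean = std::accumulate(samples_ns.begin(), samples_ns.end(), 0.0) /
                  static_cast<double>(samples_ns.size());
    return {mean, pct(0.50), pct(0.99), pct(0.999)};
}
```

Note that the mean can sit below the median (77 ns vs. 83 ns above) when many rows parse faster than the typical one.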
```sh
git clone https://github.com/your-username/CSV-Parsing-Engine.git
cd CSV-Parsing-Engine
./setup.sh
./run_benchmarks.sh
```

Or run the consolidated results collector:

```sh
./collect_results.sh data/test.csv 7
```

```
                 ┌─────────────────────────────┐
                 │            INPUT            │
                 │  CSV File / Network Stream  │
                 └──────────────┬──────────────┘
                                │
          ┌─────────────────────┴─────────────────────┐
          │                                           │
          ▼                                           ▼
┌───────────────────┐                       ┌───────────────────┐
│   BATCH PARSER    │                       │ STREAMING PARSER  │
│  (csvparser.cpp)  │                       │(streaming_parser) │
└─────────┬─────────┘                       └─────────┬─────────┘
          │                                           │
          │  ┌─────────────────────────────────────┐  │
          └─►│      SWAR PROCESSING CORE           │◄─┘
             │                                     │
             │ ┌─────────────────────────────────┐ │
             │ │ has_byte() - Delimiter Detect   │ │
             │ │ XOR + subtract trick for        │ │
             │ │ parallel byte comparison        │ │
             │ └─────────────────────────────────┘ │
             │                                     │
             │ ┌─────────────────────────────────┐ │
             │ │ parse_8_digits() - SWAR Parse   │ │
             │ │ 8 ASCII digits → int64 in       │ │
             │ │ 4 multiplications               │ │
             │ └─────────────────────────────────┘ │
             └──────────────────┬──────────────────┘
                                │
                                ▼
             ┌─────────────────────────────────────┐
             │        STRUCTURE-OF-ARRAYS          │
             │      (Column-major storage)         │
             │                                     │
             │ col[0]: [v0, v1, v2, v3, ...]       │
             │ col[1]: [v0, v1, v2, v3, ...]       │
             │ col[2]: [v0, v1, v2, v3, ...]       │
             │                                     │
             │  Cache-friendly sequential access   │
             └─────────────────────────────────────┘
```
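The column-major layout can be sketched as follows. `ColumnStore` and `append_row` are illustrative names, not the parser's actual API:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Structure-of-arrays: one contiguous vector per column, so scanning a
// single column (e.g. summing all bids) touches memory sequentially
// instead of striding across row objects.
struct ColumnStore {
    std::vector<std::vector<int64_t>> cols;

    explicit ColumnStore(size_t ncols) : cols(ncols) {}

    void append_row(const std::vector<int64_t>& row) {
        for (size_t c = 0; c < cols.size(); ++c)
            cols[c].push_back(row[c]);
    }

    // Cache-friendly per-column reduction.
    int64_t sum(size_t c) const {
        int64_t s = 0;
        for (int64_t v : cols[c]) s += v;
        return s;
    }
};
```

A row-major (array-of-structures) layout would make per-column aggregations stride through unrelated fields; the column-major form keeps each reduction on a dense, prefetch-friendly array.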
```sh
# Generate CSV with N rows (7 columns: timestamp, bid, ask, bidqty, askqty, last, volume)
./build/tools/gen_csv data/custom.csv 5000000    # 5M rows
./build/tools/gen_csv data/large.csv 20000000    # 20M rows
```

```sh
# All benchmarks with default test data
./run_benchmarks.sh

# Individual benchmarks with custom data
./build/src/benchmark_bin data/custom.csv    # Batch methods comparison
./build/src/streaming_bench data/custom.csv  # Streaming throughput
./build/src/latency_bench data/custom.csv    # Per-row latency
```

```sh
# Baselines
python3 baselines/python_simple_bench.py data/test.csv
cd baselines && java JavaBench ../data/test.csv && cd ..
cd baselines/rust && cargo run --release -- ../../data/test.csv && cd ../..
```

```cpp
#include "csvparser.h"

CSVParser parser;
parser.load("data.csv");
parser.parse();

// Access column-major data
const auto& data = parser.data();  // vector<vector<int64_t>>
int64_t value = data[col][row];
```

```cpp
#include "streaming_parser.h"

StreamingCSVParser parser(7);  // 7 columns

// Process data as it arrives
parser.feed(chunk, len, [](const std::vector<int64_t>& row) {
    // Handle each row immediately
    process(row[0], row[1], ...);
});

// Flush any remaining partial row
parser.flush(callback);
```

```
CSV-Parsing-Engine/
├── src/
│   ├── csvparser.h/cpp        # Batch parser with mmap/fread + SWAR
│   ├── streaming_parser.h/cpp # Streaming parser for chunked data
│   ├── benchmark.cpp          # Batch benchmark harness
│   ├── streaming_bench.cpp    # Streaming benchmark
│   └── latency_bench.cpp      # Per-row latency measurement
├── tools/
│   └── gen_csv.cpp            # Test data generator
├── baselines/
│   ├── python_simple_bench.py # Python stdlib benchmark
│   ├── python_bench.py        # Python pandas benchmark (optional)
│   ├── JavaBench.java         # Java mmap benchmark
│   └── rust/                  # Rust csv crate benchmark
├── setup.sh                   # One-command dependency setup
├── run_benchmarks.sh          # Run all benchmarks
├── collect_results.sh         # Consolidated results table
└── README.md
```
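The streaming parser's key job is handling rows that straddle chunk boundaries. A toy, scalar version of that carry-over logic, to show the idea only (the real `StreamingCSVParser` uses the SWAR core and is far faster; all names below are illustrative):

```cpp
#include <cstdint>
#include <cstdlib>
#include <functional>
#include <string>
#include <vector>

// Toy streaming CSV splitter: bytes after the last newline in a chunk are
// carried over and completed when the next chunk arrives via feed().
class ToyStreamingParser {
    std::string carry_;  // unfinished row from the previous chunk
    size_t ncols_;
public:
    explicit ToyStreamingParser(size_t ncols) : ncols_(ncols) {}

    void feed(const char* data, size_t len,
              const std::function<void(const std::vector<int64_t>&)>& on_row) {
        carry_.append(data, len);
        size_t start = 0, nl;
        while ((nl = carry_.find('\n', start)) != std::string::npos) {
            emit(carry_.substr(start, nl - start), on_row);
            start = nl + 1;
        }
        carry_.erase(0, start);  // keep only the partial tail
    }

    void flush(const std::function<void(const std::vector<int64_t>&)>& on_row) {
        if (!carry_.empty()) emit(carry_, on_row);
        carry_.clear();
    }
private:
    void emit(const std::string& line,
              const std::function<void(const std::vector<int64_t>&)>& on_row) {
        std::vector<int64_t> row;
        size_t start = 0;
        for (size_t i = 0; i < ncols_; ++i) {
            size_t comma = line.find(',', start);
            row.push_back(std::strtoll(line.c_str() + start, nullptr, 10));
            if (comma == std::string::npos) break;
            start = comma + 1;
        }
        on_row(row);
    }
};
```

Feeding `"1,2\n3"` then `",4\n"` delivers the row `{1,2}` immediately and completes `{3,4}` only once the second chunk supplies the rest of the line.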
```cpp
// Check 8 bytes at once for delimiter presence
inline uint64_t has_byte(uint64_t chunk, uint8_t byte) {
    uint64_t x = chunk ^ (0x0101010101010101ULL * byte);  // matching bytes become 0
    return ((x - 0x0101010101010101ULL) & ~x) & 0x8080808080808080ULL;
}
```

```cpp
// Parse 8 ASCII digits to an integer in parallel
inline int64_t parse_8_digits(const char* p) {
    uint64_t chunk;
    memcpy(&chunk, p, 8);
    chunk &= 0x0F0F0F0F0F0F0F0FULL;                                // ASCII → digit values
    chunk = (chunk * 10 + (chunk >> 8)) & 0x00FF00FF00FF00FFULL;   // combine digit pairs
    chunk = (chunk * 100 + (chunk >> 16)) & 0x0000FFFF0000FFFFULL; // pairs → 4-digit groups
    chunk = (chunk * 10000 + (chunk >> 32)) & 0x00000000FFFFFFFFULL;
    return chunk;
}
```

**Requirements:**

- C++17 compiler (GCC 7+, Clang 5+, MSVC 2017+)
- CMake 3.10+
- Python 3.6+ (for Python baseline)
- Java 11+ (for Java baseline)
- Rust/Cargo (optional, for Rust baseline)
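The two SWAR primitives can be exercised standalone as below (assumes a little-endian target, as the production code does; `first_match` is an added helper using a GCC/Clang builtin, not part of the parser):

```cpp
#include <cstdint>
#include <cstring>

// has_byte / parse_8_digits reproduced from above for a self-contained demo.
inline uint64_t has_byte(uint64_t chunk, uint8_t byte) {
    uint64_t x = chunk ^ (0x0101010101010101ULL * byte);
    return ((x - 0x0101010101010101ULL) & ~x) & 0x8080808080808080ULL;
}

inline int64_t parse_8_digits(const char* p) {
    uint64_t chunk;
    std::memcpy(&chunk, p, 8);
    chunk &= 0x0F0F0F0F0F0F0F0FULL;
    chunk = (chunk * 10 + (chunk >> 8)) & 0x00FF00FF00FF00FFULL;
    chunk = (chunk * 100 + (chunk >> 16)) & 0x0000FFFF0000FFFFULL;
    chunk = (chunk * 10000 + (chunk >> 32)) & 0x00000000FFFFFFFFULL;
    return static_cast<int64_t>(chunk);
}

// Byte offset of the first match (little-endian): each hit sets the high bit
// of its byte, so lowest-set-bit / 8 is the index. GCC/Clang builtin.
inline int first_match(uint64_t mask) {
    return mask ? __builtin_ctzll(mask) / 8 : -1;
}
```

For example, scanning the 8 bytes `"ab,defgh"` for `','` reports offset 2, and `parse_8_digits` turns the string `"12345678"` into the integer 12345678 without a per-digit loop.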
**Limitations:**

- Numeric-only fields (no quoted strings or escape sequences)
- Single-threaded (multi-threaded sharding planned)
- Integer parsing only (float support planned)
**License:** MIT