ONE SINGLE Merge file (CSV or PARQUET file) #282

navikaran2 · 2025-11-11T20:07:34Z


Sir, could you please schedule one more cron job that builds a single merged Parquet file from all the daily CSV files?

I have attached the Python code for it:

```python
import os
import polars as pl

# 🔹 Paths
input_folder = r"C:\3_ Wroking project\eod2-main\src\eod2_data\daily"
output_parquet = os.path.join(input_folder, "merged_data.parquet")

# 🔹 Fixed base columns and dtypes
schema_overrides = {
    "Date": pl.Date,
    "Open": pl.Float64,
    "High": pl.Float64,
    "Low": pl.Float64,
    "Close": pl.Float64,
    "Volume": pl.Float64,
    "Series": pl.Utf8,
    "TOTAL_TRADES": pl.Float64,
    "QTY_PER_TRADE": pl.Float64,
    "DLV_QTY": pl.Float64,
}

base_columns = list(schema_overrides.keys())

# 🔹 Collect all CSV files
files = [f for f in os.listdir(input_folder) if f.endswith(".csv")]
print(f"📁 Found {len(files)} CSV files")

frames = []

# 🔹 Read & merge all CSVs
for file in files:
    file_path = os.path.join(input_folder, file)
    # Symbol name is the file name without the .csv extension, upper-cased
    symbol = file.removesuffix(".csv").upper()

    try:
        df = pl.read_csv(
            file_path,
            ignore_errors=True,
            infer_schema_length=1000,
            null_values=["nan", "NaN", "N/A", "null", ""],
            schema_overrides=schema_overrides,
        )

        # Keep only the required columns that exist in this file
        df = df.select([col for col in base_columns if col in df.columns])

        # Add SYMBOL column
        df = df.with_columns(pl.lit(symbol).alias("SYMBOL"))

        frames.append(df)
    except Exception as e:
        print(f"⚠️ Skipped {file}: {e}")

if not frames:
    raise SystemExit("No CSV files could be read")

print("🧩 Concatenating all frames...")
# "diagonal_relaxed" fills columns missing from some files with nulls;
# "vertical_relaxed" fails if any file lacks one of the base columns
merged_df = pl.concat(frames, how="diagonal_relaxed", rechunk=True)

# 🔹 Ensure column order & dtypes
merged_df = merged_df.select(
    [pl.col(name).cast(dtype) for name, dtype in schema_overrides.items()]
    + [pl.col("SYMBOL").cast(pl.Utf8)]
)

print("💾 Writing to Parquet...")
merged_df.write_parquet(output_parquet)

# Report files actually merged, not just files found (some may be skipped)
print(f"✅ Done! Merged {len(frames)} of {len(files)} files → {output_parquet}")
```

BennyThadikaran · 2025-11-14T18:59:25Z · Maintainer

If I had to rewrite EOD2, I would strongly consider using Parquet. But CSV is simple and versatile, and far more people are familiar with it than with Parquet.

There are a few issues with switching to Parquet:

- A single large file, or several large files (as you're suggesting), may exceed GitHub's per-file size limit, which may require workarounds.
- It would require a rewrite of the code.

As I checked, the largest file in the EOD2 daily folder is 418 KB and the overall folder size is 400 MB. These numbers are manageable for at least the next few years.
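Figures like these can be rechecked with a short script. This is a minimal sketch that uses only the standard library and looks at the top level of one folder, not subfolders. The function name `folder_stats` is hypothetical. The 100 MiB threshold is GitHub's hard per-file limit; files larger than that are rejected on push.

```python
import os
import tempfile

# GitHub rejects pushes containing files larger than 100 MiB
GITHUB_FILE_LIMIT = 100 * 1024 * 1024


def folder_stats(folder: str, limit: int = GITHUB_FILE_LIMIT):
    """Return (largest_name, largest_bytes, total_bytes, oversized_names).

    Only regular files directly inside `folder` are counted.
    """
    largest_name, largest_bytes, total_bytes = None, 0, 0
    oversized = []
    with os.scandir(folder) as entries:
        for entry in entries:
            if not entry.is_file():
                continue
            size = entry.stat().st_size
            total_bytes += size
            if size > largest_bytes:
                largest_name, largest_bytes = entry.name, size
            if size > limit:
                oversized.append(entry.name)
    return largest_name, largest_bytes, total_bytes, sorted(oversized)


# Demo on a throwaway folder with a tiny limit; for real data, pass the
# daily folder path and keep the default limit
with tempfile.TemporaryDirectory() as tmp:
    for name, size in [("a.csv", 10), ("b.csv", 300), ("c.csv", 50)]:
        with open(os.path.join(tmp, name), "wb") as fh:
            fh.write(b"x" * size)
    stats = folder_stats(tmp, limit=100)

print(stats)
```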
