ONE SINGLE Merge file (CSV or PARQUET file ) #282
navikaran2
started this conversation in
Ideas
Replies: 1 comment
If I had to rewrite EOD2, I would strongly consider using parquet. But CSV is simple, versatile and everyone is familiar with it compared to parquet. There are a few issues with switching to parquet:
When I checked EOD2, the largest file in the daily folder was 418 KB and the overall folder size was 400 MB. These are manageable numbers for the next few years at least.
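The size figures above can be rechecked as the data grows. A minimal sketch using only the standard library; `folder_stats` is a hypothetical helper name, not part of EOD2, and the folder path in the usage note is an assumption about where the daily CSVs live.

```python
from pathlib import Path


def folder_stats(folder: str, pattern: str = "*.csv") -> tuple[int, int, int]:
    """Return (file_count, largest_file_bytes, total_bytes) for files matching pattern."""
    # Collect the size in bytes of every matching regular file in the folder
    sizes = [p.stat().st_size for p in Path(folder).glob(pattern) if p.is_file()]
    # max(..., default=0) keeps the function safe on an empty folder
    return len(sizes), max(sizes, default=0), sum(sizes)
```

Calling `folder_stats("src/eod2_data/daily")` and dividing the last two values by 1024 and 1024**2 gives the largest file in KB and the folder total in MB.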
-
Sir, could you please schedule one more cron job that produces one single merged Parquet file?
I have attached the Python code for the same.
import os
import polars as pl
# 🔹 Paths
input_folder = r"C:\3_ Wroking project\eod2-main\src\eod2_data\daily"
output_parquet = os.path.join(input_folder, "merged_data.parquet")
# 🔹 Fixed base columns and dtypes
schema_overrides = {
    "Date": pl.Date,
    "Open": pl.Float64,
    "High": pl.Float64,
    "Low": pl.Float64,
    "Close": pl.Float64,
    "Volume": pl.Float64,
    "Series": pl.Utf8,
    "TOTAL_TRADES": pl.Float64,
    "QTY_PER_TRADE": pl.Float64,
    "DLV_QTY": pl.Float64,
}
base_columns = list(schema_overrides.keys())
# 🔹 Collect all CSV files
files = [f for f in os.listdir(input_folder) if f.endswith(".csv")]
print(f"📁 Found {len(files)} CSV files")
frames = []
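# 🔹 Read & merge all CSVs
for i, file in enumerate(files, 1):
    file_path = os.path.join(input_folder, file)
    symbol = file.replace(".csv", "").upper()
    # NOTE: the rest of the loop body was missing from the post. The lines below
    # are an assumed reconstruction: read the file, align columns, tag the symbol.
    df = pl.read_csv(file_path, schema_overrides=schema_overrides)
    # Add any base column this file lacks as nulls so all frames share one layout
    df = df.with_columns(
        [pl.lit(None, dtype=dtype).alias(col)
         for col, dtype in schema_overrides.items() if col not in df.columns]
    )
    frames.append(df.select(base_columns).with_columns(pl.lit(symbol).alias("SYMBOL")))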
print("🧩 Concatenating all frames...")
merged_df = pl.concat(frames, how="vertical_relaxed", rechunk=True)
# 🔹 Ensure column order & dtypes
merged_df = merged_df.select([
    pl.col("Date").cast(pl.Date),
    pl.col("Open").cast(pl.Float64),
    pl.col("High").cast(pl.Float64),
    pl.col("Low").cast(pl.Float64),
    pl.col("Close").cast(pl.Float64),
    pl.col("Volume").cast(pl.Float64),
    pl.col("Series").cast(pl.Utf8),
    pl.col("TOTAL_TRADES").cast(pl.Float64),
    pl.col("QTY_PER_TRADE").cast(pl.Float64),
    pl.col("DLV_QTY").cast(pl.Float64),
    pl.col("SYMBOL").cast(pl.Utf8),
])
print("💾 Writing to Parquet...")
merged_df.write_parquet(output_parquet)
print(f"✅ Done! Merged {len(files)} files → {output_parquet}")