6 changes: 3 additions & 3 deletions airbyte_cdk/sources/file_based/file_types/excel_parser.py
@@ -118,9 +118,9 @@ def parse_records(
 # DataFrame.to_dict() method returns datetime values in pandas.Timestamp values, which are not serializable by orjson
 # DataFrame.to_json() returns string with datetime values serialized to iso8601 with microseconds to align with pydantic behavior
 # see PR description: https://github.com/airbytehq/airbyte/pull/44444/
-yield from orjson.loads(
-    df.to_json(orient="records", date_format="iso", date_unit="us")
-)
+for index, row in df.iterrows():
+    # Convert each row (as a Series) to a JSON string
+    yield orjson.loads(row.to_json(date_format="iso", date_unit="us"))
Comment on lines +121 to +123
Copilot AI Aug 27, 2025

Using df.iterrows() is inefficient for large DataFrames as it returns copies of data and has significant overhead. Consider using df.to_dict('records') with manual datetime conversion, or df.itertuples() for better performance while maintaining the memory benefits.

Suggested change
-for index, row in df.iterrows():
-    # Convert each row (as a Series) to a JSON string
-    yield orjson.loads(row.to_json(date_format="iso", date_unit="us"))
+# Efficiently convert the DataFrame to a list of records with proper datetime serialization
+records = orjson.loads(df.to_json(orient="records", date_format="iso", date_unit="us"))
+for record in records:
+    yield record
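
One possible middle ground between the per-row conversion in the PR and the whole-frame conversion in the suggestion is to serialize fixed-size slices of the DataFrame, so that only one slice's worth of parsed records is held at a time. The sketch below is an illustration under assumptions, not the CDK's implementation: `iter_records_chunked` and its `chunk_size` default are hypothetical names, and the standard-library `json` module stands in for `orjson` to keep the example dependency-light. No benchmark backs the choice of chunk size.

```python
import json
from typing import Any, Dict, Iterator

import pandas as pd


def iter_records_chunked(
    df: pd.DataFrame, chunk_size: int = 1000
) -> Iterator[Dict[str, Any]]:
    """Yield one dict per row, serializing chunk_size rows at a time (hypothetical helper)."""
    for start in range(0, len(df), chunk_size):
        chunk = df.iloc[start : start + chunk_size]
        # Same to_json arguments as in the diff: ISO 8601 dates, microsecond precision.
        payload = chunk.to_json(orient="records", date_format="iso", date_unit="us")
        yield from json.loads(payload)


df = pd.DataFrame(
    {
        "id": [1, 2, 3],
        "created": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03"]),
    }
)
records = list(iter_records_chunked(df, chunk_size=2))
```

With `chunk_size=1` this degenerates to roughly the per-row behavior, and with `chunk_size >= len(df)` to the whole-frame behavior, so the parameter trades peak memory against per-call overhead.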

Contributor

Based on this Copilot recommendation, should we use df.itertuples() instead of df.iterrows()?
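
For reference, a rough sketch of what an `itertuples`-based version could look like. Because `itertuples` yields raw values such as `pandas.Timestamp` and `NaN`, the conversion that `to_json` performs has to be done by hand. The helper names `_to_jsonable` and `iter_records_itertuples` are hypothetical, the standard-library `json` module is used only to check serializability, and the sketch handles just timestamps and missing values; it makes no claim to match `to_json` for timezone-aware values or other types.

```python
import json
import math
from typing import Any, Dict, Iterator

import pandas as pd


def _to_jsonable(value: Any) -> Any:
    """Convert a single cell to a JSON-friendly value (hypothetical helper)."""
    # Missing values become None, which serializes as JSON null.
    if value is None or value is pd.NaT:
        return None
    if isinstance(value, float) and math.isnan(value):
        return None
    # Timestamps become ISO 8601 strings with microsecond precision.
    if isinstance(value, pd.Timestamp):
        return value.isoformat(timespec="microseconds")
    return value


def iter_records_itertuples(df: pd.DataFrame) -> Iterator[Dict[str, Any]]:
    """Yield one dict per row without building a Series for each row."""
    columns = list(df.columns)
    # name=None yields plain tuples, so column names that are not valid
    # Python identifiers are not renamed.
    for row in df.itertuples(index=False, name=None):
        yield {col: _to_jsonable(val) for col, val in zip(columns, row)}


df = pd.DataFrame(
    {
        "id": [1, 2],
        "created": pd.to_datetime(["2024-01-01", None]),
        "score": [0.5, float("nan")],
    }
)
records = list(iter_records_itertuples(df))
serialized = json.dumps(records)
```

The trade-off is that the datetime and null handling moves out of pandas and into this code, so any difference from the `to_json` output format would need to be checked against what downstream consumers expect.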


except Exception as exc:
# Raise a RecordParseError if any exception occurs during parsing