Description
Requested by: @aaronsteers
Context
This issue documents the plan to replace direct datetime calls in the Airbyte Python CDK with the new utilities in the datetime_helpers module, while preserving backward compatibility and keeping datetime handling consistent.
Affected Components
Primary Datetime Classes
- DatetimeParser (in airbyte_cdk/sources/declarative/datetime/datetime_parser.py)
  - Handles parsing and formatting of datetime objects
  - Provides special handling for Unix timestamps and milliseconds
  - References: 12
- MinMaxDatetime (in airbyte_cdk/sources/declarative/datetime/min_max_datetime.py)
  - Compares datetimes against min/max boundaries
  - Supports interpolated datetime strings
  - References: 147
- DatetimeBasedCursor (in airbyte_cdk/sources/declarative/incremental/datetime_based_cursor.py)
  - Handles incremental syncs over datetime ranges
  - Creates state with format {<cursor_field>: }
  - References: 163
- DatetimeFormatInferrer (in airbyte_cdk/utils/datetime_format_inferrer.py)
  - Detects datetime fields in records
  - Infers datetime formats
  - References: 11
Datetime Conversion Classes
- DateTimeStreamStateConverter (in airbyte_cdk/sources/streams/concurrent/state_converters/datetime_stream_state_converter.py)
  - Handles datetime state conversions for concurrent streams
  - Already uses some of the new datetime utils
Direct Datetime Method Usage
- datetime.strptime - References: 79
- datetime.strftime - References: 10
- datetime.now - References: 46
- datetime.fromtimestamp - References: 36
- timestamp() * 1000 - References: 6
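The issue does not say how these reference counts were produced. As a rough sketch, a similar audit could be run with a few regular expressions; the patterns and the `count_datetime_usages` name below are illustrative assumptions, not part of the CDK.

```python
import re
from collections import Counter

# Patterns for the direct usages listed above. These regexes are an
# illustrative assumption, not the method used to produce the counts.
PATTERNS = {
    "datetime.strptime": re.compile(r"\bdatetime\.strptime\("),
    "datetime.strftime": re.compile(r"\.strftime\("),
    "datetime.now": re.compile(r"\bdatetime\.now\("),
    "datetime.fromtimestamp": re.compile(r"\bdatetime\.fromtimestamp\("),
    "timestamp() * 1000": re.compile(r"\.timestamp\(\)\s*\*\s*1000\b"),
}


def count_datetime_usages(source: str) -> Counter:
    """Count occurrences of each direct datetime usage in a source string."""
    counts: Counter = Counter()
    for label, pattern in PATTERNS.items():
        counts[label] = len(pattern.findall(source))
    return counts


sample = '''
now = datetime.now(tz=timezone.utc)
parsed = datetime.strptime(value, "%Y-%m-%d")
millis = int(parsed.timestamp() * 1000)
'''
usage_counts = count_datetime_usages(sample)
```

To audit a checkout, the same function could be applied to the text of each .py file and the counters summed. A regex pass will miss aliased imports, so the totals should be treated as approximate.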
Migration Plan
1. Update DatetimeParser to use AirbyteDateTime utilities
   - Replace direct datetime method calls with AirbyteDateTime utilities
   - Maintain special case handling for specific formats
   - Ensure backward compatibility with existing code
2. Update MinMaxDatetime to use AirbyteDateTime utilities
   - Modify to use updated DatetimeParser
   - Ensure consistent datetime format handling
3. Update DatetimeBasedCursor to use AirbyteDateTime utilities
   - Replace datetime.now with ab_datetime_now
   - Update datetime parsing and formatting
4. Update DatetimeFormatInferrer to use AirbyteDateTime utilities
   - Modify _matches_format method to use ab_datetime_try_parse as a fallback
   - Maintain existing format detection capabilities
5. Update datetime state converters
   - Ensure consistent usage of AirbyteDateTime utilities
   - Add fallback to more forgiving parsing
6. Search for and update other datetime usages
   - Replace direct datetime method calls with appropriate AirbyteDateTime utility functions
   - Focus on maintaining backward compatibility
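The "strict first, forgiving fallback" pattern that steps 1, 4, and 5 rely on can be sketched with the standard library alone. `parse_with_fallback` is a hypothetical name used only for illustration; in the CDK the fallback branch would presumably call ab_datetime_try_parse instead of `datetime.fromisoformat`.

```python
from datetime import datetime, timezone
from typing import Optional


def parse_with_fallback(value: str, expected_format: str) -> Optional[datetime]:
    """Try the expected format first, then fall back to ISO 8601 parsing.

    Hypothetical helper: in the CDK, the fallback branch would delegate to
    ab_datetime_try_parse rather than datetime.fromisoformat.
    """
    try:
        parsed = datetime.strptime(value, expected_format)
    except ValueError:
        # Older Python versions reject a trailing "Z", so normalize it first.
        normalized = value[:-1] + "+00:00" if value.endswith("Z") else value
        try:
            parsed = datetime.fromisoformat(normalized)
        except ValueError:
            return None  # Neither strategy could parse the value.
    # Default to UTC when the input carries no timezone information.
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


strict = parse_with_fallback("2024-01-01", "%Y-%m-%d")
forgiving = parse_with_fallback("2024-01-01T12:30:00Z", "%Y-%m-%d")
unparseable = parse_with_fallback("not a date", "%Y-%m-%d")
```

Returning None instead of raising keeps the helper usable as a predicate, which is roughly what a method like _matches_format needs; callers that require an error can raise on None themselves.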
Implementation Guidelines
- Parsing Strategy: Try the expected input format first, but always fall back to more forgiving parsing logic.
- Serialization Strategy: Always use the standard RFC/ISO format with the "T" delimiter when serializing datetime objects to strings.
- Backward Compatibility: Maintain the same public interfaces to ensure backward compatibility for CDK consumers.
- ISO8601/RFC3339 Compliance: Ensure all datetime string representations are ISO8601 and RFC3339 compliant, specifically using "T" as the delimiter between date and time components.
- Timestamp Conversions: Use dedicated AirbyteDateTime class methods for timestamp conversions rather than ad hoc arithmetic. For example, use to_epoch_millis() instead of timestamp() * 1000.
- Timezone Handling: Ensure consistent timezone handling, defaulting to UTC when no timezone is specified.
- Error Handling: Implement proper error handling and fallback mechanisms for parsing to ensure robustness.
- Testing: Thoroughly test the changes with various datetime formats to ensure compatibility.
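A minimal standard-library sketch of the serialization, timezone, and timestamp guidelines above. The helper names (`ensure_utc`, `to_iso8601`, `epoch_millis`) are hypothetical stand-ins; the CDK itself would use the AirbyteDateTime methods, such as to_epoch_millis(), named in this issue.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def ensure_utc(dt: datetime) -> datetime:
    """Treat naive datetimes as UTC; convert aware ones to UTC."""
    if dt.tzinfo is None:
        return dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc)


def to_iso8601(dt: datetime) -> str:
    """Serialize with the "T" delimiter between date and time."""
    return ensure_utc(dt).isoformat(sep="T")


def epoch_millis(dt: datetime) -> int:
    """Integer epoch milliseconds, computed without float multiplication."""
    return (ensure_utc(dt) - EPOCH) // timedelta(milliseconds=1)


naive = datetime(2024, 1, 1, 0, 0, 0)
serialized = to_iso8601(naive)
millis = epoch_millis(naive)
```

Dividing a timedelta by a one-millisecond timedelta keeps the computation in integers, which avoids the float rounding that timestamp() * 1000 can introduce.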
Implementation Details
Docs link: https://airbytehq.github.io/airbyte-python-cdk/airbyte_cdk/utils/datetime_helpers.html
Link to Devin run: https://app.devin.ai/sessions/1d967bd422b949e5b4876688033fbaaa