
Conversation

@artem-shelkovnikov (Member) commented Jun 17, 2025

Closes #3444

This PR adds support for datetime values that cannot be handled by Python.

Before this change, datetime values outside the [datetime.min; datetime.max] range would raise an error:

year 643385 is out of range (Consider Using CodecOptions(datetime_conversion=DATETIME_AUTO) or MongoClient(datetime_conversion='DATETIME_AUTO')). See: https://pymongo.readthedocs.io/en/stable/examples/datetimes.html#handling-out-of-range-datetimes

This PR makes use of DatetimeConversion.DATETIME_AUTO conversion setting.

In practice this means that:

  • datetime objects that fit within Python's datetime range will be parsed as such and sent to Elasticsearch as datetimes
  • if the driver cannot represent a value as a Python datetime, it will be serialised as a long

This, however, can be problematic due to internal Elasticsearch type conversion (something that is already a problem for the MongoDB connector to some extent): Elasticsearch allows converting a long to a datetime, but not the opposite.

Demonstration:

PUT connector-mongodb-f1e8/_doc/zdO8fZcBkReHFeyYyxGV
{
    "address": """3821 Jennifer Key
Victoriaville, MD 12105""",
    "birthdate": "1976-11-08",
    "unique_id": "46292e6c-5227-4715-879c-1932ccae9062",
    "some_small_datetime": -4611686018427388000, # this field is a long
    "some_zero_datetime": "1970-01-01T00:00:00", # this field is a datetime
    "name": "Earl Phillips",
    "comment": "Four tax per.",
    "id": "6851474d557891815b63f1e3",
    "time": "12:23:15",
    "some_large_datetime": 4611686018427388000,
    "fun_field": 4611686018427388000
}

OK

{
    "address": """3821 Jennifer Key
Victoriaville, MD 12105""",
    "birthdate": "1976-11-08",
    "unique_id": "46292e6c-5227-4715-879c-1932ccae9062",
    "some_small_datetime": -4611686018427388000, # this field is a long
    "some_zero_datetime": -4611686018427388000, # this field is a datetime
    "name": "Earl Phillips",
    "comment": "Four tax per.",
    "id": "6851474d557891815b63f1e3",
    "time": "12:23:15",
    "some_large_datetime": 4611686018427388000,
    "fun_field": 4611686018427388000
}

OK


{
    "address": """3821 Jennifer Key
Victoriaville, MD 12105""",
    "birthdate": "1976-11-08",
    "unique_id": "46292e6c-5227-4715-879c-1932ccae9062",
    "some_small_datetime": "1970-01-01T00:00:00", # this field is a long
    "some_zero_datetime": -4611686018427388000, # this field is a datetime
    "name": "Earl Phillips",
    "comment": "Four tax per.",
    "id": "6851474d557891815b63f1e3",
    "time": "12:23:15",
    "some_large_datetime": 4611686018427388000,
    "fun_field": 4611686018427388000
}

[1:175] failed to parse field [some_small_datetime] of type [long] in document with id 'zdO8fZcBkReHFeyYyxGV'. Preview of field's value: '1970-01-01T00:00:00'

In practice this means that if a field in a MongoDB collection contains both valid and invalid datetimes (from Python's perspective), then ingestion might fail depending on insertion order, because the mappings are dynamic.

For example, if the first record contains an out-of-range datetime, the field will be inferred as long; the next time an in-range datetime is encountered, ingestion will fail, because datetime values cannot be converted to long in Elasticsearch.

One way to fix this would be to define the index mapping manually, which is imperfect.
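For reference, pinning the field type up front would look roughly like the following (index and field names taken from the demonstration above; which type to pin, date or long, is itself a judgment call):

```json
PUT connector-mongodb-f1e8
{
  "mappings": {
    "properties": {
      "some_small_datetime": { "type": "date" }
    }
  }
}
```

With an explicit mapping, insertion order no longer decides the inferred type, but every affected field has to be known in advance.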

An alternative solution would be to introduce a flag in the MongoDB connector configuration that exposes the datetime_conversion option. Additionally, we could add a "treat datetimes as longs" flag if needed, but that looks like overkill.

Checklists

Pre-Review Checklist

  • this PR does NOT contain credentials of any kind, such as API keys or username/passwords (double check config.yml.example)
  • this PR has a meaningful title
  • this PR links to all relevant github issues that it fixes or partially addresses
  • if there is no GH issue, please create it. Each PR should have a link to an issue
  • this PR has a thorough description
  • Covered the changes with automated tests
  • Tested the changes locally
  • Added a label for each target release version (example: v7.13.2, v7.14.0, v8.0.0)
  • For bugfixes: backport safely to all minor branches still receiving patch releases
  • Considered corresponding documentation changes
  • Contributed any configuration settings changes to the configuration reference
  • if you added or changed Rich Configurable Fields for a Native Connector, you made a corresponding PR in Kibana

Release Note

MongoDB connector: set default datetime_conversion to DatetimeConversion.DATETIME_AUTO to try to prevent errors when receiving out-of-range datetime values from MongoDB. See https://www.mongodb.com/docs/languages/python/pymongo-driver/current/data-formats/dates-and-times/#handling-out-of-range-datetimes for additional information.

@mattnowzari (Contributor)

@artem-shelkovnikov giving ✅ but if this has to sit longer until we get more info from the customer that is OK ofc 😄

@artem-shelkovnikov (Member, Author)

Thanks a ton @mattnowzari!

I'm indeed not planning to merge it, but to wait for more information about the bug and discuss the approach once we've got the full bug details :)


Successfully merging this pull request may close these issues.

MongoDB connector - add support for DatetimeMS objects
