Dateparser does not currently recognize time-before-date formats #903

Open
@reyesjd86

Description

Dateparser currently does not recognize formats in which the time comes before the date. When parsing a document, the time sometimes precedes the date, and in that case dateparser returns a null value (None). I suggest incorporating something like the following code, which finds the date and moves it in front of the time. Something like the example below keeps dateparser working when the time comes first.

import logging
import re

from dateparser import parse

# Numeric date such as 2/22/2018, 22-02-2018 or 22.02.2018
DATE_PATTERN = re.compile(r'\d+[/.-]\d+[/.-]\d{1,4}')

def parse_time_first(dt_input):
    """Parse a string in which the time may come before the date."""
    try:
        # Move the date in front of the time before calling dateparser
        match = DATE_PATTERN.search(dt_input)
        if match:
            date = match.group(0)
            # str.replace avoids treating '.' in the date as a regex wildcard
            rest = dt_input.replace(date, '', 1).strip()
            dt_input = f'{date} {rest}'
        return parse(dt_input)
    except Exception as e:
        logging.warning(f"Could not parse {dt_input!r}: {e}")
        return None

Example:

parse_time_first("10:56:58 PM UTC+2:00 2/22/2018")
datetime.datetime(2018, 2, 22, 22, 56, 58, tzinfo=<StaticTzInfo 'UTC+02:00'>)
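
The reordering step in the workaround above can also be checked on its own, without calling dateparser at all. The following is only a sketch: the names `DATE_RE` and `move_date_first` are hypothetical and not part of dateparser, and the pattern assumes purely numeric dates separated by `/`, `-` or `.`. A time written with dots (for example `10.56.58`) would be mistaken for a date.

```python
import re

# Assumed pattern: numeric dates such as 2/22/2018, 22-02-2018 or 22.02.2018
DATE_RE = re.compile(r"\d{1,4}[/.-]\d{1,2}[/.-]\d{1,4}")


def move_date_first(text):
    """Return text with the first numeric date moved to the front.

    Strings without a recognizable date are returned unchanged.
    """
    match = DATE_RE.search(text)
    if match is None:
        return text
    # Cut the date out by position so no regex escaping is needed
    rest = text[:match.start()] + text[match.end():]
    # Collapse the extra whitespace left behind where the date used to be
    rest = " ".join(rest.split())
    return f"{match.group(0)} {rest}" if rest else match.group(0)


print(move_date_first("10:56:58 PM UTC+2:00 2/22/2018"))
# 2/22/2018 10:56:58 PM UTC+2:00
```

The reordered string could then be passed to `dateparser.parse` as in the workaround. Keeping the string manipulation separate makes it easy to test against inputs that contain no date.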
