Parse whole words only #856

Open
@Rajmehta123

Description

Question/Bug

When using search_dates, it picks up stray digits from neighboring tokens, which produces an invalid match and a wrong datetime object. For example:

from dateparser.search import search_dates

sample_string = 'Bubble -58.5 06 Mar 2009 in need of -43.4 30 Oct 1974 also contributed for -17.7 26 Dec 2018 '
en_dates = search_dates(sample_string, languages=['en'], settings={'STRICT_PARSING': True})

Output

[('5 06 Mar 2009 in', datetime.datetime(2009, 3, 5, 0, 0)),
 ('4 30 Oct 1974', datetime.datetime(1974, 10, 4, 0, 0)),
 ('7 26 Dec 2018', datetime.datetime(2018, 12, 7, 0, 0))]

It picked up:

  1. the digit 5 from -58.5 and attached it to 06 Mar 2009, forming '5 06 Mar 2009 in'
  2. the digit 4 from -43.4 and attached it to 30 Oct 1974, forming '4 30 Oct 1974'
  3. the digit 7 from -17.7 and attached it to 26 Dec 2018, forming '7 26 Dec 2018'

Either include the whole word or exclude it. Including only part of a number or word from the preceding token produces an invalid token and a wrong datetime object.

Expected Output

[('06 Mar 2009 in', datetime.datetime(2009, 3, 6, 0, 0)),
 ('30 Oct 1974', datetime.datetime(1974, 10, 30, 0, 0)),
 ('26 Dec 2018', datetime.datetime(2018, 12, 26, 0, 0))]
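
Until this is handled in the library, one possible workaround is to post-filter the results and drop any match that does not start and end on a whitespace boundary in the original string. The sketch below is only an illustration: `whole_word_matches` is a hypothetical helper, not part of dateparser, and it takes the (matched text, datetime) pairs shown in the output above as input. It only rejects partial-word matches; it does not recover the correct dates.

```python
import datetime
import re


def whole_word_matches(text, results):
    """Keep only (fragment, value) pairs whose fragment is whitespace-delimited in text.

    Hypothetical helper for illustration; not part of dateparser.
    """
    kept = []
    for fragment, value in results:
        # Require whitespace (or the string edge) right before and right after the fragment.
        pattern = r'(?<!\S)' + re.escape(fragment) + r'(?!\S)'
        if re.search(pattern, text):
            kept.append((fragment, value))
    return kept


sample_string = 'Bubble -58.5 06 Mar 2009 in need of -43.4 30 Oct 1974 also contributed for -17.7 26 Dec 2018 '

# Pairs copied from the output reported in this issue.
reported = [
    ('5 06 Mar 2009 in', datetime.datetime(2009, 3, 5, 0, 0)),
    ('4 30 Oct 1974', datetime.datetime(1974, 10, 4, 0, 0)),
    ('7 26 Dec 2018', datetime.datetime(2018, 12, 7, 0, 0)),
]

# Every reported fragment starts in the middle of a number, so all are dropped.
print(whole_word_matches(sample_string, reported))  # prints []
```

A fragment such as '30 Oct 1974' would pass the same check, since it is surrounded by spaces in the sample string.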
