I noticed that htmldate's find_date function internally relies on examine_header.
Does it make sense to parse the response header from the server? Don't servers typically default this to the current date?
Here’s an example where this date is extracted: '2024-12-02'...
from htmldate import find_date

find_date(
    "https://octopus.energy/blog/agile-octopus-bigger-story/",
    original_date=True,
    extensive_search=True,
)
But the actual publication date is...
If I comment out the lines that call examine_header, the correct date (2022-12-13) is extracted during the # last resort step.
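To narrow down where the unexpected date comes from, it may help to list the date-like values in the page's meta tags and compare them with the dates in the body. The following is a minimal sketch using only the standard library. It is not htmldate's implementation: the helper names (MetaDateCollector, list_meta_dates) are hypothetical, the sample HTML is made up for illustration, and it assumes the misleading value sits in the HTML head metadata rather than in the HTTP response headers.

```python
import re
from html.parser import HTMLParser

# Matches ISO-style dates such as 2024-12-02 anywhere in a string.
DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")


class MetaDateCollector(HTMLParser):
    """Collect <meta> tags whose content attribute contains a YYYY-MM-DD date."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        match = DATE_RE.search(attr.get("content") or "")
        if match:
            # Use whichever identifying attribute the tag carries.
            key = attr.get("property") or attr.get("name") or attr.get("itemprop") or "?"
            self.found.append((key, match.group(0)))


def list_meta_dates(html):
    """Return (meta key, date) pairs found in the given HTML string."""
    parser = MetaDateCollector()
    parser.feed(html)
    return parser.found


# Made-up sample page (an assumption, not the real page's markup).
sample_html = """
<html><head>
<meta property="article:modified_time" content="2024-12-02T10:00:00Z">
<meta name="description" content="A blog post">
</head><body><time>2022-12-13</time></body></html>
"""

for key, date in list_meta_dates(sample_html):
    print(key, date)
```

Running the same helper on the downloaded HTML of the real page would show whether a modification-style meta tag carries 2024-12-02 while the publication date only appears in the body.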