While reviewing packagedb/package_managers.py, I noticed a bug in PypiVersionAPI.get_latest_date().
The function currently assumes that every download entry in a PyPI release contains upload_time_iso_8601. If an entry is missing that field before any valid entry has been seen, current_date is never assigned but is still used later in the loop, which raises an UnboundLocalError instead of skipping the invalid entry. If the field is missing on a later entry, current_date silently keeps the value from the previous iteration.
Current code:
    latest_date = None
    for download in downloads:
        upload_time = download.get("upload_time_iso_8601")
        if upload_time:
            current_date = dateparser.parse(upload_time)
        if not latest_date:
            latest_date = current_date
        else:
            if current_date > latest_date:
                latest_date = current_date
    return latest_date
A minimal example that can trigger this is:
    downloads = [
        {},
        {"upload_time_iso_8601": "2010-12-23T05:14:23.509436Z"},
    ]
In this case, the function should ignore the first item and return the parsed date from the second one, but it raises UnboundLocalError on the first item because current_date has not been assigned yet.
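A self-contained reproduction might look like the sketch below. It assumes the loop is indented so that only the parse call sits under `if upload_time:`. The names `parse` and `get_latest_date_current` are hypothetical: `parse` is a standard-library stand-in for `dateparser.parse` so the snippet runs without extra dependencies, and the function is a standalone copy of the loop rather than the real method.

```python
from datetime import datetime


def parse(value):
    # Stand-in for dateparser.parse (assumption): handles the trailing "Z"
    # so that datetime.fromisoformat accepts the timestamp on older Pythons.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def get_latest_date_current(downloads):
    # Standalone copy of the current loop, kept buggy on purpose.
    latest_date = None
    for download in downloads:
        upload_time = download.get("upload_time_iso_8601")
        if upload_time:
            current_date = parse(upload_time)
        if not latest_date:
            # current_date is unbound here when the first entry has no timestamp.
            latest_date = current_date
        else:
            if current_date > latest_date:
                latest_date = current_date
    return latest_date


downloads = [
    {},
    {"upload_time_iso_8601": "2010-12-23T05:14:23.509436Z"},
]

try:
    get_latest_date_current(downloads)
    error = None
except UnboundLocalError as exc:
    error = exc

print(type(error).__name__)
```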
Expected behavior:
- skip entries that do not have upload_time_iso_8601
- continue processing the remaining valid entries
- return the latest valid date
- return None if none of the entries contain a usable timestamp
A simple fix would be to initialize current_date = None inside the loop and continue when upload_time_iso_8601 is missing.
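One possible shape for that fix is sketched below as a standalone function rather than a patch against the real method. As an assumption, `parse` is a standard-library stand-in for `dateparser.parse`. Because the loop uses `continue` for entries without a timestamp, the explicit `current_date = None` initialization becomes unnecessary and is left out here.

```python
from datetime import datetime


def parse(value):
    # Stand-in for dateparser.parse (assumption): handles the trailing "Z"
    # so that datetime.fromisoformat accepts the timestamp on older Pythons.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def get_latest_date(downloads):
    # Return the latest upload date, or None if no entry has a timestamp.
    latest_date = None
    for download in downloads:
        upload_time = download.get("upload_time_iso_8601")
        if not upload_time:
            # Skip entries with a missing or empty timestamp.
            continue
        current_date = parse(upload_time)
        if latest_date is None or current_date > latest_date:
            latest_date = current_date
    return latest_date


downloads = [
    {},
    {"upload_time_iso_8601": "2010-12-23T05:14:23.509436Z"},
]
latest = get_latest_date(downloads)
```

With the example input from this report, `latest` holds the parsed date of the second entry, and a list containing only `{}` yields None.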
Reference:
PyPI JSON API documentation