Commit a3372fe

Fix target_window when redirect is > 1 day (#53)
The `target_window` parameter for `get_memento()` is supposed to limit how far a memento's time can be from the requested time when the `exact` parameter is `False`. However, it only worked correctly when the target was off by less than a day! (The scenarios we were originally concerned with in EDGI were almost invariably on the scale of a few hours, so I guess that's how this error snuck in.) If you set `exact=True` (the default), this bug wouldn't be triggered. We were checking the target offset using the timedelta's `seconds` attribute, which ignores any additional whole days. This fixes the issue by checking `total_seconds()`, which converts the days into seconds and includes them.
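To make the difference concrete, here is a minimal sketch using only the standard library. The offset, the one-day window, and the two helper functions are hypothetical values and names chosen for illustration; they are not taken from the library's code.

```python
from datetime import timedelta

# Hypothetical offset between the requested time and the memento's time:
# three days and two hours.
offset = timedelta(days=3, hours=2)

# Illustrative window of one day, expressed in seconds (an assumption).
target_window = 24 * 60 * 60


def within_window_buggy(offset, window):
    # `.seconds` holds only the seconds component (0 to 86399) and
    # leaves out the whole days stored separately in `.days`.
    return offset.seconds <= window


def within_window_fixed(offset, window):
    # `.total_seconds()` folds the days into the result.
    return offset.total_seconds() <= window


print(offset.seconds)          # 7200
print(offset.total_seconds())  # 266400.0
print(within_window_buggy(offset, target_window))  # True
print(within_window_fixed(offset, target_window))  # False
```

Because `.seconds` can never exceed 86399, a comparison against any window of a day or more would always pass with the old check.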
1 parent 4dd3814 commit a3372fe

5 files changed, +624 -2 lines changed

docs/source/release-history.rst

Lines changed: 6 additions & 0 deletions
@@ -2,6 +2,12 @@
 Release History
 ===============
 
+v0.2.5 (2020-10-19)
+-------------------
+
+This release fixes a bug where the ``target_window`` parameter for :meth:`wayback.WaybackClient.get_memento` did not work correctly if the memento you were redirected to was off by more than a day from the requested time. See `#53 <https://github.com/edgi-govdata-archiving/wayback/pull/53>`_ for more.
+
+
 v0.2.4 (2020-09-07)
 -------------------
 

wayback/_client.py

Lines changed: 1 addition & 1 deletion
@@ -748,7 +748,7 @@ def get_memento(self, url, exact=True, exact_redirects=None,
 # been produced by an earlier memento redirect -- it's
 # just the *closest* one. The first job here is to make
 # sure it fits within our target window.
-if abs(target_date - original_date).seconds <= target_window:
+if abs(target_date - original_date).total_seconds() <= target_window:
     # The redirect will point to the closest-in-time
     # SURT URL, which will often not be an exact URL
     # match. If we aren't looking for exact matches,
Lines changed: 48 additions & 0 deletions
@@ -0,0 +1,48 @@
+interactions:
+- request:
+    body: null
+    headers:
+      Accept-Encoding:
+      - gzip, deflate
+      User-Agent:
+      - wayback/0.2.4.post2.dev0+g0c2a63c (+https://github.com/edgi-govdata-archiving/wayback)
+    method: GET
+    uri: http://web.archive.org/web/20171101000000id_/https://www.fws.gov/birds/
+  response:
+    body:
+      string: ''
+    headers:
+      Connection:
+      - keep-alive
+      Content-Length:
+      - '0'
+      Content-Type:
+      - text/plain; charset=utf-8
+      Date:
+      - Mon, 19 Oct 2020 08:07:14 GMT
+      Location:
+      - http://web.archive.org/web/20171124151315id_/https://www.fws.gov/birds/
+      Server:
+      - nginx/1.15.8
+      Server-Timing:
+      - RedisCDXSource;dur=220.587665, PetaboxLoader3.resolve;dur=21.099726, exclusion.robots.policy;dur=0.321841,
+        CDXLines.iter;dur=21.136237, LoadShardBlock;dur=1556.284805, captures_list;dur=1886.300549,
+        PetaboxLoader3.datanode;dur=119.223321, esindex;dur=0.009743, exclusion.robots;dur=0.338801
+      X-App-Server:
+      - wwwb-app13
+      X-Archive-Redirect-Reason:
+      - found capture at 20171124151315
+      X-Archive-Screenname:
+      - '0'
+      X-Cache-Key:
+      - httpweb.archive.org/web/20171101000000id_/https://www.fws.gov/birds/US
+      X-Page-Cache:
+      - HIT
+      X-location:
+      - All
+      X-ts:
+      - '302'
+    status:
+      code: 302
+      message: FOUND
+version: 1
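
The recorded redirect above can be used to work through the arithmetic by hand. The following is a rough sketch, not the library's implementation: the `memento_time` helper, the regular expression, and the one-day window are assumptions made for illustration, and only the two URLs come from the recorded request and its `Location` header.

```python
import re
from datetime import datetime

# Matches the 14-digit timestamp in a Wayback memento URL
# (hypothetical helper pattern, written for this sketch).
TIMESTAMP_PATTERN = re.compile(r'/web/(\d{14})')


def memento_time(memento_url):
    """Parse the 14-digit timestamp out of a memento URL."""
    match = TIMESTAMP_PATTERN.search(memento_url)
    if match is None:
        raise ValueError(f'No timestamp found in {memento_url!r}')
    return datetime.strptime(match.group(1), '%Y%m%d%H%M%S')


# URLs copied from the recorded request and its Location header.
requested_url = 'http://web.archive.org/web/20171101000000id_/https://www.fws.gov/birds/'
redirect_url = 'http://web.archive.org/web/20171124151315id_/https://www.fws.gov/birds/'

offset = abs(memento_time(redirect_url) - memento_time(requested_url))

# Illustrative window of one day in seconds (an assumption for this sketch).
target_window = 24 * 60 * 60

print(offset.days)             # 23
print(offset.seconds)          # 54795
print(offset.total_seconds())  # 2041995.0
print(offset.seconds <= target_window)          # True: old check accepts
print(offset.total_seconds() <= target_window)  # False: fixed check rejects
```

Under these assumptions, the old comparison would have accepted a memento more than three weeks away from the requested time, which is the situation this recording appears to exercise.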
