CHANGES.rst: 30 additions & 0 deletions
@@ -1,3 +1,33 @@
+pywb 0.5.0 changelist
+~~~~~~~~~~~~~~~~~~~~~
+
+* Catch live rewrite errors and display a more friendly pywb error message.
+
+* LiveRewriteHandler and WBHandler refactoring: LiveRewriteHandler now supports a root search page html template.
+
+* Proxy mode option: 'unaltered_replay' to proxy archival data with no modifications (no banner, no server or client side rewriting).
+
+* Fix client side rewriting (wombat.js) for proxy mode: only rewrite https -> http in absolute urls.
+
+* Fixes to memento timemap/timegate to work with framed replay mode.
+
+* Support for a fallback handler which will be called from a replay handler instead of a 404 response.
+
+  The handler, specified via the ``fallback`` option, can be the name of any other replay handler. Typically, it can be used with a live rewrite handler to fetch missing content from live instead of showing a 404.
+
+* Live Rewrite can now be included as a 'collection type' in a pywb deployment by setting index path to ``$liveweb``.
+
+* ``live-rewrite-server`` has optional ``--proxy host:port`` param to specify loading live web data through an HTTP/S proxy, such as for use with a recording proxy.
+
+* wombat: add document.cookie -> document.WB_wombat_cookie rewriting to check and rewrite Path= to archival url
+
+* Better parent relative '../' path rewriting, resolved to correct absolute urls when rewritten. Additional testing for parent relative urls.
+
+* New 'proxy_options' block, including 'use_default_coll' to allow defaulting to first collection w/o proxy auth.
+
+* Improved support for proxy mode, allow different collections to be selected via proxy auth
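
The parent relative '../' item in the changelist above can be illustrated with a minimal sketch. This is not pywb's rewriter: it only uses ``urljoin`` from the Python standard library to show what resolving a parent relative url to an absolute url, and then wrapping it in an archival url, might look like. The replay prefix, the timestamp, and the ``rewrite_relative`` helper are hypothetical names chosen for illustration.

```python
from urllib.parse import urljoin

# Hypothetical replay prefix and capture timestamp, for illustration only.
REPLAY_PREFIX = "http://localhost:8080/pywb/"
TIMESTAMP = "20140127171238"


def rewrite_relative(page_url, rel_url, prefix=REPLAY_PREFIX, timestamp=TIMESTAMP):
    """Resolve rel_url against page_url, then wrap the result in an archival url.

    Assumption: archival urls have the form <prefix><timestamp>/<absolute url>.
    """
    # urljoin collapses '../' segments relative to the page's directory.
    absolute = urljoin(page_url, rel_url)
    return f"{prefix}{timestamp}/{absolute}"


if __name__ == "__main__":
    print(rewrite_relative("http://example.com/a/b/page.html", "../img/logo.png"))
```

Resolving to an absolute url before adding the prefix means the '../' segments are applied to the original page path rather than to the replay prefix.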
@@ -11,9 +11,25 @@ pywb is a python implementation of web archival replay tools, sometimes also kno
 
 pywb allows high-quality replay (browsing) of archived web data stored in standardized `ARC <http://en.wikipedia.org/wiki/ARC_(file_format)>`_ and `WARC <http://en.wikipedia.org/wiki/Web_ARChive>`_.
 
-*For an example of deployed service using pywb, please see the https://webrecorder.io project*
 
-pywb Tools
+Usage Examples
+-----------------------------
+
+This README contains a basic overview of using pywb. After reading this intro, consider also taking a look at these separate projects:
+
+* `pywb-webrecorder <https://github.com/ikreymer/pywb-webrecorder>`_ demonstrates a way to use pywb and warcprox to record web content while browsing.
+
+* `pywb-samples <https://github.com/ikreymer/pywb-samples>`_ provides additional archive samples with difficult-to-replay content.
+
+
+The following deployed applications use pywb:
+
+* https://perma.cc embeds pywb as part of a larger `open source application <https://github.com/harvard-lil/perma>`_ to provide web archive replay for law libraries.
+
+* https://webrecorder.io uses pywb and builds upon pywb-webrecorder to create a hosted web recording and replay system.
+
+
+pywb Tools Overview
 -----------------------------
 
 In addition to the standard wayback machine (explained further below), pywb tool suite includes a
@@ -72,7 +88,7 @@ This process can be done by running the ``cdx-indexer`` script and only needs to
 
 Given an archive of warcs at ``myarchive/warcs``
 
-1. Create a dir for indexs, .eg. ``myarchive/cdx``
+1. Create a dir for indexes, e.g. ``myarchive/cdx``
 
 2. Run ``cdx-indexer --sort myarchive/cdx myarchive/warcs`` to generate .cdx files for each