
Commit 8d09f07

0.4.0-alpha: save on windows, remove form history, cli (#44)
breaking/important changes:

* bugfix: stream didn't print newline after each item
* removed form-history backup, see #43
* lib: `browserexport.save.backup_history` can return None if you passed `to="-"` (this tries to print the database to STDOUT)

New Features/Improvements:

* Supports lots more windows paths
* Added opera, librewolf, floorp
* better CLI error handling/help text
* can parse jsonl, jsonl.gz, json.gz files
* can write database to STDOUT/read databases from STDIN
1 parent 01bb89a commit 8d09f07

32 files changed

+710
-264
lines changed

README.md

Lines changed: 97 additions & 48 deletions
@@ -1,13 +1,14 @@
11
# browserexport
22

3-
[![PyPi version](https://img.shields.io/pypi/v/browserexport.svg)](https://pypi.python.org/pypi/browserexport) [![Python 3.7|3.8|3.9](https://img.shields.io/pypi/pyversions/browserexport.svg)](https://pypi.python.org/pypi/browserexport) [![PRs Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg?style=flat-square)](http://makeapullrequest.com)
3+
[![PyPi version](https://img.shields.io/pypi/v/browserexport.svg)](https://pypi.python.org/pypi/browserexport) [![Python 3.8|3.9|3.10|3.11](https://img.shields.io/pypi/pyversions/browserexport.svg)](https://pypi.python.org/pypi/browserexport) [![PRs Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg?style=flat-square)](http://makeapullrequest.com)
44

55
- [Supported Browsers](#supported-browsers)
66
- [Install](#install)
77
- [Usage](#usage)
88
- [`save`](#save)
99
- [`inspect`/`merge`](#inspectmerge)
1010
- [Serializing to JSON](#json)
11+
- [Shell Completion](#shell-completion)
1112
- [Usage with HPI](#hpi)
1213
- [Library Usage](#library-usage)
1314
- [Comparisons with promnesia](#comparisons-with-promnesia)
@@ -39,11 +40,14 @@ This currently supports:
3940

4041
- [Firefox](https://www.mozilla.org/en-US/firefox/new/)
4142
- [Waterfox](https://www.waterfox.net/)
43+
- [Floorp](https://floorp.app/)
44+
- [Librewolf](https://librewolf.net/)
4245
- Firefox Android (pre-2020 schema and current [Fenix](https://github.com/mozilla-mobile/fenix))
4346
- [Chrome](https://www.google.com/chrome/)
4447
- [Chromium](https://www.chromium.org/chromium-projects/)
4548
- [Brave](https://brave.com/)
4649
- [Vivaldi](https://vivaldi.com/)
50+
- [Opera](https://www.opera.com/)
4751
- [Arc](https://arc.net/)
4852
- [Edge](https://www.microsoft.com/edge) (and [Dev Channel](https://www.microsoft.com/edge/download/insider))
4953
- [Safari](https://www.apple.com/safari/)
@@ -67,30 +71,30 @@ Usage: browserexport save [OPTIONS]
6771
Backs up a current browser database file
6872
6973
Options:
70-
-b, --browser [chrome|firefox|safari|brave|waterfox|chromium|vivaldi|palemoon|arc|edge|edgedev]
74+
-b, --browser
75+
[chrome | firefox | opera | safari | brave | waterfox |
76+
librewolf | floorp | chromium | vivaldi | palemoon | arc |
77+
edge | edgedev]
7178
Browser name to backup history for
72-
--form-history [firefox] Browser name to backup form (input field)
73-
history for
74-
--pattern TEXT Pattern for the resulting timestamped
75-
filename, should include an str.format
76-
replacement placeholder
77-
-p, --profile TEXT Use to pick the correct profile to back up.
78-
If unspecified, will assume a single profile
79-
[default: *]
80-
--path FILE Specify a direct path to a database to back
81-
up
82-
-t, --to DIRECTORY Directory to store backup to [required]
83-
--help Show this message and exit.
79+
--pattern TEXT Pattern for the resulting timestamped filename, should include an
80+
str.format replacement placeholder for the date [default:
81+
browser_name-{}.extension]
82+
-p, --profile TEXT Use to pick the correct profile to back up. If unspecified, will assume a
83+
single profile [default: *]
84+
--path FILE Specify a direct path to a database to back up
85+
-t, --to DIRECTORY Directory to store backup to. Pass '-' to print database to STDOUT
86+
[required]
87+
-h, --help Show this message and exit.
8488
```
8589

86-
Must specify one of `--browser`, `--form-history` or `--path`
90+
Must specify one of `--browser` or `--path`
8791

8892
After your browser history reaches a certain size, browsers typically remove old history over time, so I'd recommend backing up your history periodically, like:
8993

9094
```shell
91-
$ browserexport save -b firefox --to ~/data/browser_history
92-
$ browserexport save -b chrome --to ~/data/browser_history
93-
$ browserexport save -b safari --to ~/data/browser_history
95+
$ browserexport save -b firefox --to ~/data/browsing
96+
$ browserexport save -b chrome --to ~/data/browsing
97+
$ browserexport save -b safari --to ~/data/browsing
9498
```
9599

96100
That copies the sqlite databases which contain your history `--to` some backup directory.
@@ -99,7 +103,7 @@ If a browser you want to backup is Firefox/Chrome-like (so this would be able to
99103

100104
```shell
101105
$ browserexport save --path ~/.somebrowser/profile/places.sqlite \
102-
--to ~/data/browser_history
106+
--to ~/data/browsing
103107
```
104108

105109
The `--pattern` argument can be used to change the resulting filename for the browser, e.g. `--pattern 'places-{}.sqlite'` or `--pattern "$(uname)-{}.sqlite"`. The `{}` is replaced by the current date/time, which gives each backup a timestamped filename.
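The help text describes `--pattern` as a `str.format` template with a placeholder for the date. As a rough illustration of how such a pattern expands, here is a minimal sketch. It assumes a `%Y%m%d%H%M%S` timestamp layout, inferred from filenames like `places-20200828223058.sqlite` in the debug output in this README. `backup_filename` is a hypothetical helper, not part of browserexport:

```python
from datetime import datetime


def backup_filename(pattern: str, when: datetime) -> str:
    """Fill the single `{}` placeholder in `pattern` with a compact timestamp."""
    # The timestamp layout is an assumption based on example filenames in this README
    return pattern.format(when.strftime("%Y%m%d%H%M%S"))


print(backup_filename("places-{}.sqlite", datetime(2020, 8, 28, 22, 30, 58)))
# places-20200828223058.sqlite
```

A pattern without a `{}` placeholder would produce the same filename on every run, so successive backups would likely overwrite each other.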
@@ -125,19 +129,7 @@ For Firefox Android [Fenix](https://github.com/mozilla-mobile/fenix/), the datab
125129

126130
### `inspect`/`merge`
127131

128-
```
129-
Usage: browserexport inspect [OPTIONS] SQLITE_DB
130-
131-
Extracts visits from a single sqlite database
132-
133-
Provide a history database as the first argument
134-
Drops you into a REPL to access the data
135-
136-
Options:
137-
-s, --stream Stream JSON objects instead of printing a JSON list
138-
-j, --json Print result to STDOUT as JSON
139-
--help Show this message and exit.
140-
```
132+
These work very similarly: `inspect` is for a single database, `merge` is for multiple databases.
141133

142134
```
143135
Usage: browserexport merge [OPTIONS] SQLITE_DB...
@@ -149,17 +141,17 @@ Usage: browserexport merge [OPTIONS] SQLITE_DB...
149141
150142
Drops you into a REPL to access the data
151143
144+
Pass '-' to read from STDIN
145+
152146
Options:
153147
-s, --stream Stream JSON objects instead of printing a JSON list
154148
-j, --json Print result to STDOUT as JSON
155-
--help Show this message and exit.
149+
-h, --help Show this message and exit.
156150
```
157151

158-
Logs are hidden by default. To show the debug logs set `export BROWSEREXPORT_LOGS=10` (uses [logging levels](https://docs.python.org/3/library/logging.html#logging-levels)) or pass the `--debug` flag.
159-
160152
As an example:
161153

162-
```bash
154+
```
163155
browserexport --debug merge ~/data/firefox/* ~/data/chrome/*
164156
[D 210417 21:12:18 merge:38] merging information from 24 sources...
165157
[D 210417 21:12:18 parse:19] Reading visits from /home/sean/data/firefox/places-20200828223058.sqlite...
@@ -180,12 +172,35 @@ Use vis to interact with the data
180172
[1] ...
181173
```
182174

175+
You can also read from STDIN, so this can be used in conjunction with `save`, to merge databases you've backed up and combine your current browser history:
176+
177+
```bash
178+
browserexport save -b firefox -t - | browserexport merge --json --stream - ~/data/browsing/* >all.jsonl
179+
```
180+
181+
Or, to just print the demo for your current browser history:
182+
183+
```bash
184+
$ browserexport save -b firefox -t - | browserexport inspect -
185+
Demo: Your most common sites....
186+
[('github.com', 21033),
187+
...
188+
```
189+
190+
Or, use [process substitution](https://tldp.org/LDP/abs/html/process-sub.html) to save multiple dbs in parallel and then merge them:
191+
192+
```bash
193+
$ browserexport merge <(browserexport save -b firefox -t -) <(browserexport save -b chrome -t -)
194+
```
195+
196+
Logs are hidden by default. To show the debug logs set `export BROWSEREXPORT_LOGS=10` (uses [logging levels](https://docs.python.org/3/library/logging.html#logging-levels)) or pass the `--debug` flag.
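The value `10` corresponds to `logging.DEBUG` in Python's standard logging levels. The sketch below shows one way a numeric level could be read from the environment. It is an illustration only: `log_level_from_env` is a hypothetical helper, and the `WARNING` fallback is an assumption, not necessarily how browserexport handles the variable internally:

```python
import logging
import os
from typing import Mapping


def log_level_from_env(
    env: Mapping[str, str] = os.environ, default: int = logging.WARNING
) -> int:
    """Return the numeric level stored in BROWSEREXPORT_LOGS, or `default`."""
    raw = env.get("BROWSEREXPORT_LOGS")
    if raw is None:
        return default
    try:
        return int(raw)
    except ValueError:
        # Non-numeric values fall back to the default (an assumption of this sketch)
        return default


# 10 is the numeric value of logging.DEBUG
assert log_level_from_env({"BROWSEREXPORT_LOGS": "10"}) == logging.DEBUG
```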
197+
183198
### JSON
184199
185200
To dump all that info to JSON:
186201
187-
```
188-
browserexport merge --json ~/data/browser_history/*.sqlite > ./history.json
202+
```bash
203+
$ browserexport merge --json ~/data/browsing/*.sqlite > ./history.json
189204
du -h history.json
190205
67M history.json
191206
```
@@ -194,24 +209,58 @@ Or, to create a quick searchable interface, using [`jq`](https://github.com/sted
194209
195210
`browserexport merge -j --stream ~/data/browsing/*.sqlite | jq '"\(.url)|\(.metadata.description)"' | awk '!seen[$0]++' | fzf`
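For readers less familiar with `jq` and `awk`, the middle of that pipeline can be sketched in Python: format each streamed JSON object as `url|description`, then keep only the first occurrence of each string (what `awk '!seen[$0]++'` does). The field names `url` and `metadata.description` are taken from the `jq` filter above; `unique_summaries` is a hypothetical helper, and missing values print as `None` here rather than `null`:

```python
import json
from typing import Iterable, Iterator


def unique_summaries(lines: Iterable[str]) -> Iterator[str]:
    """Yield 'url|description' once per distinct value, preserving input order."""
    seen = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between JSON objects
        visit = json.loads(line)
        metadata = visit.get("metadata") or {}
        summary = f"{visit.get('url')}|{metadata.get('description')}"
        if summary not in seen:
            seen.add(summary)
            yield summary


sample = [
    '{"url": "https://a.example", "metadata": {"description": "A"}}',
    '{"url": "https://a.example", "metadata": {"description": "A"}}',
    '{"url": "https://b.example", "metadata": null}',
]
print(list(unique_summaries(sample)))
# ['https://a.example|A', 'https://b.example|None']
```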
196211
197-
Merged files like `history.json` above can also be used as inputs files themselves, this reads those by mapping the JSON onto the `Visit` schema directly. If you don't care about keeping the raw databases for any other auxiliary info like form, bookmark data, or [from_visit](https://github.com/seanbreckenridge/browserexport/issues/30) info and just want the URL, visit date and metadata, you could use `merge` to periodically merge the bulky `.sqlite` files into a JSON dump:
212+
Merged files like `history.json` can also be used as input files themselves; this reads them by mapping the JSON onto the `Visit` schema directly.
213+
214+
In addition to `.json` files, this can parse `.jsonl` ([JSON lines](http://jsonlines.org/)) files, which contain newline-delimited JSON objects. This allows you to parse JSON objects one at a time, instead of loading the entire file into memory. The `.jsonl` file can be generated with the `--stream` flag:
215+
216+
```
217+
browserexport merge --stream --json ~/data/browsing/*.sqlite > ./history.jsonl
218+
```
219+
220+
_Additionally_, this can parse gzipped versions of those files - files like `history.json.gz` or `history.jsonl.gz`
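To show why the line-based format helps, here is a minimal standard-library sketch that reads a `.jsonl` or `.jsonl.gz` file one object at a time. It is not browserexport's own parser; `iter_jsonl` is a hypothetical helper, and it yields plain dictionaries rather than `Visit` objects:

```python
import gzip
import json
from pathlib import Path
from typing import Any, Dict, Iterator


def iter_jsonl(path: Path) -> Iterator[Dict[str, Any]]:
    """Lazily yield one JSON object per non-empty line of a .jsonl or .jsonl.gz file."""
    # gzip.open and open accept the same text-mode arguments
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt", encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)
```

Because this is a generator, something like `for visit in iter_jsonl(Path("history.jsonl.gz")): ...` holds only one line in memory at a time.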
221+
222+
If you don't care about keeping the raw databases for any other auxiliary info like form, bookmark data, or [from_visit](https://github.com/seanbreckenridge/browserexport/issues/30) info and just want the URL, visit date and metadata, you could use `merge` to periodically merge the bulky `.sqlite` files into a gzipped JSONL dump:
198223
199224
```bash
200-
cd ~/data/browsing
201225
# backup databases
202226
rsync -Pavh ~/data/browsing ~/.cache/browsing
203-
# merge all sqlite databases into a single JSON file
204-
browserexport --debug merge --json * > '/tmp/browsing.json'
205-
# remove sqlite databases
206-
rm *.sqlite *.db
227+
# merge all sqlite databases into a single compressed, jsonl file
228+
browserexport --debug merge --json --stream ~/data/browsing/* > '/tmp/browsing.jsonl'
229+
gzip '/tmp/browsing.jsonl'
230+
# test reading gzipped file
231+
browserexport --debug inspect '/tmp/browsing.jsonl.gz'
232+
# remove all old datafiles
233+
rm ~/data/browsing/*
207234
# move merged data to database directory
208-
mv /tmp/browsing.json ~/data/browsing
209-
# test reading the merged data
210-
browserexport merge ~/data/browsing/*
235+
mv /tmp/browsing.jsonl.gz ~/data/browsing
211236
```
212237
213238
I do this every couple of months with a script [here](https://github.com/seanbreckenridge/bleanser/blob/master/bin/merge-browser-history), and then sync my old databases to a hard drive for more long-term storage.
214239
240+
## Shell Completion
241+
242+
This uses `click`, which supports [shell completion](https://click.palletsprojects.com/en/8.1.x/shell-completion/) for `bash`, `zsh` and `fish`. To generate the completion on startup, put one of the following in your shell init file (`.bashrc`/`.zshrc` etc.):
243+
244+
```bash
245+
eval "$(_BROWSEREXPORT_COMPLETE=bash_source browserexport)" # bash
246+
eval "$(_BROWSEREXPORT_COMPLETE=zsh_source browserexport)" # zsh
247+
_BROWSEREXPORT_COMPLETE=fish_source browserexport | source # fish
248+
```
249+
250+
Instead of `eval`ing, you could of course save the generated completion to a file and/or lazy load it in your shell config, see [bash completion docs](https://github.com/scop/bash-completion/blob/master/README.md#faq), [zsh functions](https://zsh.sourceforge.io/Doc/Release/Functions.html), [fish completion docs](https://fishshell.com/docs/current/completions.html). For example for `zsh` that might look like:
251+
252+
```bash
253+
mkdir -p ~/.config/zsh/functions/
254+
_BROWSEREXPORT_COMPLETE=zsh_source browserexport > ~/.config/zsh/functions/_browserexport
255+
```
256+
257+
```bash
258+
# in your ~/.zshrc
259+
# update fpath to include the directory you saved the completion file to
260+
fpath=(~/.config/zsh/functions $fpath)
261+
autoload -Uz compinit && compinit
262+
```
263+
215264
## HPI
216265
217266
If you want to cache the merged results, this has a [module in HPI](https://github.com/karlicoss/HPI) which handles locating/caching and querying the results. See [setup](https://github.com/karlicoss/HPI/blob/master/doc/SETUP.org#install-main-hpi-package) and [module setup](https://github.com/karlicoss/HPI/blob/master/doc/MODULES.org#mybrowser).
@@ -257,7 +306,7 @@ from browserexport.merge import read_and_merge
257306
read_and_merge(["/path/to/database", "/path/to/second/database", "..."])
258307
```
259308
260-
You can also use [`sqlite_backup`](https://github.com/seanbreckenridge/sqlite_backup) to copy your current browser history into a sqlite connection in memory, without ever writing to disk:
309+
You can also use [`sqlite_backup`](https://github.com/seanbreckenridge/sqlite_backup) to copy your current browser history into a sqlite connection in memory, as a `sqlite3.Connection`:
261310
262311
```python
263312
from browserexport.browsers.all import Firefox
