Get tabular data from local files, URLs (http/https & dathere://) & CKAN (ckan://) into a managed, queryable disk cache - with conditional revalidation (ETag/Last-Modified), transparent zstd compression, BLAKE3 hashing & automatic indexing. Cached resources are reusable by ANY qsv command via the dc: prefix (e.g. qsv stats dc:data.csv), with stale entries auto-refreshed. Efficiently seeds luau lookup tables, validate dynamicEnum reference data & speeds up Datapusher+ harvesting.
Table of Contents | Source: src/cmd/get.rs | 📇🧠🌐 
Description | Examples | Usage | Arguments | Get Options | Common Options
Description ↩
Get tabular data from various sources into a managed, queryable disk cache.
get fetches a resource once, stores it compressed (zstd) and content-addressed
(BLAKE3) in the qsv cache, auto-builds a qsv index for it (for instant random
access & exact record counts), and records rich metadata (ETag, Last-Modified,
sizes, record count, TTL). Re-fetches send a conditional request
(ETag/Last-Modified) so unchanged resources are revalidated, not re-downloaded.
Large remote resources stream into the cache as parallel byte-ranges (tune with
the QSV_GET_PART_SIZE and QSV_GET_CONCURRENCY env vars).
Once cached, a resource can be read by ANY qsv command using the dc: prefix,
e.g. qsv stats dc:data.csv. Stale dc: entries are auto-refreshed.
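One way to picture how a stale-entry check might combine the --refresh policy with the per-entry --ttl is the sketch below. It is not qsv's implementation: `needs_refresh` is a hypothetical helper, and the "age strictly greater than TTL" comparison is an assumption. Only the policy names (on-stale, always, never) and the meaning of -1 (never expire) come from the option descriptions on this page.

```shell
#!/bin/sh
# Hypothetical helper, not part of qsv.
# Usage: needs_refresh POLICY TTL_SECS AGE_SECS
# Exit status 0 means "a refresh would be attempted", 1 means "use the cached copy".
needs_refresh() {
    policy=$1
    ttl=$2
    age=$3
    case $policy in
        always) return 0 ;;
        never)  return 1 ;;
        on-stale)
            # A TTL of -1 means the entry never expires.
            # Assumption: an entry is stale once its age exceeds the TTL.
            [ "$ttl" -ge 0 ] && [ "$age" -gt "$ttl" ]
            ;;
        *)
            echo "unknown policy: $policy" >&2
            return 2
            ;;
    esac
}
```

For example, `needs_refresh on-stale 86400 90000` succeeds (the entry is older than its TTL), while `needs_refresh on-stale -1 90000` does not.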
Supported sources:
local file path
http:// or https:// URL
dathere:// datHere qsv-lookup-tables repo
ckan:// a CKAN resource by id
ckan://? a CKAN resource by name (resource_search), e.g. ckan://covid-vaccinations?
s3:/// AWS S3 / S3-compatible (get_cloud feature), e.g. s3://my-bucket/data.csv
gs:/// Google Cloud Storage (get_cloud feature), e.g. gs://my-bucket/data.csv
az:/// Azure Blob Storage (get_cloud feature)
Cloud credentials are read from the standard `AWS_*`/`AZURE_*`/`GOOGLE_*` environment
variables (and IAM roles); use --cloud-opt for one-off overrides such as region
or endpoint. (sftp:// is planned for a later release.)
Examples ↩
Fetch a CSV into the cache and read it back with another command:
qsv get https://example.com/data.csv --name data.csv
qsv stats dc:data.csv
Seed a CKAN reference table:
qsv get "ckan://covid-vaccinations?" --name vax.csv
Fetch from cloud object storage (requires the get_cloud feature):
qsv get s3://my-bucket/data.csv --name data.csv
qsv get gs://my-bucket/data.csv --cloud-opt skip_signature=true
Show what's in the cache, then prune old entries:
qsv get cache-list
qsv get cache-prune --older-than=30d
Verify cached blob integrity, then retune an entry's TTL & policy:
qsv get cache-list --verify
qsv get cache-set-ttl data.csv --ttl=86400
qsv get cache-set-policy data.csv --refresh=never
For more examples, see tests.
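Because cache-list --verify exits non-zero when any blob fails its BLAKE3 check, the integrity check can gate a script. The wrapper below is a minimal sketch: `check_cache` and the `QSV_BIN` override are hypothetical names introduced only so the function can be exercised without qsv on the PATH. The only qsv invocation it makes is the documented `qsv get cache-list --verify`.

```shell
#!/bin/sh
# Hypothetical wrapper around the documented `qsv get cache-list --verify`.
# QSV_BIN is an illustrative override for the qsv binary, not a qsv setting.
check_cache() {
    if "${QSV_BIN:-qsv}" get cache-list --verify; then
        echo "cache OK"
    else
        echo "cache verification failed" >&2
        return 1
    fi
}
```

In a pipeline this could be called as `check_cache || exit 1` before any step that reads dc: entries.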
Usage ↩
qsv get cache-list [--verify] [options]
qsv get cache-info [options]
qsv get cache-clear [options]
qsv get cache-prune --older-than=<val> [options]
qsv get cache-set-ttl <name> --ttl=<secs> [options]
qsv get cache-set-policy <name> --refresh=<policy> [options]
qsv get [--cloud-opt <kv>...] [options] <source>...
qsv get --help
Arguments ↩
| Argument | Description |
|---|---|
| `<source>` | One or more sources to fetch into the cache. |
| `<name>` | For cache-set-ttl / cache-set-policy: the cached logical name (dc: handle) to modify. |
Get Options ↩
| Option | Type | Description | Default |
|---|---|---|---|
| ‑‑name | string | Logical cache name (the dc: handle) for the fetched entry. Defaults to the source's terminal path segment. Ignored when multiple sources are given. | |
| ‑‑ttl | integer | Per-entry time-to-live in seconds. -1 = never expire. Also the value applied by cache-set-ttl. | 2419200 |
| ‑‑refresh | string | Staleness policy for dc: use: on-stale, always or never. Also the value applied by cache-set-policy. | on-stale |
| ‑‑compress | string | Transparent blob compression: zstd or none. | zstd |
| ‑‑force | flag | Re-fetch even if a fresh cached copy exists. | |
| ‑‑cloud‑opt | string | Extra cloud object-store config as a key=value pair (repeatable), e.g. region=us-east-1 or skip_signature=true. Overrides the `AWS_*`/`AZURE_*`/`GOOGLE_*` environment. (get_cloud only) | |
| ‑‑ckan‑api | string | CKAN Action API base URL. Overrides the QSV_CKAN_API env var. | https://data.dathere.com/api/3/action |
| ‑‑ckan‑token | string | CKAN API token. Overrides the QSV_CKAN_TOKEN env var. | |
| ‑‑timeout | integer | HTTP request timeout in seconds. | 30 |
| ‑‑older‑than | string | For cache-prune: remove entries older than this age. Accepts seconds, or a value with an s/m/h/d/w suffix (e.g. 3600, 90m, 30d, 2w). | |
| ‑‑json | flag | For cache-list/cache-info: output JSON instead of a table. | |
| ‑‑verify | flag | For cache-list: recompute each cached blob's BLAKE3 and report OK/FAIL per name (exits non-zero on any failure). | |
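To make the age syntax of --older-than and the default --ttl concrete, the sketch below converts the documented forms (plain seconds, or a number with an s/m/h/d/w suffix) to seconds. It is an illustration rather than qsv's own parser: `age_to_secs` is a hypothetical helper, and reading the suffixes as seconds, minutes, hours, days and weeks is an assumption based on the examples in the table.

```shell
#!/bin/sh
# Hypothetical helper, not part of qsv: convert an age such as
# 3600, 90m, 30d or 2w to a number of seconds.
# Assumes s/m/h/d/w mean seconds/minutes/hours/days/weeks.
age_to_secs() {
    age=$1
    case $age in
        *s) num=${age%s}; mult=1 ;;
        *m) num=${age%m}; mult=60 ;;
        *h) num=${age%h}; mult=3600 ;;
        *d) num=${age%d}; mult=86400 ;;
        *w) num=${age%w}; mult=604800 ;;
        *)  num=$age;     mult=1 ;;
    esac
    # Reject empty or non-numeric values.
    case $num in
        ''|*[!0-9]*) echo "invalid age: $age" >&2; return 1 ;;
    esac
    echo $((num * mult))
}

age_to_secs 90m             # 5400
age_to_secs 30d             # 2592000
echo $((2419200 / 86400))   # 28: the default --ttl expressed in days
```

Under these assumptions the default TTL of 2419200 seconds corresponds to 28 days, and `--older-than=30d` to 2592000 seconds.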
Common Options ↩
| Option | Type | Description | Default |
|---|---|---|---|
| ‑h,‑‑help | flag | Display this message | |
| ‑‑cache‑dir | string | The qsv cache directory. Overrides the QSV_CACHE_DIR env var. | ~/.qsv-cache |
| ‑o,‑‑output | string | For a single source, where to write the output (- for stdout). | |
| ‑q,‑‑quiet | flag | Do not print progress/summary messages to stderr. | |