get

Get tabular data from local files, URLs (http/https & dathere://) & CKAN (ckan://) into a managed, queryable disk cache, with conditional revalidation (ETag/Last-Modified), transparent zstd compression, BLAKE3 hashing & automatic indexing. Cached resources are reusable by ANY qsv command via the dc: prefix (e.g. qsv stats dc:data.csv), and stale entries are auto-refreshed. Efficiently seeds luau lookup tables and validate's dynamicEnum reference data, and speeds up Datapusher+ harvesting.

Table of Contents | Source: src/cmd/get.rs | 📇🧠🌐 CKAN

Description | Examples | Usage | Arguments | Get Options | Common Options

Description

Get tabular data from various sources into a managed, queryable disk cache.

get fetches a resource once, stores it compressed (zstd) and content-addressed (BLAKE3) in the qsv cache, auto-builds a qsv index for it (for instant random access & exact record counts), and records rich metadata (ETag, Last-Modified, sizes, record count, TTL). Re-fetches send a conditional request (ETag/Last-Modified) so unchanged resources are revalidated, not re-downloaded. Large remote resources stream into the cache as parallel byte-ranges (tune with the QSV_GET_PART_SIZE and QSV_GET_CONCURRENCY env vars).
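Both the conditional re-fetch and the parallel byte-range download described above come down to plain HTTP headers. The Rust sketch below is illustrative only. It assumes a hypothetical `CachedMeta` struct and hypothetical helpers (`conditional_headers`, `byte_ranges`) that are not qsv's actual code, and shows which validator headers a revalidation request would carry and how a resource could be cut into inclusive `Range` requests of at most one part size each.

```rust
/// Hypothetical subset of the metadata recorded for a cached entry;
/// the field names are illustrative, not qsv's actual schema.
#[derive(Debug, Default)]
pub struct CachedMeta {
    pub etag: Option<String>,
    pub last_modified: Option<String>,
}

/// Validator headers for a conditional re-fetch. If the server still has the
/// same representation it answers `304 Not Modified` and sends no body.
pub fn conditional_headers(meta: &CachedMeta) -> Vec<(&'static str, String)> {
    let mut headers = Vec::new();
    if let Some(etag) = &meta.etag {
        headers.push(("If-None-Match", etag.clone()));
    }
    if let Some(last_modified) = &meta.last_modified {
        headers.push(("If-Modified-Since", last_modified.clone()));
    }
    headers
}

/// Split `total_len` bytes into inclusive `Range` header values of at most
/// `part_size` bytes each, so the parts can be fetched concurrently.
pub fn byte_ranges(total_len: u64, part_size: u64) -> Vec<String> {
    assert!(part_size > 0, "part_size must be positive");
    let mut ranges = Vec::new();
    let mut start: u64 = 0;
    while start < total_len {
        // HTTP byte ranges are inclusive on both ends.
        let end = start.saturating_add(part_size).min(total_len) - 1;
        ranges.push(format!("bytes={start}-{end}"));
        start = end + 1;
    }
    ranges
}
```

For example, a 10-byte resource with a 4-byte part size yields `bytes=0-3`, `bytes=4-7` and `bytes=8-9`. How qsv actually sizes parts and schedules them under QSV_GET_CONCURRENCY is not modeled here.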

Once cached, a resource can be read by ANY qsv command using the dc: prefix, e.g. qsv stats dc:data.csv. Stale dc: entries are auto-refreshed.
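Resolving a dc: handle and deciding whether it is stale can be sketched as follows. This is a minimal sketch that assumes the `--ttl` and `--refresh` semantics from the Get Options table (-1 never expires; policies on-stale, always and never) and treats an entry as stale once its age reaches its TTL, the way HTTP caching defines freshness. The `Refresh` type, `dc_name` and `needs_refresh` are hypothetical names.

```rust
use std::time::Duration;

/// The three `--refresh` policies (hypothetical type name).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Refresh {
    OnStale,
    Always,
    Never,
}

impl Refresh {
    /// Parse a `--refresh` value; unknown strings are rejected.
    pub fn parse(s: &str) -> Option<Refresh> {
        match s {
            "on-stale" => Some(Refresh::OnStale),
            "always" => Some(Refresh::Always),
            "never" => Some(Refresh::Never),
            _ => None,
        }
    }
}

/// Return the logical cache name when `input` is a `dc:` handle.
pub fn dc_name(input: &str) -> Option<&str> {
    input.strip_prefix("dc:")
}

/// Decide whether a cached entry should be revalidated before use.
/// `ttl_secs` follows the `--ttl` convention: a negative value (-1) never expires.
pub fn needs_refresh(policy: Refresh, age: Duration, ttl_secs: i64) -> bool {
    match policy {
        Refresh::Always => true,
        Refresh::Never => false,
        // Assumption: stale once the age reaches the TTL, as in HTTP caching.
        Refresh::OnStale => ttl_secs >= 0 && age.as_secs() >= ttl_secs as u64,
    }
}
```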

Supported sources:
 Source  Description
 <path>  a local file path
 http://... or https://...  a URL
 dathere://<name>  datHere qsv-lookup-tables repo
 ckan://<id>  a CKAN resource by id
 ckan://<name>?  a CKAN resource by name (resource_search)
 s3://<bucket>/<path>  AWS S3 / S3-compatible (get_cloud feature)
 gs://<bucket>/<path>  Google Cloud Storage (get_cloud feature)
 az://<container>/<path>  Azure Blob Storage (get_cloud feature)

Cloud credentials are read from the standard AWS_/AZURE_/GOOGLE_* environment variables (and IAM roles); use --cloud-opt for one-off overrides such as region or endpoint. (sftp:// is planned for a later release.)
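Dispatching on the supported source forms, together with the documented `--name` default (the source's terminal path segment), can be sketched in a few lines. The `SourceKind` enum, `classify` and `default_name` are hypothetical, and dropping anything after `?` before taking the last segment is an assumption made so that `ckan://<name>?` yields a sensible name; this is not qsv's actual parser.

```rust
/// Hypothetical classification of a `<source>` argument.
#[derive(Debug, PartialEq)]
pub enum SourceKind<'a> {
    Local(&'a str),
    Http(&'a str),
    Dathere(&'a str),
    CkanById(&'a str),
    CkanByName(&'a str),
    Cloud { scheme: &'a str, location: &'a str },
}

pub fn classify(source: &str) -> SourceKind<'_> {
    if source.starts_with("http://") || source.starts_with("https://") {
        SourceKind::Http(source)
    } else if let Some(rest) = source.strip_prefix("dathere://") {
        SourceKind::Dathere(rest)
    } else if let Some(rest) = source.strip_prefix("ckan://") {
        // A trailing `?` selects a lookup by resource name instead of id.
        match rest.strip_suffix('?') {
            Some(name) => SourceKind::CkanByName(name),
            None => SourceKind::CkanById(rest),
        }
    } else {
        // Cloud schemes (get_cloud feature).
        for scheme in ["s3", "gs", "az"] {
            if let Some(rest) = source.strip_prefix(scheme).and_then(|r| r.strip_prefix("://")) {
                return SourceKind::Cloud { scheme, location: rest };
            }
        }
        SourceKind::Local(source)
    }
}

/// Default `--name`: the last non-empty `/`-separated segment,
/// ignoring anything after `?` (assumption).
pub fn default_name(source: &str) -> Option<&str> {
    let without_query = source.split('?').next().unwrap_or(source);
    without_query.rsplit('/').find(|segment| !segment.is_empty())
}
```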

Examples

Fetch a CSV into the cache and read it back with another command:

qsv get https://example.com/data.csv --name data.csv
qsv stats dc:data.csv

Seed a CKAN reference table:

qsv get "ckan://covid-vaccinations?" --name vax.csv

Fetch from cloud object storage (requires the get_cloud feature):

qsv get s3://my-bucket/data.csv --name data.csv
qsv get gs://my-bucket/data.csv --cloud-opt skip_signature=true

Show what's in the cache, then prune old entries:

qsv get cache-list
qsv get cache-prune --older-than=30d

Verify cached blob integrity, then retune an entry's TTL & policy:

qsv get cache-list --verify
qsv get cache-set-ttl data.csv --ttl=86400
qsv get cache-set-policy data.csv --refresh=never

For more examples, see tests.
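Conceptually, `cache-list --verify` re-hashes each stored blob and compares the result with the digest recorded at fetch time. The sketch below takes the digest function as a parameter because BLAKE3 lives in an external crate (with the blake3 crate one could pass `|b: &[u8]| blake3::hash(b).to_hex().to_string()`). The entry layout (name mapped to blob bytes plus recorded digest), the `Check` enum and `verify` are hypothetical, and whether qsv hashes the compressed or decompressed bytes is not specified here.

```rust
use std::collections::BTreeMap;

/// Result of re-hashing one cached blob.
#[derive(Debug, PartialEq)]
pub enum Check {
    Ok,
    Fail,
}

/// Re-hash every blob with `hash` and compare it against the recorded digest.
/// `entries` maps a logical name to (blob bytes, recorded digest).
/// Returns a per-name report plus an exit code that is 1 if any check failed.
pub fn verify<F>(entries: &BTreeMap<String, (Vec<u8>, String)>, hash: F) -> (Vec<(String, Check)>, i32)
where
    F: Fn(&[u8]) -> String,
{
    let mut report = Vec::new();
    let mut any_failed = false;
    for (name, (blob, recorded)) in entries {
        let status = if hash(blob.as_slice()) == *recorded {
            Check::Ok
        } else {
            any_failed = true;
            Check::Fail
        };
        report.push((name.clone(), status));
    }
    (report, i32::from(any_failed))
}
```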

Usage

qsv get cache-list [--verify] [options]
qsv get cache-info [options]
qsv get cache-clear [options]
qsv get cache-prune --older-than=<val> [options]
qsv get cache-set-ttl <name> --ttl=<secs> [options]
qsv get cache-set-policy <name> --refresh=<policy> [options]
qsv get [--cloud-opt <kv>...] [options] <source>...
qsv get --help

Arguments

 Argument  Description
 <source>  One or more sources to fetch into the cache.
 <name>  For cache-set-ttl / cache-set-policy: the cached logical name (dc: handle) to modify.

Get Options

     Option      Type Description Default
 --name  string Logical cache name (the dc: handle) for the fetched entry. Defaults to the source's terminal path segment. Ignored when multiple sources are given.
 --ttl  integer Per-entry time-to-live in seconds. -1 = never expire. Also the value applied by cache-set-ttl. 2419200
 --refresh  string Staleness policy applied when the entry is read via dc:: on-stale, always or never. Also the value applied by cache-set-policy. on-stale
 --compress  string Transparent blob compression: zstd or none. zstd
 --force  flag Re-fetch even if a fresh cached copy exists.
 --cloud-opt  string Extra cloud object-store config as a key=value pair (repeatable), e.g. region=us-east-1 or skip_signature=true. Overrides the AWS_/AZURE_/GOOGLE_* environment. (get_cloud only)
 --ckan-api  string CKAN Action API base URL. Overrides the QSV_CKAN_API env var. https://data.dathere.com/api/3/action
 --ckan-token  string CKAN API token. Overrides the QSV_CKAN_TOKEN env var.
 --timeout  integer HTTP request timeout in seconds. 30
 --older-than  string For cache-prune: remove entries older than this age. Accepts seconds, or a value with an s/m/h/d/w suffix (e.g. 3600, 90m, 30d, 2w).
 --json  flag For cache-list/cache-info: output JSON instead of a table.
 --verify  flag For cache-list: recompute each cached blob's BLAKE3 hash and report OK/FAIL per name (exits non-zero on any failure).
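The `--older-than` age format is small enough to sketch directly. A minimal parser, assuming lowercase suffixes only and rejecting anything else; `parse_age` is a hypothetical name, not qsv's function.

```rust
/// Parse an age such as `3600`, `90m`, `30d` or `2w` into seconds.
/// Returns None for an empty string, an unknown suffix, or overflow.
pub fn parse_age(s: &str) -> Option<u64> {
    let s = s.trim();
    // The suffix arms are ASCII, so slicing off one byte is safe.
    let (digits, multiplier) = match s.chars().last()? {
        's' => (&s[..s.len() - 1], 1),
        'm' => (&s[..s.len() - 1], 60),
        'h' => (&s[..s.len() - 1], 3_600),
        'd' => (&s[..s.len() - 1], 86_400),
        'w' => (&s[..s.len() - 1], 604_800),
        c if c.is_ascii_digit() => (s, 1), // bare number = seconds
        _ => return None,
    };
    let n: u64 = digits.parse().ok()?;
    n.checked_mul(multiplier)
}
```

Under this reading, 4w is 4 × 604,800 = 2,419,200 seconds, which matches the default --ttl.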

Common Options

     Option      Type Description Default
 -h, --help  flag Display this message
 --cache-dir  string The qsv cache directory. Overrides the QSV_CACHE_DIR env var. ~/.qsv-cache
 -o, --output  string For a single <source>, also write the fetched (decompressed) data to the given file (use - for stdout).
 -q, --quiet  flag Do not print progress/summary messages to stderr.
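Several options in these tables share one precedence rule: an explicit flag wins over its environment variable (QSV_CACHE_DIR, QSV_CKAN_API, QSV_CKAN_TOKEN), which wins over the built-in default where one exists. A minimal sketch of that rule, assuming an empty environment variable counts as unset and that a leading `~/` expands against a supplied home directory; `resolve` and `expand_home` are hypothetical helpers.

```rust
use std::path::PathBuf;

/// Flag, then environment value, then default.
/// Assumption: an empty environment value is treated as unset.
pub fn resolve(flag: Option<&str>, env_value: Option<String>, default: &str) -> String {
    flag.map(String::from)
        .or(env_value.filter(|v| !v.is_empty()))
        .unwrap_or_else(|| default.to_owned())
}

/// Expand a leading `~/` against `home` (assumption: Unix-style home dir).
pub fn expand_home(path: &str, home: Option<&str>) -> PathBuf {
    match (path.strip_prefix("~/"), home) {
        (Some(rest), Some(h)) => PathBuf::from(h).join(rest),
        _ => PathBuf::from(path),
    }
}
```

In practice the environment value would come from something like `std::env::var("QSV_CACHE_DIR").ok()`.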
