Skip to content

Commit 2b22c44

Browse files
feat: add raw-grid block layout with bilinear lat/lon point sampling and JS parity
1 parent 949878a commit 2b22c44

20 files changed

Lines changed: 2620 additions & 80 deletions

File tree

FORMAT.md

Lines changed: 98 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -366,6 +366,7 @@ Block flag bits:
366366
|---:|---|---|
367367
| `0` | `has_leaf_directories` | The block has leaf directories |
368368
| `1` | `has_dict` | The block carries a per-block zstd dictionary |
369+
| `2` | `raw_grid` | Block stores a native source-grid raster instead of a Hilbert tile pyramid; see "Raw-Grid Blocks" below |
369370

370371
Other bits are reserved.
371372

@@ -570,6 +571,103 @@ The payload is zstd-compressed vertical deltas. This codec is valid for `u8` and
570571
- Each following row stores `current - value_above`.
571572
- Arithmetic wraps modulo `2^(8 * dtype_bytes)`.
572573

574+
### Raw-Grid Blocks
575+
576+
Files encoded with `--no-tiles` skip the Web-Mercator pyramid. Each
577+
`(variable_id, time_id)` pair becomes one **raw-grid block** that stores the
578+
native source lat-lon grid chunked into fixed-size sub-tiles. Point queries by
579+
`(lat, lon)` are O(1) range requests; the file does not drive a slippy-map
580+
viewer without on-the-fly resampling.
581+
582+
A raw-grid block reuses the 64-byte block header with `BlockFlagRawGrid`
583+
(`1 << 2`) set in `block_flags`. Field interpretation changes:
584+
585+
- `root_directory_offset` and `root_directory_length` point at the compressed
586+
raw-grid section (the equivalent of the tile directory).
587+
- `leaf_directories_offset` and `leaf_directories_length` are zero.
588+
- `tile_data_offset` and `tile_data_length` describe the concatenated chunk
589+
payload region.
590+
- `num_addressed_tiles` and `num_directory_entries` hold the total chunk count
591+
(`chunk_count_x * chunk_count_y`).
592+
- `num_tile_contents` holds the deduplicated chunk count.
593+
594+
The block-table entry's `codec` is the dominant chunk codec ID for stats; each
595+
chunk payload still carries its own one-byte codec tag.
596+
597+
#### Raw-Grid Section
598+
599+
The raw-grid section is internally compressed (same compression as the file's
600+
catalogs and directories). After decompression:
601+
602+
| Offset | Size | Type | Field |
603+
|---:|---:|---|---|
604+
| `0` | 1 | `u8` | `schema_version`, currently `1` |
605+
| `1` | 1 | `u8` | `chunk_size_log2`; chunk side in source pixels = `1 << value` |
606+
| `2` | 2 | bytes | Reserved, written as zero |
607+
| `4` | 4 | `u32` | `nx`, source grid width in pixels |
608+
| `8` | 4 | `u32` | `ny`, source grid height in pixels |
609+
| `16` | 8 | `f64` | `lat0`, latitude at source row `0` |
610+
| `24` | 8 | `f64` | `lon0`, longitude at source column `0` |
611+
| `32` | 8 | `f64` | `dy`, latitude step per row (may be negative) |
612+
| `40` | 8 | `f64` | `dx`, longitude step per column (may be negative) |
613+
| `48` | 8 | `f64` | `missing_value`, source NoData sentinel (may be NaN) |
614+
| `56` | 4 | `u32` | `chunk_count_x`, = `ceil(nx / chunk_size)` |
615+
| `60` | 4 | `u32` | `chunk_count_y`, = `ceil(ny / chunk_size)` |
616+
| `64` | varint × N | LEB128 `u64` each | `chunk_offsets`, one per chunk row-major |
617+
| ... | varint × N | LEB128 `u64` each | `chunk_lengths`, one per chunk row-major |
618+
619+
`chunk_size_log2` must be in `[4, 12]`, allowing chunk sides of 16..4096
620+
source pixels. `N = chunk_count_x * chunk_count_y`. Chunk index is
621+
`cy * chunk_count_x + cx`, so chunks are stored in row-major order.
622+
623+
`chunk_offsets[i]` is a byte offset relative to the block's `tile_data_offset`.
624+
`chunk_lengths[i]` is the chunk payload length. A `(0, 0)` pair signals
625+
**absent chunk**; decoders fill all pixels with NaN without fetching anything.
626+
627+
#### Chunk Payloads
628+
629+
Each chunk payload is a normal tile blob: one codec tag byte followed by the
630+
codec-specific stream. The current encoder uses `0x01` (constant) when all
631+
quantised values in the chunk are identical, otherwise `0x03`
632+
(bitshuffle + zstd). Lorenzo and delta codecs are not emitted for raw-grid
633+
chunks because edge chunks at the right/bottom border may be non-square.
634+
635+
Chunk pixel count is `chunk_width(cx) * chunk_height(cy)` where:
636+
637+
```text
638+
chunk_width(cx) = min(chunk_size, nx - cx*chunk_size)
639+
chunk_height(cy) = min(chunk_size, ny - cy*chunk_size)
640+
```
641+
642+
Pixels inside a chunk are row-major: `chunk[row*chunk_width + col]` is the
643+
quantised value at source coordinates `(cx*chunk_size + col, cy*chunk_size + row)`,
644+
which corresponds to lat/lon `(lat0 + (cy*chunk_size + row)*dy, lon0 + (cx*chunk_size + col)*dx)`.
645+
646+
#### Point Sampling
647+
648+
To compute the value at `(lat, lon)`:
649+
650+
1. Compute source-grid coordinates `gx = (lon - lon0) / dx`,
651+
`gy = (lat - lat0) / dy`. NaN or out-of-range inputs return NaN.
652+
2. Find the four neighbours `(x0, y0)`, `(x1, y0)`, `(x0, y1)`, `(x1, y1)` with
653+
`x0 = floor(gx)`, `y0 = floor(gy)`, `x1 = x0+1`, `y1 = y0+1` clamped to
654+
`[0, nx-1]` / `[0, ny-1]`.
655+
3. For each neighbour, locate the chunk `cx = x / chunk_size`,
656+
`cy = y / chunk_size`, fetch and decode the chunk, then read the pixel.
657+
4. Bilinearly interpolate: `wx = gx - x0`, `wy = gy - y0`,
658+
`v = ((1-wx)*v00 + wx*v10)*(1-wy) + ((1-wx)*v01 + wx*v11)*wy`.
659+
If any neighbour is NaN, the result is NaN.
660+
661+
For batched queries the reader unions the chunk indices touched by all points
662+
(including the 2×2 bilinear neighbourhood) and coalesces adjacent chunk byte
663+
ranges into shared HTTP-range requests, the same strategy as tile coalescing.
664+
665+
A raw-grid file's header records `min_zoom = 0`, `max_zoom = 0`, and the
666+
authoritative signal that consumers should branch on is `BlockFlagRawGrid` on
667+
each block header. Mixing tiled and raw-grid blocks in the same file is
668+
permitted by the wire format but the current encoder does not produce mixed
669+
files.
670+
573671
### Lorenzo Zstd (`0x05`)
574672

575673
The payload is zstd-compressed 2D Lorenzo residuals. This codec is valid for

README.md

Lines changed: 36 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -82,6 +82,11 @@ wmtiles encode 'composite_wn_*-hd5' -o radar.wmt --max-zoom 7
8282

8383
# CF-1.x / NetCDF4 file (regular lat-lon coords)
8484
wmtiles encode model.nc4 -o model.wmt
85+
86+
# Weather-API mode: skip the Web-Mercator pyramid, store the source grid
87+
# chunked in source coords. Bilinear point queries via Sample / sample();
88+
# encodes ~50-100x faster, files shrink to ~GRIB×0.5..1.0.
89+
wmtiles encode forecast.grib2 -o api.wmt --no-tiles
8590
```
8691

8792
The input format is auto-detected by magic bytes (`GRIB` vs `\x89HDF`) with a
@@ -128,6 +133,23 @@ const px = await t2m.tile({ time: 12, z: 5, x: 16, y: 11 });
128133
// Float32Array(256*256), NaN where the encoder marked NoData
129134
```
130135

136+
For `--no-tiles` archives use the lat/lon sample API; the same range
137+
coalescing keeps a batch of points down to a single byte-range request when
138+
they fall in the same source-grid chunk neighbourhood:
139+
140+
```ts
141+
const v = wmt.variable("2t_2m");
142+
const tempBerlin = await v.sample({ time: 0, lat: 52.52, lon: 13.40 });
143+
144+
const cities = [
145+
{ lat: 52.52, lon: 13.40 }, // Berlin
146+
{ lat: 48.14, lon: 11.58 }, // Munich
147+
{ lat: 53.55, lon: 9.99 }, // Hamburg
148+
];
149+
const values = await v.samples({ time: 0, points: cities });
150+
// Float32Array(3) — NaN outside the source bbox.
151+
```
152+
131153
For multi-tile fetches at the same `(variable, time)`, `tiles()`
132154
coalesces 9 viewport tiles into 1 to 2 range requests:
133155

@@ -171,6 +193,18 @@ Read one tile:
171193
pixels, err := wmt.ReadTile("2t", 12, decode.Coord(5, 16, 11))
172194
```
173195

196+
For `--no-tiles` files use point sampling:
197+
198+
```go
199+
v, err := wmt.Sample("2t", 12, 52.52, 13.40) // lat, lon
200+
// Float32; NaN outside the source bbox.
201+
202+
values, err := wmt.Samples("2t", 12, []decode.SamplePoint{
203+
{Lat: 52.52, Lon: 13.40},
204+
{Lat: 48.14, Lon: 11.58},
205+
})
206+
```
207+
174208
Read a viewport worth of tiles with range coalescing:
175209

176210
```go
@@ -400,6 +434,8 @@ wmtiles serve <file.wmt> [--addr :8080] bundled web viewer
400434
| `--tile-size-log2 N` | `8` (256 px) | tile pixel size, allowed `7..10` (128..1024) |
401435
| `--filter SHORTNAMES` | (none = all) | comma-separated shortNames to keep (GRIB shortName, ODIM quantity, or CF mapping) |
402436
| `--precision NAME=K,…` | shortName/unit lookup, then 10-bit auto-cap | quantisation precision overrides; `=0` forces full-range u16 |
437+
| `--no-tiles` | off | skip the Web-Mercator pyramid; store source-grid chunks for point-query (lat/lon) API use. Output is not viewable on a slippy map without on-the-fly tiling |
438+
| `--raw-chunk-size-log2 N` | `8` (256 px) | source-pixel side of one raw-grid chunk as log2 (4..12 → 16..4096). Only consulted with `--no-tiles` |
403439

404440
---
405441

cmd/gen-testdata/main.go

Lines changed: 51 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -39,6 +39,57 @@ func main() {
3939
multistepPath := filepath.Join(outDir, "multistep.wmt")
4040
makeMultistep(multistepPath, pixSize, refTime, fixedNow)
4141
fmt.Printf("wrote %s\n", multistepPath)
42+
43+
rawGridPath := filepath.Join(outDir, "rawgrid.wmt")
44+
makeRawGrid(rawGridPath, refTime, fixedNow)
45+
fmt.Printf("wrote %s\n", rawGridPath)
46+
}
47+
48+
// makeRawGrid writes a small --no-tiles file with a deterministic synthetic
49+
// grid. Used by the JS round-trip tests to lock in raw-grid sample values.
50+
func makeRawGrid(path string, refTime time.Time, now time.Time) {
51+
const nx, ny = 64, 33
52+
const lat0, lon0 = 30.0, 0.0
53+
const dy, dx = 0.5, 0.5
54+
values := make([]float32, nx*ny)
55+
for y := 0; y < ny; y++ {
56+
for x := 0; x < nx; x++ {
57+
// pixel = lat + lon/1000, deterministic and easy to verify.
58+
values[y*nx+x] = float32(lat0+float64(y)*dy) + float32(lon0+float64(x)*dx)/1000
59+
}
60+
}
61+
62+
opts := encoder.Options{
63+
TilePixelSizeLog2: 7,
64+
MinZoom: 0,
65+
MaxZoom: 0,
66+
ReferenceForecastTime: refTime,
67+
TimeCatalog: format.TimeCatalog{
68+
Regular: true, StartMs: refTime.UnixMilli(), IntervalMs: 0, Count: 1,
69+
},
70+
BBox: [4]float64{lon0, lat0, lon0 + float64(nx-1)*dx, lat0 + float64(ny-1)*dy},
71+
Variables: []encoder.VariableSpec{
72+
{Name: "temp", Unit: "K", Precision: 0.001},
73+
},
74+
CreationTime: now,
75+
SkipInternalWorkers: true,
76+
}
77+
enc, err := encoder.NewStreamingEncoder(opts, path)
78+
if err != nil {
79+
die(err)
80+
}
81+
if err := enc.EncodeRawGridBlock(encoder.RawGridSpec{
82+
Variable: "temp", TimeStep: 0,
83+
Nx: nx, Ny: ny,
84+
Lat0: lat0, Lon0: lon0,
85+
DY: dy, DX: dx,
86+
Precision: 0.001,
87+
}, values); err != nil {
88+
die(err)
89+
}
90+
if err := enc.Finish(); err != nil {
91+
die(err)
92+
}
4293
}
4394

4495
// 4 hourly steps × 2 variables. Pixel values encode (timeStep, variable) so a

cmd/wmtiles/cli.go

Lines changed: 8 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -8,8 +8,6 @@ import (
88
"github.com/hstin-de/wmtiles/parser"
99
)
1010

11-
// Each field is one subcommand for kong; the Run() method on the subcommand
12-
// struct receives the parsed flags and drives the real work.
1311
type CLI struct {
1412
Encode encodeCmd `cmd:"" help:"encode GRIB2 or HDF5 sources into a fresh .wmt"`
1513
Extend extendCmd `cmd:"" help:"append blocks for new (variable, time) pairs to an existing .wmt"`
@@ -24,9 +22,11 @@ type CLI struct {
2422
type encodeCmd struct {
2523
Output string `short:"o" required:"" placeholder:"PATH" help:"output .wmt path"`
2624
Format string `enum:"auto,grib2,hdf5" default:"auto" help:"override input-format auto-detection"`
27-
MinZoom uint8 `default:"0" help:"minimum zoom level"`
28-
MaxZoom uint8 `default:"5" help:"maximum zoom level"`
25+
MinZoom uint8 `default:"0" help:"minimum zoom level (ignored when --no-tiles is set)"`
26+
MaxZoom uint8 `default:"5" help:"maximum zoom level (ignored when --no-tiles is set)"`
2927
TileSizeLog2 uint8 `name:"tile-size-log2" default:"8" help:"tile size as log2 of pixel count (7..10 -> 128..1024)"`
28+
NoTiles bool `name:"no-tiles" help:"skip the Web-Mercator pyramid; store source-grid chunks for point-query (lat/lon) API use. Output is not viewable on a slippy map without on-the-fly tiling"`
29+
RawChunkSizeLog2 uint8 `name:"raw-chunk-size-log2" default:"8" help:"source-pixel side of one raw-grid chunk as log2 (4..12 -> 16..4096). Only consulted with --no-tiles"`
3030
Filter string `help:"comma-separated source variable shortNames to keep (default: all)"`
3131
Precision string `placeholder:"NAME=K,..." help:"per-variable quantisation precision overrides (default: lookup table + auto-cap)"`
3232
DisableDeltaCodec bool `help:"force bitshuffle-only encoding (faster, larger files)"`
@@ -44,9 +44,12 @@ func (c *encodeCmd) Validate() error {
4444
if c.TileSizeLog2 < 7 || c.TileSizeLog2 > 10 {
4545
return fmt.Errorf("tile-size-log2 must be 7..10, got %d", c.TileSizeLog2)
4646
}
47-
if c.MaxZoom < c.MinZoom {
47+
if !c.NoTiles && c.MaxZoom < c.MinZoom {
4848
return fmt.Errorf("max-zoom %d < min-zoom %d", c.MaxZoom, c.MinZoom)
4949
}
50+
if c.NoTiles && (c.RawChunkSizeLog2 < 4 || c.RawChunkSizeLog2 > 12) {
51+
return fmt.Errorf("raw-chunk-size-log2 must be 4..12, got %d", c.RawChunkSizeLog2)
52+
}
5053
return nil
5154
}
5255

cmd/wmtiles/encode_cmd.go

Lines changed: 17 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -187,8 +187,13 @@ func runEncode(c *encodeCmd) error {
187187

188188
ui.Section("Settings")
189189
ui.KV("source format", formatLabel)
190-
ui.KVf("zoom range", "%d..%d", c.MinZoom, c.MaxZoom)
191-
ui.KVf("tile size", "%d px", 1<<c.TileSizeLog2)
190+
if c.NoTiles {
191+
ui.KV("mode", "raw-grid (no tile pyramid)")
192+
ui.KVf("raw chunk size", "%d px", 1<<c.RawChunkSizeLog2)
193+
} else {
194+
ui.KVf("zoom range", "%d..%d", c.MinZoom, c.MaxZoom)
195+
ui.KVf("tile size", "%d px", 1<<c.TileSizeLog2)
196+
}
192197
if c.Filter == "" {
193198
ui.KV("filter", "all source variables")
194199
} else {
@@ -232,14 +237,16 @@ func runEncode(c *encodeCmd) error {
232237
}
233238

234239
opts := encode.Options{
235-
TileSize: 1 << c.TileSizeLog2,
236-
MinZoom: c.MinZoom,
237-
MaxZoom: c.MaxZoom,
238-
Precision: overrides,
239-
Metadata: map[string]any{"sourceFormat": formatLabel, "sourceCount": len(inputs)},
240-
DisableDeltaCodec: c.DisableDeltaCodec,
241-
ZstdLevel: c.ZstdLevel,
242-
EnableTileDict: c.TileDict,
240+
TileSize: 1 << c.TileSizeLog2,
241+
MinZoom: c.MinZoom,
242+
MaxZoom: c.MaxZoom,
243+
NoTiles: c.NoTiles,
244+
RawGridChunkSizeLog2: c.RawChunkSizeLog2,
245+
Precision: overrides,
246+
Metadata: map[string]any{"sourceFormat": formatLabel, "sourceCount": len(inputs)},
247+
DisableDeltaCodec: c.DisableDeltaCodec,
248+
ZstdLevel: c.ZstdLevel,
249+
EnableTileDict: c.TileDict,
243250
OnInputScanned: func(name string, records int) {
244251
if scanPhase == nil {
245252
scanStart.Store(time.Now().UnixNano())

0 commit comments

Comments
 (0)