@tawera-manaena tawera-manaena commented May 6, 2025

  1. This pull request extends the feat/etl-create branch, not the master branch.
  2. The changed files contain short-circuiting code to enable faster debugging that needs to be purged before approval.

Motivation

We are currently migrating our Vector ETL system into basemaps as a new cli-vector package, and re-building the system in the process. As part of the re-build, we have streamlined the system's architecture into the following four steps:

  1. extract
  2. create
  3. join
  4. analyse

This work extends the base branch's implementation of the create CLI command. It introduces a series of functions for handling dataset features of specific Shortbread layers that require special tagging.

Modifications

Diagram
  1. Implements a switch-style pattern for identifying the Shortbread layer to which a feature is assigned.
  2. Implements bespoke functions for overriding a feature's metadata and properties for the following Shortbread layers:
    • contours
    • place_labels - This one is weird. We will need to discuss the approach for this one.
    • pois - This one doesn't actually do anything as it never receives any features that will meet its conditions for special tagging. We can discuss this also.
    • public_transport
    • street_labels - This one's logic is very similar to the streets logic.
    • streets - This one's logic is very similar to the street_labels logic.
    • water_polygons
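
As a rough illustration of the switch-style pattern described above, the sketch below dispatches a feature to a per-layer handler based on its Shortbread layer. Everything except the layer names is an assumption made for this example: the `VectorFeature` shape, the `TagHandler` type, the `special_tag` property, and the placeholder handlers are hypothetical and do not reflect the actual cli-vector code or its tagging rules.

```typescript
// Hypothetical types for illustration only; not the actual cli-vector API.
interface VectorFeature {
  /** Shortbread layer the feature is assigned to. */
  layer: string;
  properties: Record<string, unknown>;
}

/** A handler returns the (possibly modified) feature, or null to discard it. */
type TagHandler = (feature: VectorFeature) => VectorFeature | null;

// Placeholder handler factory: the real tagging rules are not described here,
// so each handler just records which layer-specific function was applied.
const placeholder =
  (tag: string): TagHandler =>
  (feature) => ({
    ...feature,
    properties: { ...feature.properties, special_tag: tag },
  });

const handleContours = placeholder('contours');
const handlePlaceLabels = placeholder('place_labels');
// Per the description, pois currently changes nothing, so it passes features through.
const handlePois: TagHandler = (feature) => feature;
const handlePublicTransport = placeholder('public_transport');
const handleStreetLabels = placeholder('street_labels');
const handleStreets = placeholder('streets');
const handleWaterPolygons = placeholder('water_polygons');

/** Route a feature to the handler for its Shortbread layer. */
function applySpecialTags(feature: VectorFeature): VectorFeature | null {
  switch (feature.layer) {
    case 'contours':
      return handleContours(feature);
    case 'place_labels':
      return handlePlaceLabels(feature);
    case 'pois':
      return handlePois(feature);
    case 'public_transport':
      return handlePublicTransport(feature);
    case 'street_labels':
      return handleStreetLabels(feature);
    case 'streets':
      return handleStreets(feature);
    case 'water_polygons':
      return handleWaterPolygons(feature);
    default:
      // Layers without special tagging pass through unchanged.
      return feature;
  }
}
```

A switch keeps the layer-to-handler mapping in one place, so adding special tagging for another layer would only need a new case and a new handler.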

Processing a Vector Stac Item

The system processes a Stac Item file created via the cli-vector package's new extract command like so:

  1. Parse the file as a runtime VectorStacItem type.
  2. Download the source file (i.e. a GeoPackage file).
  3. Convert the source file into an NDJSON file.
  4. Parse each line (i.e. feature) of the NDJSON file and generalise it (i.e. simplify, add/remove attributes, or discard).
  5. Process the collection of generalised features into an mbtiles file.
  6. Upload a copy of the mbtiles file to the same directory as the Stac Item file.
  7. Update the Stac Item file's contents and overwrite it.
Diagram
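
Step 4 above (parse each NDJSON line and generalise it) can be sketched as follows. This is a minimal in-memory version for illustration, assuming the file fits in a string; a real implementation would more likely stream the file line by line. The `GeoJsonFeature` shape, the `Generalise` callback, and the example rule with its `name` and `internal_id` attributes are all hypothetical and are not taken from the cli-vector package.

```typescript
// Hypothetical, simplified feature shape for illustration only.
interface GeoJsonFeature {
  type: 'Feature';
  geometry: unknown;
  properties: Record<string, unknown>;
}

/** Returns the generalised feature, or null to discard it. */
type Generalise = (feature: GeoJsonFeature) => GeoJsonFeature | null;

/**
 * Parse each line of an NDJSON document as a feature, apply the generalise
 * callback, and serialise the surviving features back into NDJSON.
 */
function generaliseNdjson(ndjson: string, generalise: Generalise): string {
  const kept: string[] = [];
  for (const line of ndjson.split('\n')) {
    if (line.trim() === '') continue; // skip blank lines
    const feature = JSON.parse(line) as GeoJsonFeature;
    const result = generalise(feature);
    if (result != null) kept.push(JSON.stringify(result));
  }
  return kept.join('\n');
}

// Example rule (an assumption for this sketch): discard features without a
// name, and remove an internal attribute from the ones that are kept.
const exampleRule: Generalise = (feature) => {
  if (feature.properties['name'] == null) return null;
  const properties = { ...feature.properties };
  delete properties['internal_id'];
  return { ...feature, properties };
};
```

Returning null from the callback models the "discard" case, while returning a modified copy covers adding or removing attributes.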

Verification

  1. Established test cases for the special tagging functions for the following Shortbread layers:
    • contours

More to come if desired.

@ccbblin ccbblin changed the base branch from master to feat/etl-create May 6, 2025 21:03
@tawera-manaena tawera-manaena changed the title from "[WIP] feat(cli-vector): introduce logic for overriding feature metadata and properties" to "[WIP] feat(cli-vector): introduce logic for overriding feature metadata and properties BM-1268" May 8, 2025
@tawera-manaena tawera-manaena changed the title from "[WIP] feat(cli-vector): introduce logic for overriding feature metadata and properties BM-1268" to "feat(cli-vector): introduce logic for overriding feature metadata and properties BM-1268" May 8, 2025
@tawera-manaena tawera-manaena marked this pull request as ready for review May 21, 2025 03:13
@tawera-manaena tawera-manaena requested a review from a team as a code owner May 21, 2025 03:13
@tawera-manaena tawera-manaena requested review from Wentao-Kuang and ccbblin and removed request for a team May 21, 2025 03:13

tawera-manaena commented May 22, 2025

Action Items

  • 677ebbe Too many logs when running the create command without verbose flags. Remove superfluous logs and convert debug-level logs to trace-level logs.

  • c404171 Can't process an individual schema JSON file. Update the extract CLI command so that the path parameter supports both directories and single JSON files. The downstream logic probably needs updating to handle this, too.

  • 635eeaa tmp folder could be organised better. Update the create command's logic so that the tmp files adhere to the following structure:

    tmp/
      create/
        layers/
          {layer.id}/
            {layer.id}.gpkg
            {layer.id}.ndjson
        transform/
          {shortbread_layer}/
            {layer.id}/
              {layer.id}-gen.ndjson
              {layer.id}.mbtiles
    
  • af0b7bf Split the logic of the createMbtiles function into two parts:

    • downloadSources - function for fetching the Vector Stac Item file and downloading the source dataset.
    • createMbtiles - function for converting the source dataset into ndjson, gen-ndjson, and mbtiles files.
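
For reference, the tmp folder layout listed in the action items above could be expressed as a small path helper like the one below. This is only a sketch: `createTmpPaths`, the `CreateTmpPaths` interface, and the parameter names are hypothetical, and the example arguments in the usage note are made up.

```typescript
/** Paths of the intermediate files for one layer (hypothetical helper type). */
interface CreateTmpPaths {
  gpkg: string;
  ndjson: string;
  genNdjson: string;
  mbtiles: string;
}

/**
 * Build the tmp file paths for a layer, following the structure:
 *   {tmpRoot}/create/layers/{layerId}/...
 *   {tmpRoot}/create/transform/{shortbreadLayer}/{layerId}/...
 */
function createTmpPaths(tmpRoot: string, shortbreadLayer: string, layerId: string): CreateTmpPaths {
  const layerDir = [tmpRoot, 'create', 'layers', layerId].join('/');
  const transformDir = [tmpRoot, 'create', 'transform', shortbreadLayer, layerId].join('/');
  return {
    gpkg: `${layerDir}/${layerId}.gpkg`,
    ndjson: `${layerDir}/${layerId}.ndjson`,
    genNdjson: `${transformDir}/${layerId}-gen.ndjson`,
    mbtiles: `${transformDir}/${layerId}.mbtiles`,
  };
}
```

For example, `createTmpPaths('tmp', 'contours', 'abc')` would place the source files under `tmp/create/layers/abc/` and the generalised NDJSON and mbtiles files under `tmp/create/transform/contours/abc/`.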

@tawera-manaena tawera-manaena marked this pull request as draft May 25, 2025 20:31
@tawera-manaena tawera-manaena marked this pull request as ready for review May 25, 2025 21:32
@Wentao-Kuang Wentao-Kuang merged commit a7a9473 into feat/etl-create May 25, 2025
9 of 10 checks passed
@Wentao-Kuang Wentao-Kuang deleted the feat/etl-create-special-tags branch May 25, 2025 22:43
ccbblin pushed a commit that referenced this pull request Jun 4, 2025
… properties BM-1268 (#3440)

1. This pull request extends the [feat/etl-create] branch, not the
[master] branch.
2. The changed files contain short-circuiting code to enable faster
debugging that needs to be purged before approval.

---

### Motivation

We are currently migrating our Vector ETL system into basemaps as a new
`cli-vector` package. We are also re-building the system in the process.
As a part of the re-build, we have streamlined the system's architecture
in the following four steps:

1. `extract`
2. `create`
3. `join`
4. `analyse`

This work implements an extension to the base branch's implementation of
the `create` CLI command. This work introduces a series of functions for
handling dataset features of specific Shortbread layers for which we
require special tagging.

### Modifications

| Diagram |
| - |
| ![][diagram1] |

1. Implements a switch-style pattern for identifying the Shortbread
layer to which a feature is assigned.
2. Implements bespoke functions for overriding a feature's metadata and
properties for the following Shortbread layers:
   - `contours`
   - `place_labels` - This one is weird. We will need to discuss the
     approach for this one.
   - `pois` - This one doesn't actually do anything as it never receives
     any features that will meet its conditions for special tagging. We can
     discuss this also.
   - `public_transport`
   - `street_labels` - This one's logic is very similar to the `streets`
     logic.
   - `streets` - This one's logic is very similar to the `street_labels`
     logic.
   - `water_polygons`

#### Processing a Vector Stac Item

The system processes a Stac Item file created via the **cli-vector**
package's new `extract` command like so:

1. Parse the file as a runtime `VectorStacItem` type.
2. Download the source file (i.e. a GeoPackage file).
3. Convert the source file into an NDJSON file.
4. Parse each line (i.e. feature) of the NDJSON file, generalise it
(i.e. simplify, add/remove attributes, or discard).
5. Process the collection of generalised features into an mbtiles file.
6. Upload a copy of the mbtiles file to the same directory as the Stac
Item file.
7. Update the Stac Item file's contents and overwrite it.

| Diagram |
| - |
| ![][diagram2] |

### Verification

1. Established test cases for the special tagging functions for the
following Shortbread layers:
   - `contours`

More to come if desired.

<!-- external links -->

[feat/etl-create]: https://github.com/linz/basemaps/tree/feat/etl-create
[master]: https://github.com/linz/basemaps/tree/master

[diagram1]:
https://github.com/user-attachments/assets/6a405417-59fb-4127-8630-cb0901c12618
[diagram2]:
https://github.com/user-attachments/assets/2994bd5d-c668-4b99-957e-57623da7c6a1

---------

Co-authored-by: Wentao Kuang <[email protected]>
ccbblin pushed a commit that referenced this pull request Jun 10, 2025
… properties BM-1268 (#3440)