-
-
Notifications
You must be signed in to change notification settings - Fork 306
Add mbtiles pack and mbtiles unpack subcommands
#2199
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
c1b9491 to
821cd2d
Compare
49770cb to
509bca8
Compare
for more information, see https://pre-commit.ci
nyurik
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks pretty good! I made a few comments. Overall, I don't see any reasons not to have this, but others might chime in?
| if !input_directory.exists() { | ||
| anyhow::bail!( | ||
| "Input directory does not exist: {}", | ||
| input_directory.display() | ||
| ); | ||
| } | ||
| if !input_directory.is_dir() { | ||
| anyhow::bail!( | ||
| "Input path is not a directory: {}", | ||
| input_directory.display() | ||
| ); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't the WalkDir report these errors as part of iteration?
| .iter() | ||
| .filter_map(|c| { | ||
| c.to_str() | ||
| .and_then(|s| s.split('.').next()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are you removing file extension this way? Isn't there a Path-fn for this? Or am I misunderstanding?
| format = match entry.path().extension().and_then(|s| s.to_str()) { | ||
| Some("pbf" | "mvt") => Some("pbf".to_string()), | ||
| Some("jpg" | "jpeg") => Some("jpg".to_string()), | ||
| Some("webp") => Some("webp".to_string()), | ||
| Some("png") => Some("png".to_string()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you probably want to use
martin/martin-tile-utils/src/lib.rs
Lines 86 to 89 in 6e870e5
| pub fn parse(value: &str) -> Option<Self> { | |
| Some(match value.to_ascii_lowercase().as_str() { | |
| "gif" => Self::Gif, | |
| "jpg" | "jpeg" => Self::Jpeg, |
| TileScheme::Tms => y, | ||
| }; | ||
|
|
||
| mbt.insert_tiles( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doing it one by one will likely kill performance - while this is ok for the first pass, you may want to batch up files, likely even using some parallel IO reading, and once the batch is big enough, insert it. Since sqlite is in-proc, you might actually achieve the same result with one by one inserting if you set some internal sqlite flags to do insertions in bulk (i.e. transaction), but you would have to research that
|
|
||
| // Get the format from metadata to determine file extension and compression | ||
| let format = mbt.get_metadata_value(&mut conn, "format").await?; | ||
| let (extension, decompress) = match format.as_deref() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same thing - see utils for all this
|
|
||
| while let Some(tile) = tiles.try_next().await? { | ||
| let Some(z) = tile.zoom_level else { | ||
| log::warn!("Skipping tile with missing zoom level"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
while this would be mega-rare, you might still have some sort of a flag to avoid flooding the output with duplicate warnings
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is actually possible, right?
| tokio = { workspace = true, features = ["rt-multi-thread"] } | ||
| xxhash-rust.workspace = true | ||
|
|
||
| walkdir = "2.5.0" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move this to workspace and fix newline spacing
| /// Tile ID scheme for input directory | ||
| #[arg(long, value_enum, default_value = "xyz")] | ||
| scheme: TileScheme, | ||
| }, | ||
| /// Unpack an MBTiles file into a directory tree of tiles | ||
| #[command(name = "unpack")] | ||
| Unpack { | ||
| /// MBTiles file to read | ||
| input_file: PathBuf, | ||
| /// directory to write | ||
| output_directory: PathBuf, | ||
| /// Tile ID scheme for output directory | ||
| #[arg(long, value_enum, default_value = "xyz")] | ||
| scheme: TileScheme, | ||
| }, | ||
| } | ||
|
|
||
| #[derive(Clone, Copy, PartialEq, Debug, clap::ValueEnum)] | ||
| enum TileScheme { | ||
| /// XYZ (aka. "slippy map") scheme where Y=0 is at the top | ||
| #[value(name = "xyz")] | ||
| Xyz, | ||
| /// TMS scheme where Y=0 is at the bottom | ||
| #[value(name = "tms")] | ||
| Tms, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: imo, this is a bit more ergonomic, but both are fine.
| /// Tile ID scheme for input directory | |
| #[arg(long, value_enum, default_value = "xyz")] | |
| scheme: TileScheme, | |
| }, | |
| /// Unpack an MBTiles file into a directory tree of tiles | |
| #[command(name = "unpack")] | |
| Unpack { | |
| /// MBTiles file to read | |
| input_file: PathBuf, | |
| /// directory to write | |
| output_directory: PathBuf, | |
| /// Tile ID scheme for output directory | |
| #[arg(long, value_enum, default_value = "xyz")] | |
| scheme: TileScheme, | |
| }, | |
| } | |
| #[derive(Clone, Copy, PartialEq, Debug, clap::ValueEnum)] | |
| enum TileScheme { | |
| /// XYZ (aka. "slippy map") scheme where Y=0 is at the top | |
| #[value(name = "xyz")] | |
| Xyz, | |
| /// TMS scheme where Y=0 is at the bottom | |
| #[value(name = "tms")] | |
| Tms, | |
| /// Tile ID scheme for input directory | |
| #[arg(long, value_enum, default)] | |
| scheme: TileScheme, | |
| }, | |
| /// Unpack an MBTiles file into a directory tree of tiles | |
| #[command(name = "unpack")] | |
| Unpack { | |
| /// MBTiles file to read | |
| input_file: PathBuf, | |
| /// directory to write | |
| output_directory: PathBuf, | |
| /// Tile ID scheme for output directory | |
| #[arg(long, value_enum, default)] | |
| scheme: TileScheme, | |
| }, | |
| } | |
| #[derive(Clone, Copy, Default, PartialEq, Debug, clap::ValueEnum)] | |
| enum TileScheme { | |
| /// XYZ (aka. "slippy map") scheme where Y=0 is at the top | |
| #[value(name = "xyz")] | |
| #[default] | |
| Xyz, | |
| /// TMS scheme where Y=0 is at the bottom | |
| #[value(name = "tms")] | |
| Tms, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice!
CommanderStorm
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work.
I have done a few comments below.
What is missing from this PR is having tests as currently there are none.
| let walker = WalkDir::new(input_directory); | ||
| let entries = walker.into_iter().filter_entry(|entry| { | ||
| let should_include = if entry.file_type().is_dir() { | ||
| // skip directories except the root unless they have numeric names | ||
| entry.depth() == 0 | ||
| || entry | ||
| .file_name() | ||
| .to_str() | ||
| .is_some_and(|s| s.parse::<u32>().is_ok()) | ||
| } else { | ||
| // skip files that do not have a numeric basename | ||
| entry | ||
| .file_name() | ||
| .to_str() | ||
| .and_then(|s| s.split('.').next().map(|b| b.parse::<u32>().is_ok())) | ||
| .unwrap_or(false) | ||
| }; | ||
|
|
||
| if !should_include { | ||
| log::info!( | ||
| "Skipping {}{}", | ||
| entry.path().display(), | ||
| if entry.file_type().is_dir() { "/" } else { "" } | ||
| ); | ||
| } | ||
|
|
||
| should_include | ||
| }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should warn on every occurance.
Lets also activate follow_links or make this an option at least 🤔
| let walker = WalkDir::new(input_directory); | |
| let entries = walker.into_iter().filter_entry(|entry| { | |
| let should_include = if entry.file_type().is_dir() { | |
| // skip directories except the root unless they have numeric names | |
| entry.depth() == 0 | |
| || entry | |
| .file_name() | |
| .to_str() | |
| .is_some_and(|s| s.parse::<u32>().is_ok()) | |
| } else { | |
| // skip files that do not have a numeric basename | |
| entry | |
| .file_name() | |
| .to_str() | |
| .and_then(|s| s.split('.').next().map(|b| b.parse::<u32>().is_ok())) | |
| .unwrap_or(false) | |
| }; | |
| if !should_include { | |
| log::info!( | |
| "Skipping {}{}", | |
| entry.path().display(), | |
| if entry.file_type().is_dir() { "/" } else { "" } | |
| ); | |
| } | |
| should_include | |
| }); | |
| let mut should_warn_for_invalid_directory_names = true; | |
| let mut should_warn_for_invalid_file_names = true; | |
| let walker = WalkDir::new(input_directory).follow_links(true); | |
| let entries = walker.into_iter().filter_entry(|entry| { | |
| if entry.file_type().is_dir() { | |
| // skip directories except the root unless they have numeric names | |
| let filename_is_numeric_or_root = entry | |
| .file_name() | |
| .to_str() | |
| .is_some_and(|s| s.parse::<u32>().is_ok() || | |
| entry.depth() == 0; | |
| if should_warn_for_invalid_directory_names && !filename_is_numeric_or_root { | |
| log::info!("Skipping directory {}/ and similar occurances. Please make sure that all directories under {} are named numericaly", | |
| entry.path().display() | |
| input_directory.display() | |
| ); | |
| should_warn_for_invalid_directory_names = false; | |
| } | |
| filename_is_numeric_or_root | |
| } else { | |
| // skip files that do not have a numeric basename | |
| let filename_is_numeric_except_extensions = entry | |
| .file_name() | |
| .to_str() | |
| .and_then(|s| s.split('.').next().map(|b| b.parse::<u32>().is_ok())) | |
| .unwrap_or(false) | |
| if should_warn_for_invalid_file_names && !filename_is_numeric_except_extensions { | |
| log::info!("Skipping file {} and similar occurances. Please make sure that all filesnames under {} are numeric with optional exstensions", | |
| entry.path().display() | |
| input_directory.display() | |
| ); | |
| should_warn_for_invalid_file_names = false; | |
| } | |
| filename_is_numeric_except_extensions | |
| } | |
| }); |
| if format == Some("pbf".to_string()) { | ||
| compress = true; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don’t think we should compress this indiscriminately.
The tile might already be compressed => we might compress it twice.
On another hand, lets give the user to opt out of compression, or?
| // TODO: set minzoom, maxzoom, and bbox? | ||
| // either compute them, or possibly read them from {input_directory}/metadata.json |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets just compute them instead.
It is cleaner this way.
For other metadata (atrribution, ...) it is might still be worthwhile to have this.
| // Query all tiles from the database | ||
| let mut tiles = sqlx::query!("SELECT zoom_level, tile_column, tile_row, tile_data FROM tiles ORDER BY zoom_level, tile_column, tile_row") | ||
| .fetch(&mut conn); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should use this code here instead:
|
|
||
| while let Some(tile) = tiles.try_next().await? { | ||
| let Some(z) = tile.zoom_level else { | ||
| log::warn!("Skipping tile with missing zoom level"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is actually possible, right?
|
Thanks for the reviews. I'll update the PR to address your feedback as soon as I have a chance. |
Hello! This is my first contribution to Martin.
This PR adds
packandunpacksubcommands to thembtilesCLI tool. Pack reads a directory tree of tiles and creates a new MBTiles archive, and unpack does the inverse, extracting the contents of an MBTiles archive into a directory tree. The directory tree is organized like{z}/{x}/{y}.{ext}, matching how URLs on a tile server are typically structured.For planet-sized data, such a directory tree is impractically large (it contains millions of files, and simply traversing it can be very slow). But for developers, who are often working with smaller test datasets, it can be convenient to work with tilesets in their directory form: it lets you easily inspect individual tile files, and also permits serving a tileset with any HTTP server (such as
python -m http.serverornpx serve). Tools like tippecanoe and GDAL also support generating tilesets in directory form.Mapbox's
mbutiltool supports packing and unpacking MBTiles archives, but it is no longer maintained. I also found it tricky to use (it has some surprising behaviors, like only compressing pbf tiles if you pass a special flag, despite compression being required by the spec). So I thought it would be nice if Martin'smbtilestool had the same functionality.This PR adds two new dependencies:
flate2for compression (already required by other packages in the workspace), andwalkdirfor directory traversal (needed forpack).Thanks for considering this feature! Please let me know if you have any questions or feedback.