Commit e8b18d4

Documentation update for geoarrow-array crate (#1027)
- Write crate-level documentation for `geoarrow-array`.
- Add at least a minimal docstring for every module.
- Add a top-level `from_arrow_array` function to convert a `&dyn Array`, `&Field` pair into an `Arc<dyn GeoArrowArray>`.
- Warn on missing docs in the `geoarrow-array` crate (warnings are treated as errors on CI).
1 parent 985c34f commit e8b18d4

25 files changed

Lines changed: 215 additions & 125 deletions

Cargo.lock

Lines changed: 1 addition & 0 deletions

Cargo.toml

Lines changed: 1 addition & 0 deletions
@@ -19,6 +19,7 @@ arrow-buffer = "54.3.1"
1919
arrow-schema = "54.3.1"
2020
geo = "0.30.0"
2121
geo-traits = "0.2.0"
22+
geo-types = "0.7.16"
2223
num-traits = "0.2.19"
2324
rstar = "0.12.2"
2425
serde = "1"

rust/geoarrow-array/Cargo.toml

Lines changed: 1 addition & 0 deletions
@@ -25,4 +25,5 @@ wkb = { workspace = true }
2525
wkt = { workspace = true }
2626

2727
[dev-dependencies]
28+
geo-types = { workspace = true }
2829
geo = { workspace = true }

rust/geoarrow-array/README.md

Lines changed: 109 additions & 0 deletions
@@ -1 +1,110 @@
11
# geoarrow-array
2+
3+
The central type in Apache Arrow is the array: a known-length sequence of values that all have the same type. This crate provides concrete implementations of each array type defined in the [GeoArrow specification], as well as a [GeoArrowArray] trait that can be used for type erasure.
4+
5+
[GeoArrow specification]: https://github.com/geoarrow/geoarrow
6+
7+
To minimize the overhead of dynamic downcasting, the array types in this crate are defined "natively". Converting between a GeoArrow array type and an [`arrow`][arrow_array] array type is therefore an explicit step, but it is an `O(1)` operation.
8+
9+
## Building a GeoArrow Array
10+
11+
Use [builders][builder] to construct GeoArrow arrays. These builders offer a push-based interface to construct arrays from a series of objects that implement [`geo-traits`][geo_traits].
12+
13+
```rust
14+
# use geo_traits::{CoordTrait, PointTrait};
15+
# use geoarrow_array::array::PointArray;
16+
# use geoarrow_array::builder::PointBuilder;
17+
# use geoarrow_array::scalar::Point;
18+
# use geoarrow_array::ArrayAccessor;
19+
# use geoarrow_schema::{CoordType, Dimension, PointType};
20+
#
21+
let point_type = PointType::new(CoordType::Separated, Dimension::XY, Default::default());
22+
let mut builder = PointBuilder::new(point_type);
23+
24+
builder.push_point(Some(&geo_types::point!(x: 0., y: 1.)));
25+
builder.push_point(Some(&geo_types::point!(x: 2., y: 3.)));
26+
builder.push_point(Some(&geo_types::point!(x: 4., y: 5.)));
27+
28+
let array: PointArray = builder.finish();
29+
30+
let point_0: Point<'_> = array.get(0).unwrap().unwrap();
31+
assert_eq!(point_0.coord().unwrap().x_y(), (0., 1.));
32+
```
33+
34+
Converting a builder to an array via `finish()` is always `O(1)`.
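
To illustrate why `finish()` can be constant time, here is a minimal, std-only sketch. `PointBuilderSketch` and `PointArraySketch` are hypothetical stand-ins, not this crate's types, and the sketch assumes the builder owns one growable `Vec` per dimension. Finishing then only moves ownership of the buffers; no coordinate is copied.

```rust
/// Hypothetical push-based builder: one growable buffer per dimension.
struct PointBuilderSketch {
    xs: Vec<f64>,
    ys: Vec<f64>,
}

/// Hypothetical immutable array that takes over the builder's buffers.
struct PointArraySketch {
    xs: Vec<f64>,
    ys: Vec<f64>,
}

impl PointBuilderSketch {
    fn with_capacity(capacity: usize) -> Self {
        Self {
            xs: Vec::with_capacity(capacity),
            ys: Vec::with_capacity(capacity),
        }
    }

    fn push_point(&mut self, x: f64, y: f64) {
        self.xs.push(x);
        self.ys.push(y);
    }

    /// Moves the buffers into the array. No per-element work happens here,
    /// so the cost does not depend on the number of points.
    fn finish(self) -> PointArraySketch {
        PointArraySketch {
            xs: self.xs,
            ys: self.ys,
        }
    }
}

impl PointArraySketch {
    fn len(&self) -> usize {
        self.xs.len()
    }

    /// Returns the coordinate at index `i`, or `None` if out of bounds.
    fn get(&self, i: usize) -> Option<(f64, f64)> {
        Some((*self.xs.get(i)?, *self.ys.get(i)?))
    }
}
```

Comparing `builder.xs.as_ptr()` before `finish()` with `array.xs.as_ptr()` afterwards shows that the heap allocation is reused rather than copied.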
35+
36+
## Converting to and from [`arrow`][arrow_array] Arrays
37+
38+
The `geoarrow` crates depend on, and are designed to be used in combination with, the upstream [Arrow][arrow_array] crates. As such, converting between the representations of each crate is straightforward.
39+
40+
Note that an [`Array`] or [`ArrayRef`] only maintains information about the physical [`DataType`] and will lose any extension type information. Because of this, it's **imperative to store an [`Array`] and [`Field`] together** since the [`Field`] persists the Arrow [extension metadata]. A [`RecordBatch`] holds an [`Array`] and [`Field`] together for each column, so a [`RecordBatch`] will persist extension metadata.
41+
42+
### Converting to GeoArrow Arrays
43+
44+
If you have an [`Array`] and [`Field`] but don't know the geometry type of the array, you can use [`from_arrow_array`][array::from_arrow_array]:
45+
46+
```rust
47+
# use std::sync::Arc;
48+
#
49+
# use arrow_array::Array;
50+
# use arrow_schema::Field;
51+
# use geoarrow_array::array::{from_arrow_array, PointArray};
52+
# use geoarrow_array::cast::AsGeoArrowArray;
53+
# use geoarrow_array::{GeoArrowArray, GeoArrowType};
54+
#
55+
fn use_from_arrow_array(array: &dyn Array, field: &Field) {
56+
let geoarrow_array: Arc<dyn GeoArrowArray> = from_arrow_array(array, field).unwrap();
57+
match geoarrow_array.data_type() {
58+
GeoArrowType::Point(_) => {
59+
let array: &PointArray = geoarrow_array.as_point();
60+
}
61+
_ => todo!("handle other geometry types"),
62+
}
63+
}
64+
```
65+
66+
If you know the geometry type of your array, you can use one of its `TryFrom` implementations to convert directly to that type. This means you don't have to downcast on the GeoArrow side from an `Arc<dyn GeoArrowArray>`.
67+
68+
```rust
69+
# use arrow_array::Array;
70+
# use arrow_schema::Field;
71+
# use geoarrow_array::array::PointArray;
72+
#
73+
fn convert_to_point_array(array: &dyn Array, field: &Field) {
74+
let point_array = PointArray::try_from((array, field)).unwrap();
75+
}
76+
```
77+
78+
### Converting to [arrow][arrow_array] Arrays
79+
80+
You can use the [`to_array_ref`][GeoArrowArray::to_array_ref] or [`into_array_ref`][GeoArrowArray::into_array_ref] methods on [`GeoArrowArray`] to convert to an [`ArrayRef`].
81+
82+
Alternatively, if you have a concrete GeoArrow array type, you can use [`IntoArrow`] to convert to a concrete arrow array type.
83+
84+
The easiest way today to access an arrow [`Field`] is to use [`IntoArrow::ext_type`] and then call `to_field` on the result. We would like to make this process simpler in the future.
85+
86+
## Downcasting a GeoArrow array
87+
88+
Arrays are often passed around as a dynamically typed `&dyn GeoArrowArray` or [`Arc<dyn GeoArrowArray>`][GeoArrowArray].
89+
90+
While these arrays can be passed directly to compute functions, you will often want to work with the concrete array types directly.
91+
92+
This requires downcasting to the concrete type of the array. Use the [`cast::AsGeoArrowArray`] extension trait to do this ergonomically.
93+
94+
```rust
95+
use geoarrow_array::cast::AsGeoArrowArray;
96+
use geoarrow_array::{ArrayAccessor, GeoArrowArray};
97+
98+
fn iter_line_string_array(array: &dyn GeoArrowArray) {
99+
for row in array.as_line_string().iter() {
100+
// do something with each row
101+
}
102+
}
103+
```
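
For readers unfamiliar with the pattern, the following std-only sketch shows one way an `as_line_string`-style helper can be built on `std::any::Any`. All names here (`ErasedArray`, `AsConcrete`, `LineStringsSketch`, `PointsSketch`) are hypothetical; this is an assumption about the general technique, not a description of how `AsGeoArrowArray` is implemented.

```rust
use std::any::Any;

/// Hypothetical type-erased array trait.
trait ErasedArray {
    fn as_any(&self) -> &dyn Any;
    fn len(&self) -> usize;
}

/// Hypothetical concrete arrays.
struct LineStringsSketch {
    rows: Vec<Vec<(f64, f64)>>,
}

struct PointsSketch {
    rows: Vec<(f64, f64)>,
}

impl ErasedArray for LineStringsSketch {
    fn as_any(&self) -> &dyn Any {
        self
    }
    fn len(&self) -> usize {
        self.rows.len()
    }
}

impl ErasedArray for PointsSketch {
    fn as_any(&self) -> &dyn Any {
        self
    }
    fn len(&self) -> usize {
        self.rows.len()
    }
}

/// Hypothetical extension trait providing ergonomic downcasts.
trait AsConcrete {
    /// Returns `None` if the array is not a line string array.
    fn as_line_strings_opt(&self) -> Option<&LineStringsSketch>;

    /// Panics if the array is not a line string array.
    fn as_line_strings(&self) -> &LineStringsSketch {
        self.as_line_strings_opt()
            .expect("not a line string array")
    }
}

impl AsConcrete for dyn ErasedArray + '_ {
    fn as_line_strings_opt(&self) -> Option<&LineStringsSketch> {
        self.as_any().downcast_ref::<LineStringsSketch>()
    }
}

/// Works on the erased type, then downcasts to reach the concrete rows.
fn count_vertices(array: &dyn ErasedArray) -> usize {
    array.as_line_strings().rows.iter().map(|row| row.len()).sum()
}
```

The panicking variant is convenient once the data type has already been checked; the `Option`-returning variant is the safer choice when the type is not known in advance.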
104+
105+
[`Array`]: arrow_array::Array
106+
[`ArrayRef`]: arrow_array::ArrayRef
107+
[`DataType`]: arrow_schema::DataType
108+
[`Field`]: arrow_schema::Field
109+
[`RecordBatch`]: arrow_array::RecordBatch
110+
[extension metadata]: https://arrow.apache.org/docs/format/Columnar.html#format-metadata-extension-types

rust/geoarrow-array/src/array/coord/interleaved.rs

Lines changed: 4 additions & 1 deletion
@@ -10,7 +10,10 @@ use crate::builder::InterleavedCoordBufferBuilder;
1010
use crate::error::{GeoArrowError, Result};
1111
use crate::scalar::InterleavedCoord;
1212

13-
/// A an array of coordinates stored interleaved in a single buffer.
13+
/// An array of coordinates stored interleaved in a single buffer.
14+
///
15+
/// This stores all coordinates in interleaved fashion in a single underlying buffer: e.g. `xyxyxy`
16+
/// for 2D coordinates.
1417
#[derive(Debug, Clone, PartialEq)]
1518
pub struct InterleavedCoordBuffer {
1619
pub(crate) coords: ScalarBuffer<f64>,

rust/geoarrow-array/src/array/coord/separated.rs

Lines changed: 3 additions & 3 deletions
@@ -12,10 +12,10 @@ use crate::error::{GeoArrowError, Result};
1212
use crate::scalar::SeparatedCoord;
1313
use geo_traits::CoordTrait;
1414

15-
/// The GeoArrow equivalent to `Vec<Option<Coord>>`: an immutable collection of coordinates.
15+
/// An array of coordinates stored in separate buffers of the same length.
1616
///
17-
/// This stores all coordinates in separated fashion as multiple underlying buffers: `xxx` and
18-
/// `yyy`.
17+
/// This stores all coordinates in separated fashion as multiple underlying buffers: e.g. `xxx` and
18+
/// `yyy` for 2D coordinates.
1919
#[derive(Debug, Clone, PartialEq)]
2020
pub struct SeparatedCoordBuffer {
2121
/// We always store a buffer for all 4 dimensions. The buffers for dimension 3 and 4 may be
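
The two coordinate layouts described in these docstrings can be sketched with plain `Vec<f64>` buffers. This is a simplified, 2D-only illustration; `Interleaved2D` and `Separated2D` are hypothetical names, not the crate's `InterleavedCoordBuffer` and `SeparatedCoordBuffer`, which are backed by Arrow buffers and support more dimensions.

```rust
/// Hypothetical interleaved layout: `[x0, y0, x1, y1, ...]` in one buffer.
struct Interleaved2D {
    coords: Vec<f64>,
}

/// Hypothetical separated layout: `[x0, x1, ...]` and `[y0, y1, ...]`.
struct Separated2D {
    xs: Vec<f64>,
    ys: Vec<f64>,
}

impl Interleaved2D {
    /// Number of coordinates (not number of floats).
    fn len(&self) -> usize {
        self.coords.len() / 2
    }

    /// Coordinate `i` lives at float offsets `2 * i` and `2 * i + 1`.
    fn get(&self, i: usize) -> (f64, f64) {
        (self.coords[2 * i], self.coords[2 * i + 1])
    }

    /// Splits the single buffer into one buffer per dimension.
    fn to_separated(&self) -> Separated2D {
        let mut xs = Vec::with_capacity(self.len());
        let mut ys = Vec::with_capacity(self.len());
        for pair in self.coords.chunks_exact(2) {
            xs.push(pair[0]);
            ys.push(pair[1]);
        }
        Separated2D { xs, ys }
    }
}

impl Separated2D {
    fn len(&self) -> usize {
        self.xs.len()
    }

    /// Coordinate `i` lives at index `i` of each per-dimension buffer.
    fn get(&self, i: usize) -> (f64, f64) {
        (self.xs[i], self.ys[i])
    }

    /// Zips the per-dimension buffers back into a single buffer.
    fn to_interleaved(&self) -> Interleaved2D {
        let mut coords = Vec::with_capacity(self.len() * 2);
        for (x, y) in self.xs.iter().zip(&self.ys) {
            coords.push(*x);
            coords.push(*y);
        }
        Interleaved2D { coords }
    }
}
```

Both layouts hold the same information; they differ only in how a coordinate index maps to buffer offsets.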

rust/geoarrow-array/src/array/geometry.rs

Lines changed: 2 additions & 0 deletions
@@ -18,6 +18,8 @@ use crate::error::{GeoArrowError, Result};
1818
use crate::scalar::Geometry;
1919
use crate::trait_::{ArrayAccessor, GeoArrowArray, IntoArrow};
2020

21+
/// An immutable array of geometries of unknown geometry type and dimension.
22+
///
2123
/// # Invariants
2224
///
2325
/// - All arrays must have the same dimension

rust/geoarrow-array/src/array/geometrycollection.rs

Lines changed: 1 addition & 1 deletion
@@ -16,7 +16,7 @@ use crate::scalar::GeometryCollection;
1616
use crate::trait_::{ArrayAccessor, GeoArrowArray, IntoArrow};
1717
use crate::util::{offsets_buffer_i64_to_i32, OffsetBufferUtils};
1818

19-
/// An immutable array of GeometryCollection geometries using GeoArrow's in-memory representation.
19+
/// An immutable array of GeometryCollection geometries.
2020
///
2121
/// This is semantically equivalent to `Vec<Option<GeometryCollection>>` due to the internal
2222
/// validity bitmap.

rust/geoarrow-array/src/array/linestring.rs

Lines changed: 1 addition & 1 deletion
@@ -16,7 +16,7 @@ use arrow_buffer::{NullBuffer, OffsetBuffer};
1616
use arrow_schema::{DataType, Field};
1717
use geoarrow_schema::{LineStringType, Metadata};
1818

19-
/// An immutable array of LineString geometries using GeoArrow's in-memory representation.
19+
/// An immutable array of LineString geometries.
2020
///
2121
/// This is semantically equivalent to `Vec<Option<LineString>>` due to the internal validity
2222
/// bitmap.
Lines changed: 32 additions & 53 deletions
@@ -1,57 +1,6 @@
1-
//! Implementations of immutable GeoArrow arrays plus builders to more easily create arrays.
2-
//!
3-
//! There are three primary types of structs in this module: arrays, builders, and capacity
4-
//! counters.
5-
//!
6-
//! ## Arrays
7-
//!
8-
//! Arrays
9-
//!
10-
//! These arrays implement the binary layout defined in the [GeoArrow specification](https://github.com/geoarrow/geoarrow).
11-
//!
12-
//!
13-
//!
14-
//! These include:
15-
//!
16-
//! - [`PointArray`]
17-
//! - [`LineStringArray`]
18-
//! - [`PolygonArray`]
19-
//! - [`MultiPointArray`]
20-
//! - [`MultiLineStringArray`]
21-
//! - [`MultiPolygonArray`]
22-
//! - [`GeometryArray`]
23-
//! - [`GeometryCollectionArray`]
24-
//! - [`RectArray`]
25-
//!
26-
//! ## Builders
27-
//!
28-
//! Builders are designed to make it easier
29-
//!
30-
//! There's a builder for each of the above array types:
31-
//!
32-
//!
33-
//! - [`PointBuilder`]
34-
//! - [`LineStringBuilder`]
35-
//! - [`PolygonBuilder`]
36-
//! - [`MultiPointBuilder`]
37-
//! - [`MultiLineStringBuilder`]
38-
//! - [`MultiPolygonBuilder`]
39-
//! - [`GeometryBuilder`]
40-
//! - [`GeometryCollectionBuilder`]
41-
//! - [`RectBuilder`]
42-
//!
43-
//! Once you've finished adding geometries to a builder, it's `O(1)` to convert a builder to an
44-
//! array, by calling `finish()`.
45-
//!
46-
//! ## Capacity Counters
47-
//!
48-
//! Underlying the builders are growable `Vec`s. E.g. you can think of a `PointBuilder` as a buffer of `x` coordinates and a buffer of `y` coordinates.
49-
//!
50-
//! The fastest and most memory-efficient way to construct an array from a set of known geometries
51-
//! is to make a first pass over these geometries to count exactly how big each part of the Arrow
52-
//! array must be, allocate _once_ for exactly what you need, and then fill those buffers in a
53-
//! second pass.
1+
//! The concrete array definitions.
542
//!
3+
//! All arrays implement the core [GeoArrowArray][crate::GeoArrowArray] trait.
554
565
mod coord;
576
mod geometry;
@@ -80,3 +29,33 @@ pub use polygon::PolygonArray;
8029
pub use rect::RectArray;
8130
pub use wkb::WKBArray;
8231
pub use wkt::WKTArray;
32+
33+
use std::sync::Arc;
34+
35+
use arrow_array::Array;
36+
use arrow_schema::Field;
37+
38+
use crate::error::Result;
39+
use crate::{GeoArrowArray, GeoArrowType};
40+
41+
/// Construct a new [GeoArrowArray] from an Arrow [Array] and [Field].
42+
pub fn from_arrow_array(array: &dyn Array, field: &Field) -> Result<Arc<dyn GeoArrowArray>> {
43+
use GeoArrowType::*;
44+
45+
let result: Arc<dyn GeoArrowArray> = match GeoArrowType::try_from(field)? {
46+
Point(_) => Arc::new(PointArray::try_from((array, field))?),
47+
LineString(_) => Arc::new(LineStringArray::try_from((array, field))?),
48+
Polygon(_) => Arc::new(PolygonArray::try_from((array, field))?),
49+
MultiPoint(_) => Arc::new(MultiPointArray::try_from((array, field))?),
50+
MultiLineString(_) => Arc::new(MultiLineStringArray::try_from((array, field))?),
51+
MultiPolygon(_) => Arc::new(MultiPolygonArray::try_from((array, field))?),
52+
GeometryCollection(_) => Arc::new(GeometryCollectionArray::try_from((array, field))?),
53+
Rect(_) => Arc::new(RectArray::try_from((array, field))?),
54+
Geometry(_) => Arc::new(GeometryArray::try_from((array, field))?),
55+
WKB(_) => Arc::new(WKBArray::<i32>::try_from((array, field))?),
56+
LargeWKB(_) => Arc::new(WKBArray::<i64>::try_from((array, field))?),
57+
WKT(_) => Arc::new(WKTArray::<i32>::try_from((array, field))?),
58+
LargeWKT(_) => Arc::new(WKTArray::<i64>::try_from((array, field))?),
59+
};
60+
Ok(result)
61+
}
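
The dispatch in `from_arrow_array` relies on the field, not the array, to identify the geometry type. The std-only sketch below illustrates that idea under simplifying assumptions: `PhysicalSketch`, `FieldSketch`, `GeoArraySketch`, and `from_physical` are hypothetical stand-ins, and point storage is modelled as a flat `f64` buffer rather than the struct or fixed-size-list layout the specification uses. The metadata key `ARROW:extension:name` and the extension names `geoarrow.point` and `geoarrow.wkb` come from the Arrow and GeoArrow specifications.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Arrow's field metadata key for the extension type name.
const EXTENSION_NAME_KEY: &str = "ARROW:extension:name";

/// Hypothetical physical array: it knows its storage, not its geometry type.
enum PhysicalSketch {
    Float64(Vec<f64>),
    Binary(Vec<Vec<u8>>),
}

/// Hypothetical field carrying key/value metadata.
struct FieldSketch {
    metadata: HashMap<String, String>,
}

/// Hypothetical type-erased geometry array.
trait GeoArraySketch {
    fn type_name(&self) -> &'static str;
    fn len(&self) -> usize;
}

struct PointsSketch {
    coords: Vec<f64>,
}

struct WkbSketch {
    values: Vec<Vec<u8>>,
}

impl GeoArraySketch for PointsSketch {
    fn type_name(&self) -> &'static str {
        "point"
    }
    fn len(&self) -> usize {
        self.coords.len() / 2
    }
}

impl GeoArraySketch for WkbSketch {
    fn type_name(&self) -> &'static str {
        "wkb"
    }
    fn len(&self) -> usize {
        self.values.len()
    }
}

/// Helper for building a field that carries an extension name.
fn field_with_extension(name: &str) -> FieldSketch {
    let mut metadata = HashMap::new();
    metadata.insert(EXTENSION_NAME_KEY.to_string(), name.to_string());
    FieldSketch { metadata }
}

/// Picks the concrete array type from the field's extension name and checks
/// that the physical storage matches it.
fn from_physical(
    array: &PhysicalSketch,
    field: &FieldSketch,
) -> Result<Arc<dyn GeoArraySketch>, String> {
    let name = field
        .metadata
        .get(EXTENSION_NAME_KEY)
        .ok_or_else(|| "field has no extension name".to_string())?;

    let result: Arc<dyn GeoArraySketch> = match (name.as_str(), array) {
        ("geoarrow.point", PhysicalSketch::Float64(coords)) if coords.len() % 2 == 0 => {
            Arc::new(PointsSketch { coords: coords.clone() })
        }
        ("geoarrow.wkb", PhysicalSketch::Binary(values)) => {
            Arc::new(WkbSketch { values: values.clone() })
        }
        (other, _) => {
            return Err(format!("unsupported type or mismatched storage for {other}"))
        }
    };
    Ok(result)
}
```

Without the field, the same `Float64` storage could not be identified as points at all, which is why an array and its field need to travel together.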
