| 1 | 1 | # geoarrow-array |
| 2 | + |
| 3 | +The central type in Apache Arrow is the array: a known-length sequence of values that all have the same type. This crate provides concrete implementations of each array type defined in the [GeoArrow specification], as well as a [GeoArrowArray] trait that can be used for type erasure. |
| 4 | + |
| 5 | +[GeoArrow specification]: https://github.com/geoarrow/geoarrow |
| 6 | + |
| 7 | +To minimize the overhead of dynamic downcasting, the array types in this crate are defined "natively", and converting between a GeoArrow array type and an [`arrow`][arrow_array] array type is an `O(1)` operation. |
| 8 | + |
| 9 | +## Building a GeoArrow Array |
| 10 | + |
| 11 | +Use [builders][builder] to construct GeoArrow arrays. These builders offer a push-based interface to construct arrays from a series of objects that implement [`geo-traits`][geo_traits]. |
| 12 | + |
| 13 | +```rust |
| 14 | +# use geo_traits::{CoordTrait, PointTrait}; |
| 15 | +# use geoarrow_array::array::PointArray; |
| 16 | +# use geoarrow_array::builder::PointBuilder; |
| 17 | +# use geoarrow_array::scalar::Point; |
| 18 | +# use geoarrow_array::ArrayAccessor; |
| 19 | +# use geoarrow_schema::{CoordType, Dimension, PointType}; |
| 20 | +# |
| 21 | +let point_type = PointType::new(CoordType::Separated, Dimension::XY, Default::default()); |
| 22 | +let mut builder = PointBuilder::new(point_type); |
| 23 | + |
| 24 | +builder.push_point(Some(&geo_types::point!(x: 0., y: 1.))); |
| 25 | +builder.push_point(Some(&geo_types::point!(x: 2., y: 3.))); |
| 26 | +builder.push_point(Some(&geo_types::point!(x: 4., y: 5.))); |
| 27 | + |
| 28 | +let array: PointArray = builder.finish(); |
| 29 | + |
| 30 | +let point_0: Point<'_> = array.get(0).unwrap().unwrap(); |
| 31 | +assert_eq!(point_0.coord().unwrap().x_y(), (0., 1.)); |
| 32 | +``` |
| 33 | + |
| 34 | +Converting a builder to an array via `finish()` is always `O(1)`. |
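
To illustrate why finishing a builder can be constant-time, here is a minimal std-only sketch. `ToyPointBuilder` and `ToyPointArray` are hypothetical stand-ins invented for this example; they are not the crate's `PointBuilder` or `PointArray`, and the real implementation may differ. The sketch assumes the builder owns its coordinate buffers and hands them to the array by move rather than by copy.

```rust
/// Hypothetical builder with separated x/y buffers (not the real `PointBuilder`).
#[derive(Default)]
struct ToyPointBuilder {
    xs: Vec<f64>,
    ys: Vec<f64>,
    validity: Vec<bool>,
}

/// Hypothetical finished array that takes ownership of the builder's buffers.
struct ToyPointArray {
    xs: Vec<f64>,
    ys: Vec<f64>,
    validity: Vec<bool>,
}

impl ToyPointBuilder {
    /// Append one point, or a null slot when `point` is `None`.
    fn push_point(&mut self, point: Option<(f64, f64)>) {
        let (x, y) = point.unwrap_or((f64::NAN, f64::NAN));
        self.xs.push(x);
        self.ys.push(y);
        self.validity.push(point.is_some());
    }

    /// Consume the builder and move its buffers into the array.
    /// Moving a `Vec` transfers ownership of its heap allocation without
    /// copying elements, so the cost does not depend on the number of points.
    fn finish(self) -> ToyPointArray {
        ToyPointArray {
            xs: self.xs,
            ys: self.ys,
            validity: self.validity,
        }
    }
}

fn main() {
    let mut builder = ToyPointBuilder::default();
    builder.push_point(Some((0., 1.)));
    builder.push_point(None);
    builder.push_point(Some((4., 5.)));

    // Remember where the x buffer lives before finishing.
    let ptr_before = builder.xs.as_ptr();
    let array = builder.finish();

    // The array reuses the same allocation: nothing was copied.
    assert_eq!(array.xs.as_ptr(), ptr_before);
    assert_eq!(array.validity, vec![true, false, true]);
    assert_eq!(array.ys[2], 5.0);
}
```

The pointer comparison is the point of the sketch: the finished array reuses the allocation the builder filled, so no per-element work happens in `finish`.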
| 35 | + |
| 36 | +## Converting to and from [`arrow`][arrow_array] Arrays |
| 37 | + |
| 38 | +The `geoarrow` crates depend on, and are designed to be used in combination with, the upstream [Arrow][arrow_array] crates. As such, they provide straightforward conversions between the GeoArrow and Arrow representations of an array. |
| 39 | + |
| 40 | +Note that an [`Array`] or [`ArrayRef`] only maintains information about the physical [`DataType`] and will lose any extension type information. Because of this, it's **imperative to store an [`Array`] and [`Field`] together** since the [`Field`] persists the Arrow [extension metadata]. A [`RecordBatch`] holds an [`Array`] and [`Field`] together for each column, so a [`RecordBatch`] will persist extension metadata. |
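
The following std-only sketch models why the array and the field have to travel together. `ToyArray`, `ToyField`, and `extension_name` are hypothetical stand-ins invented for this example and are not types from `arrow` or `geoarrow-array`. The metadata key `ARROW:extension:name` and the value `geoarrow.point` follow the Arrow and GeoArrow conventions for extension types.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for an Arrow array: it knows only its physical layout.
struct ToyArray {
    physical_type: &'static str,
    values: Vec<[f64; 2]>,
}

/// Hypothetical stand-in for an Arrow field: a name plus string metadata.
struct ToyField {
    name: String,
    metadata: HashMap<String, String>,
}

/// Return the extension name recorded on the field, if any.
fn extension_name(field: &ToyField) -> Option<&str> {
    field
        .metadata
        .get("ARROW:extension:name")
        .map(String::as_str)
}

fn main() {
    let array = ToyArray {
        physical_type: "Struct<x: Float64, y: Float64>",
        values: vec![[0., 1.], [2., 3.]],
    };

    let mut metadata = HashMap::new();
    metadata.insert(
        "ARROW:extension:name".to_string(),
        "geoarrow.point".to_string(),
    );
    let field = ToyField {
        name: "geometry".to_string(),
        metadata,
    };

    // The array alone only describes a struct of two floats; nothing in it
    // says whether the values are points or an ordinary struct column.
    assert_eq!(array.physical_type, "Struct<x: Float64, y: Float64>");
    assert_eq!(array.values.len(), 2);

    // The field is what identifies the column as GeoArrow points.
    assert_eq!(extension_name(&field), Some("geoarrow.point"));

    // A field rebuilt without its metadata no longer carries that information.
    let bare = ToyField {
        name: field.name.clone(),
        metadata: HashMap::new(),
    };
    assert_eq!(extension_name(&bare), None);
}
```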
| 41 | + |
| 42 | +### Converting to GeoArrow Arrays |
| 43 | + |
| 44 | +If you have an [`Array`] and [`Field`] but don't know the geometry type of the array, you can use [`from_arrow_array`][array::from_arrow_array]: |
| 45 | + |
| 46 | +```rust |
| 47 | +# use std::sync::Arc; |
| 48 | +# |
| 49 | +# use arrow_array::Array; |
| 50 | +# use arrow_schema::Field; |
| 51 | +# use geoarrow_array::array::{from_arrow_array, PointArray}; |
| 52 | +# use geoarrow_array::cast::AsGeoArrowArray; |
| 53 | +# use geoarrow_array::{GeoArrowArray, GeoArrowType}; |
| 54 | +# |
| 55 | +fn use_from_arrow_array(array: &dyn Array, field: &Field) { |
| 56 | +    let geoarrow_array: Arc<dyn GeoArrowArray> = from_arrow_array(array, field).unwrap(); |
| 57 | +    match geoarrow_array.data_type() { |
| 58 | +        GeoArrowType::Point(_) => { |
| 59 | +            let array: &PointArray = geoarrow_array.as_point(); |
| 60 | +        } |
| 61 | +        _ => todo!("handle other geometry types"), |
| 62 | +    } |
| 63 | +} |
| 64 | +``` |
| 65 | + |
| 66 | +If you know the geometry type of your array, you can use one of its `TryFrom` implementations to convert directly to that type. This means you don't have to downcast on the GeoArrow side from an `Arc<dyn GeoArrowArray>`. |
| 67 | + |
| 68 | +```rust |
| 69 | +# use arrow_array::Array; |
| 70 | +# use arrow_schema::Field; |
| 71 | +# use geoarrow_array::array::PointArray; |
| 72 | +# |
| 73 | +fn convert_to_point_array(array: &dyn Array, field: &Field) { |
| 74 | +    let point_array = PointArray::try_from((array, field)).unwrap(); |
| 75 | +} |
| 76 | +``` |
| 77 | + |
| 78 | +### Converting to [`arrow`][arrow_array] Arrays |
| 79 | + |
| 80 | +You can use the [`to_array_ref`][GeoArrowArray::to_array_ref] or [`into_array_ref`][GeoArrowArray::into_array_ref] methods on [`GeoArrowArray`] to convert to an [`ArrayRef`]. |
| 81 | + |
| 82 | +Alternatively, if you have a concrete GeoArrow array type, you can use [`IntoArray`] to convert to a concrete arrow array type. |
| 83 | + |
| 84 | +The easiest way today to access an arrow [`Field`] is to use [`IntoArray::ext_type`] and then call `to_field` on the result. We would like to make this process simpler in the future. |
| 85 | + |
| 86 | +## Downcasting a GeoArrow Array |
| 87 | + |
| 88 | +Arrays are often passed around as a dynamically typed `&dyn GeoArrowArray` or [`Arc<dyn GeoArrowArray>`][GeoArrowArray]. |
| 89 | + |
| 90 | +While these arrays can be passed directly to compute functions, it is often the case that you wish to interact with the concrete arrays directly. |
| 91 | + |
| 92 | +This requires downcasting to the concrete type of the array. Use the [`cast::AsGeoArrowArray`] extension trait to do this ergonomically. |
| 93 | + |
| 94 | +```rust |
| 95 | +use geoarrow_array::cast::AsGeoArrowArray; |
| 96 | +use geoarrow_array::{ArrayAccessor, GeoArrowArray}; |
| 97 | + |
| 98 | +fn iter_line_string_array(array: &dyn GeoArrowArray) { |
| 99 | +    for row in array.as_line_string().iter() { |
| 100 | +        // do something with each row |
| 101 | +    } |
| 102 | +} |
| 103 | +``` |
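
For readers unfamiliar with the pattern, this is a minimal std-only sketch of how type erasure and downcasting of this kind can work in Rust. The `ToyArray` trait, the `ToyPoints` and `ToyLineStrings` structs, and the `as_toy_points` helper are all hypothetical names invented for illustration; they are not part of `geoarrow-array`, and the crate's actual implementation may differ.

```rust
use std::any::Any;
use std::sync::Arc;

/// Hypothetical type-erased array trait (not the real `GeoArrowArray`).
trait ToyArray {
    /// Expose the value as `Any` so callers can recover the concrete type.
    fn as_any(&self) -> &dyn Any;
    /// Number of rows in the array.
    fn len(&self) -> usize;
}

/// Hypothetical concrete array of (x, y) points.
struct ToyPoints {
    coords: Vec<(f64, f64)>,
}

/// Hypothetical concrete array of line strings.
struct ToyLineStrings {
    lines: Vec<Vec<(f64, f64)>>,
}

impl ToyArray for ToyPoints {
    fn as_any(&self) -> &dyn Any {
        self
    }
    fn len(&self) -> usize {
        self.coords.len()
    }
}

impl ToyArray for ToyLineStrings {
    fn as_any(&self) -> &dyn Any {
        self
    }
    fn len(&self) -> usize {
        self.lines.len()
    }
}

/// Hypothetical downcast helper. It returns `None` when the erased array
/// is not a `ToyPoints`, rather than panicking.
fn as_toy_points(array: &dyn ToyArray) -> Option<&ToyPoints> {
    array.as_any().downcast_ref::<ToyPoints>()
}

fn main() {
    let erased: Arc<dyn ToyArray> = Arc::new(ToyPoints {
        coords: vec![(0., 1.), (2., 3.)],
    });
    assert_eq!(erased.len(), 2);

    // Downcasting succeeds because the erased value really is a `ToyPoints`.
    let points = as_toy_points(&*erased).expect("array holds points");
    assert_eq!(points.coords[1], (2., 3.));

    // Downcasting to the wrong concrete type fails cleanly.
    let lines: Arc<dyn ToyArray> = Arc::new(ToyLineStrings {
        lines: vec![vec![(0., 0.), (1., 1.)]],
    });
    assert_eq!(lines.len(), 1);
    assert!(as_toy_points(&*lines).is_none());
}
```

Returning `Option` is a design choice made for this sketch so that a type mismatch is visible to the caller. Check the documentation of [`cast::AsGeoArrowArray`] for how its accessors behave when the array has a different type.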
| 104 | + |
| 105 | +[`Array`]: arrow_array::Array |
| 106 | +[`ArrayRef`]: arrow_array::ArrayRef |
| 107 | +[`DataType`]: arrow_schema::DataType |
| 108 | +[`Field`]: arrow_schema::Field |
| 109 | +[`RecordBatch`]: arrow_array::RecordBatch |
| 110 | +[extension metadata]: https://arrow.apache.org/docs/format/Columnar.html#format-metadata-extension-types |