WKT/WKB parsing performance optimizations #438

@oleksii-leonov

@BuonOmo

Currently, ActiveRecord::ConnectionAdapters::PostGIS::OID::Spatial builds its internal parser like this:

          def wkt_parser(string)
            if binary_string?(string)
              RGeo::WKRep::WKBParser.new(spatial_factory, support_ewkb: true, default_srid: @factory_attrs[:srid])
            else
              RGeo::WKRep::WKTParser.new(spatial_factory, support_ewkt: true, default_srid: @factory_attrs[:srid])
            end
          end

RGeo::WKRep::WKBParser and RGeo::WKRep::WKTParser are pure Ruby implementations.

Another option is to use factory.parse_wkt() or factory.parse_wkb().

With a GEOS factory, factory.parse_wkt() and factory.parse_wkb() are much faster than the pure Ruby implementations in RGeo::WKRep::WKTParser and RGeo::WKRep::WKBParser.

For example:

require "benchmark"
require "rgeo"

large_circle = RGeo::Geographic.spherical_factory(buffer_resolution: 1000).point(0, 0).buffer(100)
large_circle_wkt = large_circle.as_text
large_circle_wkb = large_circle.as_binary
target_factory = RGeo::Geos.factory(srid: 4326)
n = 1_000

# WKT parsing
Benchmark.bm do |x|
  x.report("RGeo::WKRep::WKTParser") { n.times { RGeo::WKRep::WKTParser.new(target_factory, support_ewkt: true).parse(large_circle_wkt) } }
  x.report("GEOS factory.parse_wkt()") { n.times { target_factory.parse_wkt(large_circle_wkt) } }
end

#                               user     system      total        real
# RGeo::WKRep::WKTParser   20.830059   0.000542  20.830601 ( 20.865695)
# GEOS factory.parse_wkt()  0.842476   0.011024   0.853500 (  0.854785)

# WKB parsing
Benchmark.bm do |x|
  x.report("RGeo::WKRep::WKBParser") { n.times { RGeo::WKRep::WKBParser.new(target_factory, support_ewkb: true).parse(large_circle_wkb) } }
  x.report("GEOS factory.parse_wkb()") { n.times { target_factory.parse_wkb(large_circle_wkb) } }
end

#                               user     system      total        real
# RGeo::WKRep::WKBParser    8.811269   0.012583   8.823852 (  8.842664)
# GEOS factory.parse_wkb()  0.048067   0.016100   0.064167 (  0.064284)

So, in this benchmark, GEOS WKT parsing is ~24x faster, and GEOS WKB parsing is ~140x faster.

When parsing a single geometry, even a fairly large one, the difference is negligible. But when we need to load tens or hundreds of thousands of geometries from the DB, the difference can add up to minutes or even hours.
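To put rough numbers on that, here is a small sketch that extrapolates from the "real" column of the benchmark output above. It assumes parse time scales linearly with the number of geometries and that every geometry is about as large as the benchmark circle, which will not hold exactly in practice. The workload size of 100_000 and the helper names (speedup, projected_seconds) are made up for illustration.

```ruby
# Wall-clock seconds ("real" column) copied from the benchmark output above.
# Each figure is the total for RUNS parses of the same large polygon.
RUNS = 1_000
REAL_SECONDS = {
  wkt: { ruby: 20.865695, geos: 0.854785 },
  wkb: { ruby: 8.842664,  geos: 0.064284 }
}.freeze

# Hypothetical workload size, chosen only for illustration.
GEOMETRIES = 100_000

# How many times faster GEOS was than the pure Ruby parser.
def speedup(timing)
  timing[:ruby] / timing[:geos]
end

# Linear extrapolation: per-parse time multiplied by the number of geometries.
def projected_seconds(total_for_runs, geometries)
  total_for_runs / RUNS * geometries
end

REAL_SECONDS.each do |kind, timing|
  ruby_total = projected_seconds(timing[:ruby], GEOMETRIES)
  geos_total = projected_seconds(timing[:geos], GEOMETRIES)
  puts "#{kind.to_s.upcase}: #{speedup(timing).round}x speedup, " \
       "#{ruby_total.round} s (pure Ruby) vs #{geos_total.round} s (GEOS) " \
       "for #{GEOMETRIES} geometries"
end
```

Under those assumptions, 100_000 geometries of this size would take about 35 minutes to parse from WKT with the pure Ruby parser versus about a minute and a half with GEOS, and about 15 minutes versus a few seconds for WKB.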

In our project (we use GEOS factories), we monkey-patched ActiveRecord::ConnectionAdapters::PostGIS::OID::Spatial:

module ActiveRecord
  module ConnectionAdapters
    module PostGIS
      module OID
        class Spatial
          def parse_wkt(string)
            factory =
              if spatial_factory.is_a?(::RGeo::Feature::Factory::Instance)
                spatial_factory
              else
                spatial_factory.call(srid: factory_attrs.fetch(:srid))
              end

            if binary_string?(string)
              factory.parse_wkb(string)
            else
              factory.parse_wkt(string)
            end
          rescue RGeo::Error::ParseError, RGeo::Error::GeosError
            nil
          end
        end
      end
    end
  end
end

The original idea was taken from tneems@b587269.

I am not sure whether there are nuances with the FFI and pure Ruby factories, since we don't use them. I will try to test those cases and open a PR.
