SEDONA-725 Add pyflink to Sedona. #1875
base: master
Conversation
This is so cool. Do you think we should put pyflink in a new Python module to avoid conflicts with pyspark?
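For readers skimming the thread, a minimal sketch of what such a split could look like. Only the sedona.spark namespace is confirmed by this PR; the sedona.flink names below are illustrative placeholders, not an API proposal.

```python
# Hypothetical layout: each engine lives in its own namespace, so importing the
# Flink bindings never pulls in pyspark (and vice versa).
# Only sedona.spark is confirmed by this PR; sedona.flink is a placeholder name.
from sedona.spark import SedonaContext            # Spark users
# from sedona.flink import SedonaFlinkContext     # Flink users (hypothetical name)
```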
- run: sudo pip3 install -U setuptools
- run: sudo pip3 install -U wheel
- run: sudo pip3 install -U virtualenvwrapper
- run: python3 -m pip install uv
Should we keep the tooling consistent across the project? We are using pipenv in the rest of the project, AFAIK.
I plan to propose uv as the base tool for Python dependencies in Sedona. I think uv is the future: https://github.com/astral-sh/uv
It's far more popular now, and faster than pipenv and poetry.
We were using Ruff (an extremely fast Python linter and code formatter, written in Rust) in our pre-commit framework, but we changed to black. It seems that Astral builds modern, popular, fast tooling.
Force-pushed from 1f82586 to aa12f25.
Overall LGTM. Can you break this PR into two PRs? The first PR moves all Sedona PySpark code to the Sedona Spark folder; the second PR adds PyFlink. Does that make sense?
Pull Request Overview
This PR updates the Sedona documentation and GitHub workflows to reflect the new pyflink and Sedona Spark APIs. It changes the import paths in various tutorial and API documentation files, adds concurrency configurations to multiple GitHub workflow files, and introduces a new pyflink test workflow.
- Updated import statements in tutorials and API docs to use sedona.spark modules (a before/after sketch follows this list).
- Fixed minor documentation typos in comments.
- Added concurrency configurations to workflows and introduced a new pyflink workflow.
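As an illustration of the first bullet, here is a minimal before/after sketch of the import migration. The old sedona.register path and the SedonaContext usage follow the public Sedona docs and are assumptions, not lines taken from this diff.

```python
# Before: Spark helpers were imported from engine-agnostic top-level modules.
# from sedona.register import SedonaRegistrator
# SedonaRegistrator.registerAll(spark)

# After: everything Spark-related comes from the sedona.spark namespace.
from sedona.spark import SedonaContext

config = SedonaContext.builder().getOrCreate()
sedona = SedonaContext.create(config)  # Spark session with Sedona types and functions registered
```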
Reviewed Changes
Copilot reviewed 225 out of 225 changed files in this pull request and generated no comments.
Summary per file:

| File | Description |
| --- | --- |
| docs/tutorial/sql.md | Updated import paths and sample code for Sedona Spark modules; potential naming issue in function call. |
| docs/tutorial/rdd.md | Updated import paths for Spark modules in RDD-based tutorials and fixed spelling errors in comments. |
| docs/tutorial/geopandas-shapely.md | Updated import path for geoarrow to use sedona.spark. |
| docs/tutorial/files/stac-sedona-spark.md | Updated Stac client import to the Sedona Spark variant. |
| docs/tutorial/concepts/clustering-algorithms.md | Updated DBSCAN import to use sedona.spark.stats. |
| docs/setup/install-python.md | Updated Sedona registrator imports to use sedona.spark. |
| docs/api/sql/Visualization_SedonaPyDeck.md | Updated SedonaPyDeck import to use sedona.spark. |
| docs/api/sql/Visualization_SedonaKepler.md | Updated SedonaKepler import to use sedona.spark. |
| docs/api/sql/Raster-visualizer.md | Updated SedonaUtils import to use sedona.spark. |
| GitHub Workflows | Added concurrency configurations and updated Python test flags for skipping flink. |
| .github/workflows/pyflink.yml | Introduced a new workflow for Sedona Pyflink testing. |
Comments suppressed due to low confidence (3)
docs/tutorial/sql.md:744
- The function is imported as 'add_binary_distance_band_column' but called using camelCase. Please update the function call to use snake_case for consistency (a corrected call is sketched after this list).
weighted_df = addBinaryDistanceBandColumn(df, distance_radius)
docs/tutorial/rdd.md:172
- There is a typo in the comment: 'gemeotries' should be 'geometries'.
consider_boundary_intersection = False ## Only return gemeotries fully covered by the window
docs/tutorial/rdd.md:291
- There is a typo in the comment: 'gemeotries' should be 'geometries'.
consider_boundary_intersection = False ## Only return gemeotries fully covered by the window
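For reference, the snake_case call Copilot asks for in docs/tutorial/sql.md would read as below. The commented import path is an assumption based on the sedona.spark.stats move mentioned in the file table above, so double-check it against the tutorial; df and distance_radius come from the surrounding tutorial example.

```python
# Import path assumed from the sedona.spark.stats row above; verify against docs/tutorial/sql.md.
# from sedona.spark.stats.weighting import add_binary_distance_band_column

weighted_df = add_binary_distance_band_column(df, distance_radius)  # snake_case, matching the import
```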
Yeah, sure. I'm sorry for not getting back to you sooner. Can we merge this one first: #1924? It helps avoid spawning a lot of jobs in the CI.
@Imbruced Just merge that PR 😎
Force-pushed from 1719b50 to 7aaf41d.
Force-pushed from 7aaf41d to cda1fbb.
SEDONA-725 rearrange the spark module
Force-pushed from fa8ddba to a2e3420.
Did you read the Contributor Guide?

- Yes, I have read the Contributor Rules and Contributor Development Guide.
- No, I haven't read it.

Is this PR related to a ticket?

- Yes, and the PR name follows the format [SEDONA-XXX] my subject.
- Yes, and the PR name follows the format [GH-XXX] my subject.
- No:
  - [DOCS] my subject
  - [CI] my subject

What changes were proposed in this PR?

How was this patch tested?

Did this PR include necessary documentation updates?

- Yes, using the current version number in the vX.Y.Z format.