SEDONA-725 Add pyflink to Sedona. #1875

Draft
wants to merge 1 commit into base: master

Imbruced
Member

Did you read the Contributor Guide?

Is this PR related to a ticket?

  • Yes, and the PR name follows the format [SEDONA-XXX] my subject.

  • Yes, and the PR name follows the format [GH-XXX] my subject.

  • No:

    • this is a documentation update. The PR name follows the format [DOCS] my subject
    • this is a CI update. The PR name follows the format [CI] my subject

What changes were proposed in this PR?

How was this patch tested?

Did this PR include necessary documentation updates?

  • Yes, I am adding a new API. I am using the current SNAPSHOT version number in vX.Y.Z format.
  • Yes, I have updated the documentation.
  • No, this PR does not affect any public API so no need to change the documentation.

@jiayuasu
Member

This is so cool. Do you think we should put pyflink in a new Python module to avoid conflicts with pyspark?
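One way to picture the separation suggested here is a package layout where each engine binding lives in its own subpackage and is only imported when its engine is installed. The snippet below is a minimal sketch under that assumption: the `sedona.flink` subpackage name and the `available_backends` helper are hypothetical and not taken from this PR. It only checks whether the `pyspark` and `pyflink` top-level modules can be found, without importing them.

```python
import importlib.util

# Hypothetical mapping: subpackage name -> engine module it would depend on.
# "sedona.flink" is an assumed name for illustration only.
ENGINE_MODULES = {
    "sedona.spark": "pyspark",
    "sedona.flink": "pyflink",
}


def available_backends():
    """Report which engine bindings could be imported in this environment.

    find_spec returns None for a top-level module that is not installed,
    so neither engine is actually imported here.
    """
    return {
        subpackage: importlib.util.find_spec(engine) is not None
        for subpackage, engine in ENGINE_MODULES.items()
    }


if __name__ == "__main__":
    for subpackage, present in available_backends().items():
        status = "usable" if present else "engine not installed"
        print(f"{subpackage}: {status}")
```

With this kind of layout, a Flink-only environment would never need to import anything that depends on pyspark, and vice versa.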

- run: sudo pip3 install -U setuptools
- run: sudo pip3 install -U wheel
- run: sudo pip3 install -U virtualenvwrapper
- run: python3 -m pip install uv
Collaborator

Should we keep the tooling consistent across the project? We are using pipenv in the rest of the project, AFAIK.

Member Author

I plan to propose uv as the base tool for Python dependencies in Sedona. I think uv is the future: https://github.com/astral-sh/uv
It's now far more popular and faster than pipenv and poetry.

Member

We were using Ruff (an extremely fast Python linter and code formatter, written in Rust) in our pre-commit framework, but we changed to black.

Astral's tooling seems to be modern, popular, and fast.

https://astral.sh/

https://github.com/astral-sh/ruff

#1368

https://github.com/psf/black

@Imbruced Imbruced force-pushed the SEDONA-725-add-flink-register-functions branch from 1f82586 to aa12f25 Compare April 22, 2025 10:07
@jiayuasu
Member

Overall LGTM. Can you break this PR into 2 PRs? The first PR moves all Sedona PySpark stuff to Sedona Spark folder. The second PR adds the PyFlink. Does it make sense?

@jiayuasu jiayuasu requested a review from Copilot April 26, 2025 05:03
@Copilot Copilot AI left a comment

Pull Request Overview

This PR updates the Sedona documentation and GitHub workflows to reflect the new pyflink and Sedona Spark APIs. It changes the import paths in various tutorial and API documentation files, adds concurrency configurations to multiple GitHub workflow files, and introduces a new pyflink test workflow.

  • Updated import statements in tutorials and API docs to use sedona.spark modules.
  • Fixed minor documentation typos in comments.
  • Added concurrency configurations to workflows and introduced a new pyflink workflow.

Reviewed Changes

Copilot reviewed 225 out of 225 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| docs/tutorial/sql.md | Updated import paths and sample code for Sedona Spark modules; potential naming issue in function call. |
| docs/tutorial/rdd.md | Updated import paths for Spark modules in RDD-based tutorials and fixed spelling errors in comments. |
| docs/tutorial/geopandas-shapely.md | Updated import path for geoarrow to use sedona.spark. |
| docs/tutorial/files/stac-sedona-spark.md | Updated Stac client import to the Sedona Spark variant. |
| docs/tutorial/concepts/clustering-algorithms.md | Updated DBSCAN import to use sedona.spark.stats. |
| docs/setup/install-python.md | Updated Sedona registrator imports to use sedona.spark. |
| docs/api/sql/Visualization_SedonaPyDeck.md | Updated SedonaPyDeck import to use sedona.spark. |
| docs/api/sql/Visualization_SedonaKepler.md | Updated SedonaKepler import to use sedona.spark. |
| docs/api/sql/Raster-visualizer.md | Updated SedonaUtils import to use sedona.spark. |
| GitHub Workflows | Added concurrency configurations and updated Python test flags for skipping flink. |
| .github/workflows/pyflink.yml | Introduced a new workflow for Sedona Pyflink testing. |

Comments suppressed due to low confidence (3)

docs/tutorial/sql.md:744

  • The function is imported as 'add_binary_distance_band_column' but called using camelCase. Please update the function call to use snake_case for consistency.
weighted_df = addBinaryDistanceBandColumn(df, distance_radius)

docs/tutorial/rdd.md:172

  • There is a typo in the comment: 'gemeotries' should be 'geometries'.
consider_boundary_intersection = False  ## Only return gemeotries fully covered by the window

docs/tutorial/rdd.md:291

  • There is a typo in the comment: 'gemeotries' should be 'geometries'.
consider_boundary_intersection = False ## Only return gemeotries fully covered by the window
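To illustrate why the naming mismatch flagged for docs/tutorial/sql.md matters, here is a minimal sketch. The function body is a hypothetical stand-in that works on plain dictionaries, not Sedona's implementation; only the two names come from the review comment. It shows that a name bound in snake_case is not reachable through its camelCase spelling.

```python
def add_binary_distance_band_column(rows, threshold):
    """Hypothetical stand-in (not Sedona's code): flag rows within a distance threshold."""
    return [dict(row, within_band=row["distance"] <= threshold) for row in rows]


rows = [{"distance": 1.0}, {"distance": 5.0}]

# The camelCase spelling was never defined or imported, so Python raises NameError.
try:
    addBinaryDistanceBandColumn(rows, 2.0)
    camel_case_error = None
except NameError as exc:
    camel_case_error = str(exc)

# The snake_case spelling matches the bound name and runs normally.
weighted = add_binary_distance_band_column(rows, 2.0)

print(camel_case_error)
print(weighted)
```

In a tutorial, the same mismatch would make the sample fail at the call site for readers who copy it verbatim, so the call should use whichever name the import statement actually binds.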

@Imbruced
Member Author

> Overall LGTM. Can you break this PR into 2 PRs? The first PR moves all Sedona PySpark stuff to Sedona Spark folder. The second PR adds the PyFlink. Does it make sense?

Yeah, sure. I'm sorry for not getting back to you sooner. Can we merge #1924 first? It helps avoid spawning a lot of jobs in the CI.

@jiayuasu
Member

@Imbruced Just merge that PR 😎

@Imbruced Imbruced force-pushed the SEDONA-725-add-flink-register-functions branch from 1719b50 to 7aaf41d Compare May 3, 2025 20:58
@Imbruced Imbruced force-pushed the SEDONA-725-add-flink-register-functions branch from 7aaf41d to cda1fbb Compare May 3, 2025 21:01
SEDONA-725 rearrange the spark module
@Imbruced Imbruced force-pushed the SEDONA-725-add-flink-register-functions branch from fa8ddba to a2e3420 Compare May 4, 2025 10:57
4 participants