SEDONA-725 Add pyflink to Sedona. #1875
base: master
Conversation
This is so cool. Do you think we should put pyflink in a new Python module to avoid conflicts with pyspark?
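For readers skimming the thread, a minimal sketch of what such a split could look like. Only the sedona.spark namespace is confirmed by this PR; the sedona.flink names below are illustrative placeholders, not an API proposal.

```python
# Hypothetical layout: each engine lives in its own namespace, so importing the
# Flink bindings never pulls in pyspark (and vice versa).
# Only sedona.spark is confirmed by this PR; sedona.flink is a placeholder name.
from sedona.spark import SedonaContext            # Spark users
# from sedona.flink import SedonaFlinkContext     # Flink users (hypothetical name)
```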
- run: sudo pip3 install -U setuptools
- run: sudo pip3 install -U wheel
- run: sudo pip3 install -U virtualenvwrapper
- run: python3 -m pip install uv
Should we keep the tooling consistent across the project? We are using pipenv in the rest of the project, AFAIK.
I plan to propose uv as the base tool for Python dependencies in Sedona. I think uv is the future: https://github.com/astral-sh/uv
It's far more popular now, and faster than pipenv and poetry.
We were using Ruff (an extremely fast Python linter and code formatter, written in Rust) in our pre-commit framework, but we changed to black. It seems that Astral builds modern, popular, fast tooling.
Force-pushed from 1f82586 to aa12f25.
Overall LGTM. Can you break this PR into two PRs? The first PR moves all Sedona PySpark code to the Sedona Spark folder; the second PR adds PyFlink. Does that make sense?
Pull Request Overview
This PR updates the Sedona documentation and GitHub workflows to reflect the new pyflink and Sedona Spark APIs. It changes the import paths in various tutorial and API documentation files, adds concurrency configurations to multiple GitHub workflow files, and introduces a new pyflink test workflow.
- Updated import statements in tutorials and API docs to use sedona.spark modules (a before/after sketch follows this list).
- Fixed minor documentation typos in comments.
- Added concurrency configurations to workflows and introduced a new pyflink workflow.
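As an illustration of the first bullet, here is a minimal before/after sketch of the import migration. The old sedona.register path and the SedonaContext usage follow the public Sedona docs and are assumptions, not lines taken from this diff.

```python
# Before: Spark helpers were imported from engine-agnostic top-level modules.
# from sedona.register import SedonaRegistrator
# SedonaRegistrator.registerAll(spark)

# After: everything Spark-related comes from the sedona.spark namespace.
from sedona.spark import SedonaContext

config = SedonaContext.builder().getOrCreate()
sedona = SedonaContext.create(config)  # Spark session with Sedona types and functions registered
```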
Reviewed Changes
Copilot reviewed 225 out of 225 changed files in this pull request and generated no comments.
Summary per file:

| File | Description |
| --- | --- |
| docs/tutorial/sql.md | Updated import paths and sample code for Sedona Spark modules; potential naming issue in function call. |
| docs/tutorial/rdd.md | Updated import paths for Spark modules in RDD-based tutorials and fixed spelling errors in comments. |
| docs/tutorial/geopandas-shapely.md | Updated import path for geoarrow to use sedona.spark. |
| docs/tutorial/files/stac-sedona-spark.md | Updated Stac client import to the Sedona Spark variant. |
| docs/tutorial/concepts/clustering-algorithms.md | Updated DBSCAN import to use sedona.spark.stats. |
| docs/setup/install-python.md | Updated Sedona registrator imports to use sedona.spark. |
| docs/api/sql/Visualization_SedonaPyDeck.md | Updated SedonaPyDeck import to use sedona.spark. |
| docs/api/sql/Visualization_SedonaKepler.md | Updated SedonaKepler import to use sedona.spark. |
| docs/api/sql/Raster-visualizer.md | Updated SedonaUtils import to use sedona.spark. |
| GitHub Workflows | Added concurrency configurations and updated Python test flags for skipping flink. |
| .github/workflows/pyflink.yml | Introduced a new workflow for Sedona Pyflink testing. |
Comments suppressed due to low confidence (3)
docs/tutorial/sql.md:744
- The function is imported as 'add_binary_distance_band_column' but called using camelCase. Please update the function call to use snake_case for consistency (a corrected call is sketched after this list).
weighted_df = addBinaryDistanceBandColumn(df, distance_radius)
docs/tutorial/rdd.md:172
- There is a typo in the comment: 'gemeotries' should be 'geometries'.
consider_boundary_intersection = False ## Only return gemeotries fully covered by the window
docs/tutorial/rdd.md:291
- There is a typo in the comment: 'gemeotries' should be 'geometries'.
consider_boundary_intersection = False ## Only return gemeotries fully covered by the window
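For reference, the snake_case call Copilot asks for in docs/tutorial/sql.md would read as below. The commented import path is an assumption based on the sedona.spark.stats move mentioned in the file table above, so double-check it against the tutorial; df and distance_radius come from the surrounding tutorial example.

```python
# Import path assumed from the sedona.spark.stats row above; verify against docs/tutorial/sql.md.
# from sedona.spark.stats.weighting import add_binary_distance_band_column

weighted_df = add_binary_distance_band_column(df, distance_radius)  # snake_case, matching the import
```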
Yeah, sure. I'm sorry for not getting back to you sooner. Can we merge this one first: #1924? It helps avoid spawning a lot of jobs in the CI.
@Imbruced Just merge that PR 😎
Force-pushed from 1719b50 to 7aaf41d.
Force-pushed from 7aaf41d to cda1fbb.
SEDONA-725 rearrange the spark module
Force-pushed from fa8ddba to a2e3420.
Did you read the Contributor Guide?

- Yes, I have read the Contributor Rules and Contributor Development Guide.
- No, I haven't read it.

Is this PR related to a ticket?

- Yes, and the PR name follows the format [SEDONA-XXX] my subject.
- Yes, and the PR name follows the format [GH-XXX] my subject.
- No:
  - [DOCS] my subject
  - [CI] my subject

What changes were proposed in this PR?

How was this patch tested?

Did this PR include necessary documentation updates?

- Yes, using the current version number in the vX.Y.Z format.