[SPARK-52505][K8S] Allow to create executor kubernetes service #96
What changes were proposed in this pull request?
This allows an executor to register its block manager with the driver under a Kubernetes service name rather than the pod IP, so that the driver and other executors connect to that executor's block manager through the service.
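As a sketch, the setup described in this PR could be enabled as follows. The property names are the ones this PR introduces or references; the master URL, port value, and application jar path are placeholders:

```shell
spark-submit \
  --master k8s://https://<api-server-host>:6443 \
  --deploy-mode cluster \
  --conf spark.kubernetes.executor.enableService=true \
  --conf spark.blockManager.port=7079 \
  local:///opt/spark/examples/app.jar
```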
Why are the changes needed?
In Kubernetes, connecting to an evicted (decommissioned) executor times out after 2 minutes by default. Executors connect to other executors synchronously (one at a time), so this timeout accumulates per executor peer. An executor that reads from many decommissioned executors therefore blocks for a multiple of the timeout before it fails with a fetch failure.
This can be fixed by binding the block manager to a fixed port, defining a Kubernetes service for that block manager port, and having the executor register that service address with the driver. The driver and other executors then connect to the service name and fail immediately with a connection refused error if the executor has been decommissioned and its service removed.
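A per-executor service of the kind described above would look roughly like the following. This is a minimal sketch, assuming one Service per executor pod; the metadata name, selector label, and port value are illustrative, not the exact objects Spark creates:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: spark-exec-1-svc          # illustrative name for executor 1's service
spec:
  selector:
    spark-exec-id: "1"            # illustrative label selecting a single executor pod
  ports:
    - name: blockmanager
      port: 7079                  # must match the fixed spark.blockManager.port
      targetPort: 7079
```

Because the service is deleted together with the decommissioned executor, a connection attempt to the service name fails immediately instead of waiting out the 2-minute connect timeout against a stale pod IP.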
Setting `spark.kubernetes.executor.enableService=true` and defining `spark.blockManager.port` will perform this setup for each executor.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Unit tests.
Was this patch authored or co-authored using generative AI tooling?
No.