
[SPARK-56561][DOCS] Document order preservation for array_distinct, array_intersect, array_union, array_except #55549

Open

shrirangmhalgi wants to merge 1 commit into apache:master from shrirangmhalgi:SPARK-56561-doc-array-order

Conversation

@shrirangmhalgi

What changes were proposed in this pull request?

This change documents the order preservation behavior of array_distinct, array_intersect, array_union, and array_except in:

  • SQL function descriptions (@ExpressionDescription)
  • Scala API scaladoc (functions.scala)
  • PySpark docstrings (builtin.py)

Also fixes an incorrect statement in array_except's scaladoc, which said "The order of elements in the result is not determined"; the implementation in fact preserves the order of elements from the first array.
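As a plain-Scala sketch of the corrected semantics (illustrative only; this is not Spark's implementation, which uses its own internal hash structures), array_except keeps the surviving elements in their first-occurrence order within the first array:

```scala
// Illustrative sketch of the documented array_except ordering semantics.
object ArrayExceptSketch {
  // Keep elements of `a` that do not appear in `b`, de-duplicated,
  // in their first-occurrence order within `a`.
  def arrayExcept[T](a: Seq[T], b: Seq[T]): Seq[T] =
    a.distinct.filterNot(b.toSet.contains)

  def main(args: Array[String]): Unit = {
    // Mirrors the spark-shell check below: only 1 survives, in a's order.
    println(arrayExcept(Seq(3, 1, 2, 1, 3), Seq(2, 4, 3)))
  }
}
```

`Seq.distinct` already preserves first-occurrence order in the Scala standard library, which is what makes this a faithful model of the documented behavior.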

Why are the changes needed?

With this change, users no longer need to read the implementation to know whether these functions preserve element order. This is useful during code review and helps AI coding agents understand the behavior.

Does this PR introduce any user-facing change?

No. This is a documentation-only change.

How was this patch tested?

  1. Verified unit tests using SBT - CollectionExpressionsSuite and DataFrameFunctionsSuite pass:
  • build/sbt 'catalyst/testOnly *CollectionExpressionsSuite -- -z "Array Distinct" -z "Array Union" -z "Array Except" -z "Array Intersect"'
  • build/sbt 'sql/testOnly *DataFrameFunctionsSuite -- -z "array_distinct" -z "array_intersect" -z "array_union" -z "array_except"'
  2. Runtime verification in spark-shell:
import org.apache.spark.sql.functions._
val df = spark.createDataFrame(Seq((Array(3,1,2,1,3), Array(2,4,3)))).toDF("a","b")

val r1 = df.select(array_distinct(col("a"))).collect()(0).getSeq[Int](0)
println(s"array_distinct([3,1,2,1,3]) = $r1")

Result - array_distinct([3,1,2,1,3]) = ArraySeq(3, 1, 2)

val r2 = df.select(array_union(col("a"), col("b"))).collect()(0).getSeq[Int](0)
println(s"array_union([3,1,2,1,3], [2,4,3]) = $r2")

Result - array_union([3,1,2,1,3], [2,4,3]) = ArraySeq(3, 1, 2, 4)

val r3 = df.select(array_intersect(col("a"), col("b"))).collect()(0).getSeq[Int](0)
println(s"array_intersect([3,1,2,1,3], [2,4,3]) = $r3")

Result - array_intersect([3,1,2,1,3], [2,4,3]) = ArraySeq(3, 2)

val r4 = df.select(array_except(col("a"), col("b"))).collect()(0).getSeq[Int](0)
println(s"array_except([3,1,2,1,3], [2,4,3]) = $r4")

Result - array_except([3,1,2,1,3], [2,4,3]) = ArraySeq(1)
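The four logged results can be cross-checked against a small order-preserving reference model in plain Scala (a sketch of the documented semantics under the assumption of first-occurrence ordering, not Spark's code):

```scala
// Order-preserving reference model for the four array functions
// (illustrative sketch; Spark's actual implementation differs internally).
object ArrayOrderModel {
  // De-duplicate, keeping first-occurrence order.
  def distinctOrd[T](a: Seq[T]): Seq[T] = a.distinct
  // Elements of a, then new elements of b, each in first-occurrence order.
  def unionOrd[T](a: Seq[T], b: Seq[T]): Seq[T] = (a ++ b).distinct
  // Elements of a also present in b, in a's first-occurrence order.
  def intersectOrd[T](a: Seq[T], b: Seq[T]): Seq[T] = a.distinct.filter(b.toSet)
  // Elements of a absent from b, in a's first-occurrence order.
  def exceptOrd[T](a: Seq[T], b: Seq[T]): Seq[T] = a.distinct.filterNot(b.toSet)

  def main(args: Array[String]): Unit = {
    val (a, b) = (Seq(3, 1, 2, 1, 3), Seq(2, 4, 3))
    println(distinctOrd(a))
    println(unionOrd(a, b))
    println(intersectOrd(a, b))
    println(exceptOrd(a, b))
  }
}
```

Running this model on the same inputs reproduces the element orders shown in the spark-shell output above.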


Was this patch authored or co-authored using generative AI tooling?

No.

shrirangmhalgi force-pushed the SPARK-56561-doc-array-order branch from 8e03f20 to 534cf1b on April 25, 2026 06:54
shrirangmhalgi force-pushed the SPARK-56561-doc-array-order branch from 534cf1b to 0b421b0 on April 25, 2026 06:56