Removes duplicate rows. The rows in the resulting DataFrame are in the same order as they were in the original DataFrame.

Related operations:

df.distinct()

If columns are specified, the resulting DataFrame contains only those columns, with one row for each distinct combination of their values.

See column selectors for how to select the columns for this operation.

df.distinct { age and name }
// same as
df.select { age and name }.distinct()
df.distinct("age", "name")
// same as
df.select("age", "name").distinct()
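
The difference between deduplicating whole rows and deduplicating a column subset can be sketched with plain Kotlin collections. This is only an analogy built on the standard library, not the DataFrame API: the `Person` class, the sample data, and the helper names `distinctRows` and `distinctAgeName` are hypothetical.

```kotlin
// Plain-collections model of the behaviour described above.
// Person and the sample data are hypothetical, chosen for illustration.
data class Person(val name: String, val age: Int, val city: String)

val people = listOf(
    Person("Alice", 15, "London"),
    Person("Bob", 45, "Dubai"),
    Person("Alice", 15, "Paris"),
    Person("Bob", 45, "Dubai"), // exact duplicate of the second row
)

// Analogue of df.distinct(): drop fully identical rows,
// keeping the first occurrence and the original order.
fun distinctRows(rows: List<Person>): List<Person> = rows.distinct()

// Analogue of df.distinct { age and name }: project to the chosen
// columns first, then deduplicate the projected rows.
fun distinctAgeName(rows: List<Person>): List<Pair<Int, String>> =
    rows.map { it.age to it.name }.distinct()
```

In this model, `distinctRows(people)` keeps three rows because the two Alice rows differ in `city`, while `distinctAgeName(people)` keeps two because `city` is no longer part of the comparison.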

distinctBy

Keeps only the first row of each group of rows that share the same values in the given columns. Unlike distinct with columns, the remaining rows keep all of their columns.

See column selectors for how to select the columns for this operation.

df.distinctBy { age and name }
// same as
df.groupBy { age and name }.mapToRows { group.first() }
df.distinctBy("age", "name")
// same as
df.groupBy("age", "name").mapToRows { group.first() }
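
The "first row per group" equivalence above can be modelled with the Kotlin standard library. This is a sketch under stated assumptions rather than the library's implementation: `Employee`, the sample data, and both helper functions are hypothetical, and the standard-library `distinctBy` and `groupBy` stand in for the DataFrame operations of the same name.

```kotlin
// Plain-collections model; Employee and the sample data are hypothetical.
data class Employee(val name: String, val age: Int, val city: String)

val employees = listOf(
    Employee("Alice", 15, "London"),
    Employee("Bob", 45, "Dubai"),
    Employee("Alice", 15, "Paris"),
)

// Analogue of df.distinctBy { age and name }: keep the first row seen
// for each (age, name) key, with all of its fields intact.
fun firstPerAgeAndName(rows: List<Employee>): List<Employee> =
    rows.distinctBy { it.age to it.name }

// Analogue of the groupBy form: group by the same key, then take the
// first row of each group. The stdlib groupBy preserves key insertion order.
fun firstPerGroup(rows: List<Employee>): List<Employee> =
    rows.groupBy { it.age to it.name }.map { (_, group) -> group.first() }
```

Both helpers return the London row for Alice and the Dubai row for Bob, so the `city` value of the first matching row survives while the later Paris row is dropped.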