Commit 4d01ce1

docs: Update roadmap in contributor guide (#4144)
1 parent 050e1e2 commit 4d01ce1

1 file changed

Lines changed: 39 additions & 20 deletions

docs/source/contributor-guide/roadmap.md

@@ -22,25 +22,26 @@ under the License.
 Comet is an open-source project and contributors are welcome to work on any issues at any time, but we find it
 helpful to have a roadmap for some of the major items that require coordination between contributors.
 
-## Major Initiatives
+## Window Expressions
 
-### Iceberg Integration
+Native window execution is currently disabled by default due to known correctness issues ([#2721], [#2841]).
+In addition, dedicated window functions such as `rank`, `dense_rank`, `row_number`, `lag`, `lead`, `ntile`,
+`cume_dist`, `percent_rank`, and `nth_value` are not yet implemented and fall back to Spark ([#2705]). The
+goal is to enable windowed aggregates by default ([#4007]) and add the missing dedicated window functions.
 
-Reads of Iceberg tables with Parquet data files are fully native and enabled by default, powered by a scan operator
-backed by Iceberg-rust ([#2528]). We anticipate major improvements in the next few releases, including bringing Iceberg table format V3 features (_e.g._,
-encryption) to the reader.
+[#2705]: https://github.com/apache/datafusion-comet/issues/2705
+[#2721]: https://github.com/apache/datafusion-comet/issues/2721
+[#2841]: https://github.com/apache/datafusion-comet/issues/2841
+[#4007]: https://github.com/apache/datafusion-comet/issues/4007
 
-[#2528]: https://github.com/apache/datafusion-comet/pull/2528
+## Lambda Expressions
 
-### Spark 4.0 Support
+Spark supports higher-order functions on arrays and maps that take a lambda, including `transform`, `exists`,
+`forall`, `aggregate`, `zip_with`, `map_filter`, and `map_zip_with`. Comet currently lacks a general mechanism
+for serializing lambda expressions and evaluating them in DataFusion. Adding this capability will unlock a
+significant family of Spark expressions in one effort.
 
-Comet has experimental support for Spark 4.0, but there is more work to do ([#1637]), such as enabling
-more Spark SQL tests and fully implementing ANSI support ([#313]) for all supported expressions.
-
-[#313]: https://github.com/apache/datafusion-comet/issues/313
-[#1637]: https://github.com/apache/datafusion-comet/issues/1637
-
-### Dynamic Partition Pruning
+## Dynamic Partition Pruning
 
 Both Iceberg table scans and Parquet V1 native scans (`CometNativeScanExec`) support non-AQE Dynamic Partition Pruning
 (DPP) filters generated by Spark's `PlanDynamicPruningFilters` optimizer rule ([#3349], [#3511]). However, Spark's
@@ -51,11 +52,29 @@ requires a redesign of Comet's plan translation. This effort can be tracked at [
 [#3510]: https://github.com/apache/datafusion-comet/issues/3510
 [#3511]: https://github.com/apache/datafusion-comet/pull/3511
 
-## Ongoing Improvements
+## TPC-H and TPC-DS Performance
+
+We regularly publish benchmark results derived from TPC-H and TPC-DS to track performance against Spark. Closing
+the remaining gaps and increasing the speedup on both benchmark suites is an ongoing focus, tracked under [#2004]
+(TPC-H), [#858] (TPC-DS), and [#3799] (improving the awslabs published TPC-DS results).
+
+[#858]: https://github.com/apache/datafusion-comet/issues/858
+[#2004]: https://github.com/apache/datafusion-comet/issues/2004
+[#3799]: https://github.com/apache/datafusion-comet/issues/3799
+
+## Upstream Work in DataFusion
+
+A growing number of Spark-compatible expressions live in the `datafusion-spark` crate in the core DataFusion
+repository. Comet is migrating its expression implementations to that crate so that they can be shared by other
+DataFusion-based projects, tracked in [#2084]. Improvements to core DataFusion operators (joins, aggregates,
+window) made in support of Comet also benefit the wider ecosystem.
+
+[#2084]: https://github.com/apache/datafusion-comet/issues/2084
+
+## Native Parquet Writes
 
-In addition to the major initiatives above, we have the following ongoing areas of work:
+Comet has experimental support for native Parquet writes via `InsertIntoHadoopFsRelationCommand`, currently
+disabled by default. The goal is to reach correctness and performance parity with Spark's writer so it can be
+enabled by default ([#1625]).
 
-- Adding support for more Spark expressions
-- Moving more expressions to the `datafusion-spark` crate in the core DataFusion repository
-- Performance tuning
-- Nested type support improvements
+[#1625]: https://github.com/apache/datafusion-comet/issues/1625
