# Upgrade Guides

## DataFusion `47.0.0`

This section calls out some of the major changes in the `47.0.0` release of DataFusion.

Here are some example upgrade PRs that demonstrate changes required when upgrading from DataFusion 46.0.0:

- [delta-rs Upgrade to `47.0.0`](https://github.com/delta-io/delta-rs/pull/3378)
- [DataFusion Comet Upgrade to `47.0.0`](https://github.com/apache/datafusion-comet/pull/1563)
- [Sail Upgrade to `47.0.0`](https://github.com/lakehq/sail/pull/434)

### Upgrades to `arrow-rs` and `arrow-parquet` 55.0.0 and `object_store` 0.12.0

Several APIs in the underlying arrow and parquet libraries have changed to use
`u64` instead of `usize` to better support WASM (see [#7371] and [#6961]).

Additionally, `ObjectStore::list` and `ObjectStore::list_with_offset` have been changed to return `'static` lifetimes (see [#6619]).

[#6619]: https://github.com/apache/arrow-rs/pull/6619
[#7371]: https://github.com/apache/arrow-rs/pull/7371
[#6961]: https://github.com/apache/arrow-rs/pull/6961

This occasionally requires converting from `usize` to `u64`, as well as changes to `ObjectStore` implementations such as:

```rust
# /* comment to avoid running
// A wrapping store that delegates to an inner ObjectStore (the type name is illustrative)
impl ObjectStore for MyObjectStore {
    ...
    // The range is now a u64 instead of usize
    async fn get_range(&self, location: &Path, range: Range<u64>) -> ObjectStoreResult<Bytes> {
        self.inner.get_range(location, range).await
    }
    ...
    // The lifetime is now 'static instead of '_ (meaning the returned stream can no longer borrow from self)
    // (this also applies to list_with_offset)
    fn list(&self, prefix: Option<&Path>) -> BoxStream<'static, ObjectStoreResult<ObjectMeta>> {
        self.inner.list(prefix)
    }
}
# */
```
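
Call sites that previously passed `usize` offsets or lengths generally only need a widening
conversion. A minimal, self-contained sketch (the variable names are illustrative, not part of any
DataFusion API):

```rust
// Widening usize -> u64 is lossless on 32-bit and 64-bit targets
let buffer = vec![0u8; 1024];
let len_in_bytes: usize = buffer.len();
let byte_range: std::ops::Range<u64> = 0..len_in_bytes as u64;

// Narrowing u64 -> usize can overflow on 32-bit targets, so prefer a checked conversion
let len_back: usize = usize::try_from(byte_range.end).expect("length does not fit in usize");
assert_eq!(len_back, buffer.len());
```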

The `ParquetObjectReader` has been updated to no longer require the object size
(it can be fetched using a single suffix request). See [#7334] for details.

[#7334]: https://github.com/apache/arrow-rs/pull/7334

Pattern in DataFusion `46.0.0`:

```rust
# /* comment to avoid running
let meta: ObjectMeta = ...;
let reader = ParquetObjectReader::new(store, meta);
# */
```

Pattern in DataFusion `47.0.0`:

```rust
# /* comment to avoid running
let meta: ObjectMeta = ...;
let reader = ParquetObjectReader::new(store, meta.location)
    .with_file_size(meta.size);
# */
```

### `DisplayFormatType::TreeRender`

DataFusion now supports [`tree` style explain plans]. Implementations of
`ExecutionPlan` must also provide a description in the
`DisplayFormatType::TreeRender` format. This can be the same as the existing
`DisplayFormatType::Default`.

[`tree` style explain plans]: https://datafusion.apache.org/user-guide/sql/explain.html#tree-format-default
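
For operators that match exhaustively on `DisplayFormatType`, the upgrade amounts to covering the
new variant in `DisplayAs::fmt_as`. A minimal sketch, where `MyExec` and its `limit` field are
hypothetical:

```rust
# /* comment to avoid running
impl DisplayAs for MyExec {
    fn fmt_as(&self, t: DisplayFormatType, f: &mut fmt::Formatter) -> fmt::Result {
        match t {
            DisplayFormatType::Default | DisplayFormatType::Verbose => {
                write!(f, "MyExec: limit={}", self.limit)
            }
            // New in 47.0.0: TreeRender must also be handled; reusing the
            // Default text is a reasonable starting point
            DisplayFormatType::TreeRender => {
                write!(f, "limit={}", self.limit)
            }
        }
    }
}
# */
```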

### Removed Deprecated APIs

Several APIs have been removed in this release. These were either previously
deprecated or hard to use correctly, such as the multiple different
`ScalarUDFImpl::invoke*` APIs. See [#15130], [#15123], and [#15027] for more
details.

[#15130]: https://github.com/apache/datafusion/pull/15130
[#15123]: https://github.com/apache/datafusion/pull/15123
[#15027]: https://github.com/apache/datafusion/pull/15027

### `FileScanConfig` --> `FileScanConfigBuilder`

Previously, `FileScanConfig::build()` directly created `ExecutionPlan`s. In
DataFusion 47.0.0 this has been changed to use `FileScanConfigBuilder`. See
[#15352] for details.

[#15352]: https://github.com/apache/datafusion/pull/15352

Pattern in DataFusion `46.0.0`:

```rust
# /* comment to avoid running
let plan = FileScanConfig::new(url, schema, Arc::new(file_source))
    .with_statistics(stats)
    ...
    .build()
# */
```

Pattern in DataFusion `47.0.0`:

```rust
# /* comment to avoid running
let config = FileScanConfigBuilder::new(url, schema, Arc::new(file_source))
    .with_statistics(stats)
    ...
    .build();
let scan = DataSourceExec::from_data_source(config);
# */
```

## DataFusion `46.0.0`

### Use `invoke_with_args` instead of `invoke()` and `invoke_batch()`

Given existing code like this:

```rust
# /* comment to avoid running
impl ScalarUDFImpl for SparkConcat {
    ...
    fn invoke_batch(&self, args: &[ColumnarValue], number_rows: usize) -> Result<ColumnarValue> {
        // ... function body elided ...
    }
}
# */
```

To

```rust
# /* comment to avoid running
impl ScalarUDFImpl for SparkConcat {
    ...
    fn invoke_with_args(&self, args: ScalarFunctionArgs) -> Result<ColumnarValue> {
        // ... function body elided ...
    }
}
# */
```