Commit efc5ac5
[Datasets] [Out-of-Band Serialization: 1/3] Refactor LazyBlockList.
This PR refactors `LazyBlockList` in service of out-of-band serialization (see [mono-PR](ray-project#22616)). It is a precursor to an execution plan refactor (PR 2/3) and to adding the actual out-of-band serialization APIs (PR 3/3). The refactor includes the following:
1. `ReadTask`s are now a first-class concept, replacing calls;
2. read stage progress tracking is consolidated into `LazyBlockList._get_blocks_with_metadata()`, and more of the read task complexity (e.g. the read remote function) is pushed into `LazyBlockList` to make `ray.data.read_datasource()` simpler;
3. we are a bit smarter about how we progressively launch tasks and how we fetch and cache metadata: `.iter_blocks_with_metadata()` now fetches the metadata produced by the read tasks instead of relying on the pre-read task metadata (which is less accurate). We also fix some small bugs in the lazy ramp-up around progressive metadata fetching.
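Items 2 and 3 can be pictured with a small, Ray-free sketch. Everything here is hypothetical and for illustration only: `BlockMeta`, `SketchReadTask`, and `SketchLazyBlockList` are not Ray classes, the reads run synchronously rather than as remote tasks, and tasks are launched one at a time rather than with the ramp-up the real `LazyBlockList` uses. The point is only that a task is executed when iteration reaches it, and that the cached pre-read estimate is replaced by the exact post-read metadata.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, List, Optional, Tuple


@dataclass
class BlockMeta:
    """Hypothetical stand-in for block metadata."""
    num_rows: Optional[int]  # None means "unknown before the read runs"


@dataclass
class SketchReadTask:
    """Hypothetical read task: a read function plus a pre-read estimate."""
    read_fn: Callable[[], List[int]]
    estimated_meta: BlockMeta


class SketchLazyBlockList:
    """Holds read tasks, runs them lazily, and caches blocks and metadata."""

    def __init__(self, tasks: List[SketchReadTask]):
        self._tasks = tasks
        self._blocks: List[Optional[List[int]]] = [None] * len(tasks)
        # Start from the pre-read estimates; overwrite them after each read.
        self._cached_meta = [t.estimated_meta for t in tasks]
        self.launched = 0  # number of read tasks executed so far

    def _ensure_computed(self, i: int) -> None:
        if self._blocks[i] is None:
            block = self._tasks[i].read_fn()
            self._blocks[i] = block
            # Exact metadata, available only once the read has run.
            self._cached_meta[i] = BlockMeta(num_rows=len(block))
            self.launched += 1

    def iter_blocks_with_metadata(self) -> Iterator[Tuple[List[int], BlockMeta]]:
        for i in range(len(self._tasks)):
            self._ensure_computed(i)
            yield self._blocks[i], self._cached_meta[i]


tasks = [
    SketchReadTask(lambda n=n: list(range(n)), BlockMeta(None)) for n in (2, 3, 4)
]
lazy = SketchLazyBlockList(tasks)
first_block, first_meta = next(lazy.iter_blocks_with_metadata())
print(first_meta.num_rows, lazy.launched)  # prints "2 1"
```

Only the first task has run after pulling one block; the remaining tasks stay unexecuted until iteration reaches them.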
(1) is the most important item for supporting out-of-band serialization and fundamentally changes the `LazyBlockList` data model. This is required since we need to be able to reference the underlying read tasks when rewriting read stages during optimization and when serializing the lineage of the Dataset. See the [mono-PR](ray-project#22616) for more context.
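A minimal sketch of why first-class read tasks matter for rewriting and lineage serialization, under loud assumptions: `FileReadTask`, `push_down_columns`, `dump_lineage`, and `load_lineage` are hypothetical names invented for this example, the file names are made up, and JSON stands in for whatever serialization Ray actually uses. When a read is described as plain data rather than hidden inside an opaque callable, an optimizer can rewrite it and the lineage can be round-tripped.

```python
import json
from dataclasses import asdict, dataclass, replace
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class FileReadTask:
    """Hypothetical read task described as plain, inspectable data."""
    path: str
    columns: Optional[List[str]] = None  # None means "read every column"


def push_down_columns(
    tasks: Sequence[FileReadTask], columns: List[str]
) -> List[FileReadTask]:
    """Rewrite each task so that it reads only the requested columns."""
    return [replace(t, columns=list(columns)) for t in tasks]


def dump_lineage(tasks: Sequence[FileReadTask]) -> str:
    """Serialize the task descriptions (not the data they would read)."""
    return json.dumps([asdict(t) for t in tasks])


def load_lineage(payload: str) -> List[FileReadTask]:
    """Rebuild the task descriptions from their serialized form."""
    return [FileReadTask(**d) for d in json.loads(payload)]


tasks = [FileReadTask("part-0"), FileReadTask("part-1")]
rewritten = push_down_columns(tasks, ["a", "b"])
restored = load_lineage(dump_lineage(rewritten))
```

Neither step would be possible if the block list held only closures: a closure can be called, but it cannot be inspected, rewritten, or reliably reconstructed elsewhere.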
Other changes:
1. Changed the stats actor to a global named actor singleton. This obviates the need to serialize the actor handle with the Dataset stats; without this change, we were encountering serialization failures.
(ray-project#23821)
1 parent d96ac25 commit efc5ac5
File tree
9 files changed, +454 -195 lines changed
- python/ray/data
  - impl
  - tests