+Estuary Flow can auto-discover data resources and schemas, and implements a priority-based system that intelligently selects fields to materialize.
+However, you may wish to override these defaults to customize the final format of your materialized tables.
+For example, columns you require may be missing or may need specific names to work with downstream systems.
+Or you might wish to keep columns with sensitive data from materializing entirely.
 
-When you first materialize a collection to an endpoint like a database or data warehouse,
-the resulting table columns might not be formatted how you want.
-You might notice missing columns, extra columns, or columns with names you don't like.
 This happens when the collection's JSON schema doesn't map to a table schema appropriate for your use case.
 
 You can control the shape and appearance of materialized tables using a two-step process.
@@ -13,11 +15,13 @@ JSON pointers that turn locations in a document's JSON structure into custom nam
 
 Then, you add the `fields` stanza to the materialization specification, telling Flow which fields to materialize.
 
+You can manage both of these options through Estuary's dashboard or modify them directly in the resource specification file.
+
 The following sections break down the process in more detail.
 
 :::info Hint
 If you just need to add a field that isn't included by default and it's already present in the schema
-with a name you like, skip ahead to [include desired fields in your materialization](#include-desired-fields-in-your-materialization).
+with a name you like, skip ahead to [include desired fields in your materialization](#field-selection-for-materializations).
 :::
 
 ## Capture desired fields and generate projections
@@ -37,47 +41,50 @@ If the collection you're using was captured directly, follow these steps.
 1. Go to the [Captures](https://dashboard.estuary.dev/captures) page of the Flow web app
 and locate the capture that produced the collection.
 
-2. Click the **Options** button and choose **Edit Specification**.
+2. Select your capture and click the **Edit** button.
 
-3. Under **Output Collections**, choose the binding that corresponds to the collection.
+3. Under **Target Collections**, choose the binding that corresponds to the collection.
 Then, click the **Collection** tab.
 
 4. In the list of fields, look for the fields you want to materialize.
 If they're present and correctly named, you can skip to
-[including them in the materialization](#include-desired-fields-in-your-materialization).
+[including them in the materialization](#field-selection-for-materializations).
 
-:::info hint:
+:::info hint
 Compare the field name and pointer.
 For nested pointers, you'll probably want to change the field name to omit slashes.
 :::
 
-5. If your desired fields aren't present or need to be re-named, edit the collection schema manually:
+5. If you need to change your fields, you can edit the collection schema.
+
+If your desired fields aren't present and your capture does not automatically keep schemas up to date, you can edit the schema directly:
 
 1. Click **Edit**.
 
 2. Add missing fields to the schema in the correct location based on the source data structure.
 
 3. Click **Close**.
 
-6. Generate projections for new or incorrectly named fields.
+If you simply want to rename existing fields, you can provide alternate names for individual fields:
 
-1. If available, click the **Schema Inference** button. The Schema Inference Window appears. Flow cleans up your schema and adds projections for new fields.
+1. In the Schema table, click the **Rename** button for the field you wish to change.
 
-2. Manually change the names of projected fields. These names will be used by the materialization and shown in the endpoint system as column names or the equivalent.
+2. In the **Alternate Name** modal, provide the field's **New Name**.
 
-3. Click **Next**.
+3. Click **Apply**.
 
-:::info
-Schema Inference isn't available for all capture types.
-You can also add projections manually with `flowctl`.
-Refer to the guide to [editing with flowctl](./flowctl/edit-specification-locally.md) and
-[how to format projections](../concepts/collections.md#projections).
-:::
+6. Repeat steps 3 through 5 with other collections, if necessary.
 
-7. Repeat steps 3 through 6 with other collections, if necessary.
+7. You can [backfill](../reference/backfilling-data.md) affected collections to ensure historical data is populated with your new projections.
 
 8. Click **Save and Publish**.
 
+:::info
+You can also add projections manually with `flowctl`.
+Refer to the guide to [editing with flowctl](./flowctl/edit-specification-locally.md) and
+[how to format projections](../concepts/collections.md#projections).
+:::
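Reviewer note: the steps in this hunk describe projections as custom names for JSON pointer locations. As a rough illustration of that idea only (not Flow's implementation), the Python sketch below resolves RFC 6901 pointers against a document and builds a flat row. The names `resolve_pointer`, `project`, and `PROJECTIONS`, and the sample document, are hypothetical and chosen for this example.

```python
from typing import Any, Dict


def resolve_pointer(doc: Any, pointer: str) -> Any:
    """Resolve an RFC 6901 JSON pointer against a parsed JSON document.

    Returns None when the location is absent (a simplification for this sketch).
    """
    node = doc
    # "/a/b" splits into ["", "a", "b"]; skip the empty leading token.
    for raw in pointer.split("/")[1:]:
        # Unescape per RFC 6901: "~1" means "/", "~0" means "~".
        token = raw.replace("~1", "/").replace("~0", "~")
        if isinstance(node, dict) and token in node:
            node = node[token]
        elif isinstance(node, list) and token.isdigit() and int(token) < len(node):
            node = node[int(token)]
        else:
            return None
    return node


# Hypothetical projections: flat field name -> JSON pointer into the document.
PROJECTIONS: Dict[str, str] = {
    "order_id": "/id",
    "customer_city": "/customer/address/city",
}


def project(doc: Any, projections: Dict[str, str]) -> Dict[str, Any]:
    """Build a flat row whose keys are the projected field names."""
    return {field: resolve_pointer(doc, ptr) for field, ptr in projections.items()}


doc = {"id": 42, "customer": {"address": {"city": "Oslo"}}}
row = project(doc, PROJECTIONS)
print(row)
```

Here the nested location `/customer/address/city` surfaces under the slash-free name `customer_city`, which is the kind of renaming the hint about nested pointers suggests.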
+
 ### Derived collections
 
 If the collection you're using came from a derivation, follow these steps.
 2. Review the collection's schema to see if the fields of interest are included. If they're present, you can skip to
-[including them in the materialization](#include-desired-fields-in-your-materialization).
+[including them in the materialization](#field-selection-for-materializations).
 
 3. If your desired fields aren't present or are incorrectly named, add any missing fields to the schema in the correct location based on the source data structure.
 6. [Re-publish the collection specification](./flowctl/edit-specification-locally.md#edit-source-files-and-re-publish-specifications).
 
-## Include desired fields in your materialization
+## Field selection for materializations
 
 Now that all your fields are present in the collection schema as projections,
 you can choose which ones to include in the materialization.
 
+Estuary automatically detects fields and uses a priority-based selection system to determine the fields to include or exclude in the materialization.
+
+This means that, for each field, a stronger selection reason will override a weaker rejection reason, and vice versa.
+This helps ensure that critical fields get materialized.
+
 Every included field will be mapped to a table column or equivalent in the endpoint system.
 
 1. If you haven't created the materialization, [begin the process](./create-dataflow.md#create-a-materialization). Pause once you've selected the collections to materialize.
 
 If your materialization already exists, navigate to the [edit materialization](./edit-data-flows.md#edit-a-materialization) page.
 
-2. In the Collection Selector, choose the collection whose output fields you want to change. Click its **Collection** tab.
+2. In the Collection Selector, choose the collection whose output fields you want to change.
 
-3. Review the listed fields.
+3. In the **Config** tab, scroll down to the **Field Selection** table.
 
-In most cases, Flow automatically detects all fields to materialize, projected or otherwise. However, a projected field may still be missing, or you may want to exclude other fields.
+4. Review the listed fields in the field selection table.
 
-By default, Estuary's recommended field selection generally includes:
-* **Scalars** (simple data types including strings, numbers, booleans, nulls), and
-* **Natively supported types** for the destination (e.g. arrays in the case of SQL destinations)
+Estuary checks each field against a number of selection and rejection criteria to inform the default materialized fields.
+You can customize this behavior further with **modes** and individual **field overrides**.
 
-When dealing with objects in your data, Estuary:
-* **Flattens objects:** Estuary flattens nested structures and includes the scalar fields within them by default.
-* **Excludes top-level objects:** Top-level objects need to be explicitly selected to be included in the materialization.
+The field selection table will provide an **Outcome** for each field:
 
-Complex data structures like nested objects and maps are excluded by default.
+* **Field included**: The field will be included in the materialization. Symbolized by a filled bookmark.
+* **Field excluded**: The field will not be included in the materialization. Symbolized by an empty bookmark.
+* **Conflict**: The field matches criteria for both selection and rejection.
+Symbolized by a warning sign. The outcome tooltip provides detailed information on the conflict.
 
-4. Choose whether to start with one of Flow's field selection **modes**. You can customize individual fields later. Available modes include:
+5. Choose whether to start with one of Flow's field selection **modes**. You can customize individual fields later. Modes include and exclude fields based on field depth:
 
-* **Select Scalars:** Include all scalar fields using the default setting
-* **Exclude All:** Only required fields
+* **Depth Zero:** Only selects top-level fields
+* **Depth One:** Selects object fields with one degree of nesting
+* **Depth Two:** Selects object fields with two degrees of nesting
+* **Unlimited Depth:** Selects all fields
 
-5. For each individual field, you can choose one of these options:
+Selecting a depth limit can help prevent over-materializing complex document structures.
+If you don't select a mode, Estuary will default to **Depth One**.
 
-* **Select:** The field is included based on the chosen mode; if the field becomes unavailable, it may be dropped silently.
-* **Require:** Ensure the field is materialized; Flow will raise an error if the field cannot be materialized.
-* **Exclude:** Prevent the field from being materialized to the destination.
+6. You can modify individual fields by choosing to **require** or **exclude** them.
 
 
 
-6. Repeat steps 2 through 5 with other collections, if necessary.
+7. Repeat steps 2 through 5 with other collections, if necessary.
 
-7. Click **Save and Publish**.
+8. Click **Save and Publish**.
 
-The named, included fields will be reflected in the endpoint system.
+The named, included fields will be reflected in the endpoint system.
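
Reviewer note: to make the depth modes and per-field overrides in the new text concrete, here is a minimal Python sketch. It is an assumption-laden model, not Estuary's selection algorithm: it treats depth as the number of nesting levels in a field's JSON pointer, lets an explicit exclude or require override the depth mode, and ignores the other selection and rejection criteria (and the **Conflict** outcome) that the docs mention. The names `pointer_depth`, `select_fields`, and the sample fields are hypothetical.

```python
from typing import Dict, FrozenSet, List, Optional


def pointer_depth(pointer: str) -> int:
    """Nesting depth of a JSON pointer: "/id" -> 0, "/customer/name" -> 1."""
    return max(len(pointer.split("/")) - 2, 0)


def select_fields(
    projections: Dict[str, str],
    depth_limit: Optional[int] = 1,  # assumed default, mirroring "Depth One"
    require: FrozenSet[str] = frozenset(),
    exclude: FrozenSet[str] = frozenset(),
) -> List[str]:
    """Pick fields by depth mode, then apply per-field overrides.

    depth_limit=None models "Unlimited Depth". In this sketch an explicit
    exclude wins over everything else; that tie-break is an assumption.
    """
    selected = []
    for field, pointer in projections.items():
        if field in exclude:
            continue  # explicit exclusion overrides the mode
        if field in require:
            selected.append(field)  # explicit requirement overrides the mode
        elif depth_limit is None or pointer_depth(pointer) <= depth_limit:
            selected.append(field)
    return selected


# Hypothetical field name -> JSON pointer pairs for one collection.
fields = {
    "id": "/id",
    "customer_name": "/customer/name",
    "customer_city": "/customer/address/city",
    "ssn": "/ssn",
}

depth_one = select_fields(
    fields,
    depth_limit=1,
    require=frozenset({"customer_city"}),
    exclude=frozenset({"ssn"}),
)
print(depth_one)
```

With these inputs, `customer_city` sits two levels deep and would be skipped by a depth limit of one, but the require override keeps it, while `ssn` is dropped despite being top-level.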