Commit 312dbba

authored
Docs: Update info on field selection (#2355)
* docs: field selection and customization updates
* modify field selection ui instructions
* remove explicit field selection/rejection criteria
1 parent 2ae40ee commit 312dbba

3 files changed

+57
-44
lines changed

Lines changed: 56 additions & 43 deletions
@@ -1,8 +1,10 @@
1-
# Customize materialized fields
1+
# Customize Materialized Fields
2+
3+
Estuary Flow can auto-discover data resources and schemas, and implements a priority-based system that intelligently selects fields to materialize.
4+
However, you may wish to override these defaults to customize the final format of your materialized tables.
5+
For example, columns you require may be missing or may need specific names to work with downstream systems.
6+
Or you might wish to keep columns with sensitive data from materializing entirely.
27

3-
When you first materialize a collection to an endpoint like a database or data warehouse,
4-
the resulting table columns might not be formatted how you want.
5-
You might notice missing columns, extra columns, or columns with names you don't like.
68
This happens when the collection's JSON schema doesn't map to a table schema appropriate for your use case.
79

810
You can control the shape and appearance of materialized tables using a two-step process.
@@ -13,11 +15,13 @@ JSON pointers that turn locations in a document's JSON structure into custom nam
1315

1416
Then, you add the `fields` stanza to the materialization specification, telling Flow which fields to materialize.
1517

18+
You can manage both of these options through Estuary's dashboard or modify them directly in the resource specification file.
19+
1620
The following sections break down the process in more detail.
1721

1822
:::info Hint
1923
If you just need to add a field that isn't included by default and it's already present in the schema
20-
with a name you like, skip ahead to [include desired fields in your materialization](#include-desired-fields-in-your-materialization).
24+
with a name you like, skip ahead to [include desired fields in your materialization](#field-selection-for-materializations).
2125
:::
2226

2327
## Capture desired fields and generate projections
@@ -37,47 +41,50 @@ If the collection you're using was captured directly, follow these steps.
3741
1. Go to the [Captures](https://dashboard.estuary.dev/captures) page of the Flow web app
3842
and locate the capture that produced the collection.
3943

40-
2. Click the **Options** button and choose **Edit Specification**.
44+
2. Select your capture and click the **Edit** button.
4145

42-
3. Under **Output Collections**, choose the binding that corresponds to the collection.
46+
3. Under **Target Collections**, choose the binding that corresponds to the collection.
4347
Then, click the **Collection** tab.
4448

4549
4. In the list of fields, look for the fields you want to materialize.
4650
If they're present and correctly named, you can skip to
47-
[including them in the materialization](#include-desired-fields-in-your-materialization).
51+
[including them in the materialization](#field-selection-for-materializations).
4852

49-
:::info hint:
53+
:::info hint
5054
Compare the field name and pointer.
5155
For nested pointers, you'll probably want to change the field name to omit slashes.
5256
:::
5357

54-
5. If your desired fields aren't present or need to be re-named, edit the collection schema manually:
58+
5. If you need to change your fields, you can edit the collection schema.
59+
60+
If your desired fields aren't present and your capture does not automatically keep schemas up to date, you can edit the schema directly:
5561

5662
1. Click **Edit**.
5763

5864
2. Add missing fields to the schema in the correct location based on the source data structure.
5965

6066
3. Click **Close**.
6167

62-
6. Generate projections for new or incorrectly named fields.
68+
If you simply want to rename existing fields, you can provide alternate names for individual fields:
6369

64-
1. If available, click the **Schema Inference** button. The Schema Inference Window appears. Flow cleans up your schema and adds projections for new fields.
70+
1. In the Schema table, click the **Rename** button for the field you wish to change.
6571

66-
2. Manually change the names of projected fields. These names will be used by the materialization and shown in the endpoint system as column names or the equivalent.
72+
2. In the **Alternate Name** modal, provide the field's **New Name**.
6773

68-
3. Click **Next**.
74+
3. Click **Apply**.
6975

70-
:::info
71-
Schema Inference isn't available for all capture types.
72-
You can also add projections manually with `flowctl`.
73-
Refer to the guide to [editing with flowctl](./flowctl/edit-specification-locally.md) and
74-
[how to format projections](../concepts/collections.md#projections).
75-
:::
76+
6. Repeat steps 3 through 5 with other collections, if necessary.
7677

77-
7. Repeat steps 3 through 6 with other collections, if necessary.
78+
7. You can [backfill](../reference/backfilling-data.md) affected collections to ensure historical data is populated with your new projections.
7879

7980
8. Click **Save and Publish**.
8081

82+
:::info
83+
You can also add projections manually with `flowctl`.
84+
Refer to the guide to [editing with flowctl](./flowctl/edit-specification-locally.md) and
85+
[how to format projections](../concepts/collections.md#projections).
86+
:::
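
The hint above about nested pointers can be illustrated with a small sketch. It is not part of Flow or `flowctl`. The helper names `pointer_to_field_name` and `suggest_projections` are hypothetical, and the underscore separator is only one possible naming convention. The sketch turns JSON pointers into slash-free field names and builds a name-to-pointer mapping of the kind a projection expresses.

```python
def pointer_to_field_name(pointer: str, sep: str = "_") -> str:
    """Suggest a slash-free field name for a JSON pointer (hypothetical helper)."""
    if not pointer.startswith("/"):
        raise ValueError(f"expected a JSON pointer starting with '/': {pointer!r}")
    # Undo JSON Pointer escaping (RFC 6901): "~1" means "/", "~0" means "~".
    tokens = [t.replace("~1", "/").replace("~0", "~") for t in pointer[1:].split("/")]
    # Join the tokens with the separator and drop any slashes left inside a token.
    return sep.join(t.replace("/", sep) for t in tokens)


def suggest_projections(pointers):
    """Map each suggested field name to the pointer it would project."""
    return {pointer_to_field_name(p): p for p in pointers}


if __name__ == "__main__":
    for name, pointer in suggest_projections(["/id", "/user/address/city"]).items():
        print(f"{name}: {pointer}")
```

Names produced this way are only suggestions. The names you actually enter in the **Alternate Name** modal or in a projection are what appear as column names in the endpoint system.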
87+
8188
### Derived collections
8289

8390
If the collection you're using came from a derivation, follow these steps.
@@ -89,7 +96,7 @@ flowctl catalog pull-specs --name <yourOrg/full/collectionName>
8996
```
9097

9198
2. Review the collection's schema to see if the fields of interest are included. If they're present, you can skip to
92-
[including them in the materialization](#include-desired-fields-in-your-materialization).
99+
[including them in the materialization](#field-selection-for-materializations).
93100

94101
3. If your desired fields aren't present or are incorrectly named, add any missing fields to the schema in the correct location based on the source data structure.
95102

@@ -103,48 +110,54 @@ flowctl preview --infer-schema --source <full\path\to\flow.yaml> --collection <y
103110

104111
6. [Re-publish the collection specification](./flowctl/edit-specification-locally.md#edit-source-files-and-re-publish-specifications).
105112

106-
## Include desired fields in your materialization
113+
## Field selection for materializations
107114

108115
Now that all your fields are present in the collection schema as projections,
109116
you can choose which ones to include in the materialization.
110117

118+
Estuary automatically detects fields and uses a priority-based selection system to determine the fields to include or exclude in the materialization.
119+
120+
This means that, for each field, a stronger selection reason will override a weaker rejection reason, and vice versa.
121+
This helps ensure that critical fields get materialized.
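
As a rough mental model of this priority rule, consider the sketch below. It is an illustration, not Estuary's implementation. The `Reason` class, the numeric strengths, and the example labels are all hypothetical. Treating equally strong opposing reasons as a conflict, and a field with no reasons as excluded, are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reason:
    action: str    # "select" or "reject"
    strength: int  # higher wins; the scale is hypothetical
    label: str     # free-text description, for illustration only


def resolve(reasons):
    """Return an outcome for one field from its selection and rejection reasons."""
    if not reasons:
        return "Field excluded"  # assumption: nothing selects the field
    top = max(r.strength for r in reasons)
    # Only the strongest reasons decide the outcome.
    actions = {r.action for r in reasons if r.strength == top}
    if actions == {"select"}:
        return "Field included"
    if actions == {"reject"}:
        return "Field excluded"
    return "Conflict"  # assumption: equally strong select and reject reasons


if __name__ == "__main__":
    reasons = [
        Reason("select", 3, "required by the user"),
        Reason("reject", 1, "nested too deeply"),
    ]
    print(resolve(reasons))
```

In this model a strong selection reason beats a weak rejection reason, and a strong rejection reason beats a weak selection reason, which mirrors the "and vice versa" in the description above.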
122+
111123
Every included field will be mapped to a table column or equivalent in the endpoint system.
112124

113125
1. If you haven't created the materialization, [begin the process](./create-dataflow.md#create-a-materialization). Pause once you've selected the collections to materialize.
114126

115127
If your materialization already exists, navigate to the [edit materialization](./edit-data-flows.md#edit-a-materialization) page.
116128

117-
2. In the Collection Selector, choose the collection whose output fields you want to change. Click its **Collection** tab.
129+
2. In the Collection Selector, choose the collection whose output fields you want to change.
118130

119-
3. Review the listed fields.
131+
3. In the **Config** tab, scroll down to the **Field Selection** table.
120132

121-
In most cases, Flow automatically detects all fields to materialize, projected or otherwise. However, a projected field may still be missing, or you may want to exclude other fields.
133+
4. Review the listed fields in the field selection table.
122134

123-
By default, Estuary's recommended field selection generally includes:
124-
* **Scalars** (simple data types including strings, numbers, booleans, nulls), and
125-
* **Natively supported types** for the destination (e.g. arrays in the case of SQL destinations)
135+
Estuary checks each field against a number of selection and rejection criteria to inform the default materialized fields.
136+
You can customize this behavior further with **modes** and individual **field overrides**.
126137

127-
When dealing with objects in your data, Estuary:
128-
* **Flattens objects:** Estuary flattens nested structures and includes the scalar fields within them by default.
129-
* **Excludes top-level objects:** Top-level objects need to be explicitly selected to be included in the materialization.
138+
The field selection table will provide an **Outcome** for each field:
130139

131-
Complex data structures like nested objects and maps are excluded by default.
140+
* **Field included**: The field will be included in the materialization. Symbolized by a filled bookmark.
141+
* **Field excluded**: The field will not be included in the materialization. Symbolized by an empty bookmark.
142+
* **Conflict**: The field matches criteria for both selection and rejection.
143+
Symbolized by a warning sign. The outcome tooltip provides detailed information on the conflict.
132144

133-
4. Choose whether to start with one of Flow's field selection **modes**. You can customize individual fields later. Available modes include:
145+
5. Choose whether to start with one of Flow's field selection **modes**. You can customize individual fields later. Modes include and exclude fields based on field depth:
134146

135-
* **Select Scalars:** Include all scalar fields using the default setting
136-
* **Exclude All:** Only required fields
147+
* **Depth Zero:** Only selects top-level fields
148+
* **Depth One:** Selects object fields with one degree of nesting
149+
* **Depth Two:** Selects object fields with two degrees of nesting
150+
* **Unlimited Depth:** Selects all fields
137151

138-
5. For each individual field, you can choose one of these options:
152+
Selecting a depth limit can help prevent over-materializing complex document structures.
153+
If you don't select a mode, Estuary will default to **Depth One**.
139154

140-
* **Select:** The field is included based on the chosen mode; if the field becomes unavailable, it may be dropped silently.
141-
* **Require:** Ensure the field is materialized; Flow will raise an error if the field cannot be materialized.
142-
* **Exclude:** Prevent the field from being materialized to the destination.
155+
6. You can modify individual fields by choosing to **require** or **exclude** them.
143156

144157
![Field selection modes and individual options](./guide-images/field-selection.png)
145158

146-
6. Repeat steps 2 through 5 with other collections, if necessary.
159+
7. Repeat steps 2 through 6 with other collections, if necessary.
147160

148-
7. Click **Save and Publish**.
161+
8. Click **Save and Publish**.
149162

150-
The named, included fields will be reflected in the endpoint system.
163+
The named, included fields will be reflected in the endpoint system.
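
The depth-based modes and per-field overrides from the steps above can be sketched as follows. This is a simplified model rather than Flow's actual logic. It assumes that a field's depth is the number of segments in its JSON pointer minus one, that each mode also keeps shallower fields, and that **require** and **exclude** overrides always win over the mode. The names `MODES`, `pointer_depth`, and `select_fields` are hypothetical.

```python
# Depth limit per mode; None means no limit (hypothetical encoding).
MODES = {"Depth Zero": 0, "Depth One": 1, "Depth Two": 2, "Unlimited Depth": None}


def pointer_depth(pointer: str) -> int:
    """Depth of a field: "/id" is 0, "/user/name" is 1 (assumed definition)."""
    if not pointer.startswith("/"):
        raise ValueError(f"expected a JSON pointer starting with '/': {pointer!r}")
    return pointer.count("/") - 1


def select_fields(pointers, mode="Depth One", require=(), exclude=()):
    """Apply a depth mode, then let per-field overrides take precedence."""
    limit = MODES[mode]
    selected = []
    for p in pointers:
        if p in exclude:
            continue  # an excluded field is never materialized
        if p in require or limit is None or pointer_depth(p) <= limit:
            selected.append(p)
    return selected


if __name__ == "__main__":
    fields = ["/id", "/user/name", "/user/address/city"]
    print(select_fields(fields))  # prints ['/id', '/user/name']
    print(select_fields(fields, "Depth Zero", require=["/user/address/city"]))
```

With the default **Depth One** mode, the doubly nested `/user/address/city` is left out unless it is explicitly required, which is the kind of over-materialization the depth limit is meant to prevent.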
-284 KB

site/docs/reference/Connectors/materialization-connectors/amazon-redshift.md

Lines changed: 1 addition & 1 deletion
@@ -142,7 +142,7 @@ The maximum size of a single input document is 4 MB. Attempting to materialize c
142142
documents larger than 4 MB will result in an error. To materialize this data you can use a
143143
[derivation](../../../concepts/derivations.md) to create a derived collection with smaller
144144
documents, or exclude fields containing excessive amounts of data by [customizing the materialized
145-
fields](../../../../guides/customize-materialization-fields/#include-desired-fields-in-your-materialization).
145+
fields](../../../../guides/customize-materialization-fields/#field-selection-for-materializations).
146146

147147
## Delta updates
148148
