Commit 5d76402

source-airtable-native: initial connector
This commit introduces an initial implementation of a native Airtable capture connector. See the README.md for notable API features and connector design decisions.
1 parent 96d2d6d commit 5d76402

File tree

14 files changed

+5914
-0
lines changed

source-airtable-native/README.md

Lines changed: 64 additions & 0 deletions
@@ -0,0 +1,64 @@
1+
# `source-airtable-native`
2+
3+
`source-airtable-native` is a capture connector built with the `estuary-cdk` for capturing data from Airtable bases and tables. This README is intended to document non-standard / non-obvious API behavior and connector design decisions.
4+
5+
## Notable API Features and Behaviors
6+
7+
### Formula Field Errors
8+
9+
Airtable formula fields can produce errors such as circular references, NaN values, or divide-by-zero errors. When this happens, the API returns an object with a `specialValue` property instead of the expected result:
10+
11+
```json
{
  "myFormulaField": {
    "specialValue": "NaN"
  }
}
```
18+
19+
Reference: https://support.airtable.com/docs/common-formula-errors-and-how-to-fix-them
20+
21+
### Pagination Behavior
22+
23+
The Airtable API uses offset-based pagination with a page size of 100 records. Pagination uses snapshot-based ordering: the sort order is frozen at the time of the first request, but field values reflect the most recent modifications, including those made after the first request. A record modified after pagination starts keeps its original position in the result set, but its fields contain the latest values. As a result, records requested in sorted order of a `lastModifiedTime` field may not come back strictly sorted: any record modified mid-pagination has a `lastModifiedTime` value later than the time the first request was submitted.
24+
25+
For example, if the current time is 11:30 and we're slowly paginating records modified between 11:00 and 11:30, a record originally at position 50 that gets modified mid-pagination will still be at position 50, but its `lastModifiedTime` may be 11:31.
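As a rough illustration of the offset-based loop described above, the sketch below walks pages until no `offset` is returned. The `fetch_page` callable and the `iter_records` name are hypothetical stand-ins for whatever performs the HTTP request; the page shape (`records` plus an optional `offset`) is assumed from the list-records response and is not taken from the connector's code.

```python
from typing import Callable, Iterator, Optional


def iter_records(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield every record, following `offset` until the API stops returning one.

    `fetch_page` is a hypothetical callable that takes the previous page's
    offset (None for the first page) and returns the parsed JSON response.
    """
    offset: Optional[str] = None
    while True:
        page = fetch_page(offset)
        yield from page.get("records", [])
        # The final page is assumed to omit "offset".
        offset = page.get("offset")
        if not offset:
            return
```

Because ordering is frozen at the first request, a consumer of this iterator cannot assume cursor values are monotonic across pages.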
26+
27+
---
28+
29+
## Connector Design Decisions
30+
31+
### Full Refresh vs. Incremental Resources
32+
33+
The connector uses the presence of a specific type of field to determine if a table can be captured incrementally.
34+
35+
Valid incremental cursor fields meet the following criteria:
36+
- The field type is `lastModifiedTime`.
37+
- The result type is `dateTime`.
38+
- The referenced field IDs array is empty, which means that all field changes are tracked.
39+
40+
Airtable automatically updates this kind of `lastModifiedTime` field whenever any non-formula field changes. Airtable does not create such a field by default, so users must explicitly add an appropriate `lastModifiedTime` field to a table before the connector can capture that table incrementally.
41+
42+
If no valid incremental cursor field exists, the connector uses full refreshes to re-capture all records in a table.
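The criteria above could be checked against a table's field metadata roughly as follows. This is a sketch, not the connector's implementation: the function names are hypothetical, and the metadata layout (`options.result.type`, `options.referencedFieldIds`) is an assumption about how the schema API describes `lastModifiedTime` fields.

```python
from typing import Optional


def is_valid_cursor_field(field: dict) -> bool:
    """Return True if a field's metadata satisfies all three cursor criteria."""
    options = field.get("options") or {}
    result = options.get("result") or {}
    return (
        field.get("type") == "lastModifiedTime"
        and result.get("type") == "dateTime"
        # An empty list is taken to mean "all field changes are tracked".
        and options.get("referencedFieldIds") == []
    )


def find_cursor_field(table: dict) -> Optional[dict]:
    """Return the first valid cursor field, or None to signal full refresh."""
    for field in table.get("fields", []):
        if is_valid_cursor_field(field):
            return field
    return None
```

A `None` result corresponds to the full-refresh case: the table has no field that tracks every change.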
43+
44+
### Incremental Strategy
45+
46+
For incremental resources, the connector uses the [`filterByFormula` query parameter](https://airtable.com/developers/web/api/list-records#query-filterbyformula) along with the `IS_AFTER` and `IS_BEFORE` functions to query for records updated in a specific time window. Due to the [API's pagination behavior](#pagination-behavior), it's possible for records to be returned that have been modified after the `IS_BEFORE` upper bound. These records are ignored in the current incremental sweep since they'll be captured in a later incremental sweep.
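A minimal sketch of the windowed query and the upper-bound check might look like the following. The helper names, the timestamp literal format, and the record shape (`fields` keyed by field name) are assumptions made for illustration; the connector's actual formula and boundary handling may differ.

```python
from datetime import datetime, timezone


def _fmt(ts: datetime) -> str:
    # Assumed literal format: UTC ISO 8601 with millisecond precision.
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.000Z")


def window_formula(cursor_field: str, start: datetime, end: datetime) -> str:
    """Build a filterByFormula value selecting records modified in (start, end)."""
    ref = "{" + cursor_field + "}"
    return f"AND(IS_AFTER({ref}, '{_fmt(start)}'), IS_BEFORE({ref}, '{_fmt(end)}'))"


def within_window(record: dict, cursor_field: str, end: datetime) -> bool:
    """Return False for records modified mid-pagination, past the upper bound."""
    raw = record["fields"][cursor_field]
    modified = datetime.fromisoformat(raw.replace("Z", "+00:00"))
    return modified < end
```

Records rejected by `within_window` are not lost: their newer cursor value places them in a later window.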
47+
48+
### Formula Fields
49+
50+
#### Scheduled Refreshes
51+
52+
Formula fields present a challenge for incremental replication: when underlying field dependencies change, the formula result updates but the record's `lastModifiedTime` does not change.
53+
54+
The connector supports scheduled formula field refreshes via a configurable cron expression. When the schedule triggers, the binding performs a backfill that only captures the formula fields and cursor field for the table's current contents. `merge` reduction strategies are then used to ensure materializations read complete documents from the captured collections instead of the partial documents from formula field refreshes.
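To see why merging makes partial documents safe, consider this simplified illustration of a recursive merge. It is not the platform's reduction implementation, and the document shape (an `id` plus nested `fields`) is assumed purely for the example.

```python
def merge_documents(existing: dict, update: dict) -> dict:
    """Recursively overlay `update` onto `existing`, keeping untouched keys."""
    merged = dict(existing)
    for key, value in update.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_documents(merged[key], value)
        else:
            merged[key] = value
    return merged


# A full document from an earlier capture, then a formula-only refresh.
full = {"id": "rec1", "fields": {"Name": "Widget", "Total": 10}}
refresh = {"id": "rec1", "fields": {"Total": 12}}
combined = merge_documents(full, refresh)
```

Here `combined` keeps `Name` from the full document and takes the refreshed `Total`, which is the effect the `merge` strategy is meant to achieve downstream.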
55+
56+
#### Omitting Errors
57+
58+
If a formula results in an error, the formula field can contain an object with a `specialValue` property describing the error instead of the usual scalar result. Allowing these `specialValue` errors into captured collections would widen the inferred schema, making formula fields appear as either their usual scalar type or an object. That widening is likely not what users want, since resetting the inferred schema would require a collection reset and discarding all previously captured documents. To prevent it, the connector filters out any field containing an object with a `specialValue` property before emitting documents.
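The filtering step could be sketched as below. The function name is hypothetical and the input is assumed to be a record's field mapping; only the `specialValue` key comes from the behavior described above.

```python
def drop_formula_errors(fields: dict) -> dict:
    """Return a copy of `fields` without values that are specialValue error objects."""
    return {
        name: value
        for name, value in fields.items()
        if not (isinstance(value, dict) and "specialValue" in value)
    }
```

Dropping the field entirely, rather than replacing it with `null`, keeps the error from contributing any type to the inferred schema.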
59+
60+
### Resource Naming
61+
62+
Resources are named using the format: `{base_name}/{table_name}/{table_id}`. The `base_name` and `table_name` are not stable since users can rename either, but including them in the resource name makes it clearer to users which resources are capturing from which bases and tables.
63+
64+
This means that if users rename a base or a table, the connector will detect the renamed bases and tables as brand new resources, stop populating collections using the old base and table names, and start populating collections with the new names.
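The effect of a rename on resource identity can be shown with a small sketch. The `resource_name` helper and the sample base, table, and ID values are hypothetical; only the name format comes from this section.

```python
def resource_name(base_name: str, table_name: str, table_id: str) -> str:
    """Compose a resource name in the {base_name}/{table_name}/{table_id} format."""
    return f"{base_name}/{table_name}/{table_id}"


before = resource_name("Sales", "Leads", "tblExample123")
after = resource_name("Sales", "Prospects", "tblExample123")
```

Although the table ID is unchanged, `before` and `after` differ, so the renamed table is treated as a new resource.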

source-airtable-native/VERSION

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
1+
v1
