Commit f6aca16

committed
docs: source-s3 iam
Documentation updates for estuary/connectors#3645
1 parent c5713b4 commit f6aca16

1 file changed

+85
-59
lines changed
  • site/docs/reference/Connectors/capture-connectors

site/docs/reference/Connectors/capture-connectors/amazon-s3.md

Lines changed: 85 additions & 59 deletions
@@ -16,7 +16,10 @@ This bucket or prefix must be either be:
1616

1717
* Publicly accessible and allowing anonymous reads.
1818

19-
* Accessible via a root or [IAM user](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_users.html).
19+
* Accessible via a root user, an [IAM user][], or an [IAM role][].
20+
21+
[IAM user]: https://docs.aws.amazon.com/IAM/latest/UserGuide/id_users.html
22+
[IAM role]: https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles.html
2023

2124
In either case, you'll need an [access policy](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_controlling.html).
2225
Policies in AWS are JSON objects that define permissions. You attach them to _resources_, which include both IAM users and S3 buckets.
@@ -48,17 +51,17 @@ For a public bucket, the bucket access policy must allow anonymous reads on the
4851

4952
3. Confirm that the **Block public access** setting on the bucket is [disabled](https://docs.aws.amazon.com/AmazonS3/latest/userguide/WebsiteAccessPermissionsReqd.html).
5053

51-
### Setup: Accessing with a user account
54+
### Setup: Accessing with a user or role
5255

53-
For buckets accessed by a user account, you'll need the AWS **access key** and **secret access key** for the user.
54-
You'll also need to apply an access policy to the user to grant access to the specific bucket or prefix.
56+
For buckets accessed by a user account, you'll need the AWS **access key** and **secret access key** for the user. For buckets accessed using an IAM role, you'll need the **role ARN**.
57+
You'll also need to attach an access policy to the user or role to grant access to the specific bucket or prefix.
5558

56-
1. [Create an IAM user](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_users_create.html) if you don't yet have one to use with Flow.
59+
1. [Create an IAM user][IAM create-user] or follow the [AWS IAM Guide](/guides/iam-auth/aws.md) to set up an IAM role and Identity Provider. If you already have a user or role for use with Estuary, you can reuse it.
5760

5861
2. Note the user's access key and secret access key.
5962
See the [AWS blog](https://aws.amazon.com/blogs/security/wheres-my-secret-access-key/) for help finding these credentials.
6063

61-
3. Create an IAM policy using the templates below.
64+
3. [Create an IAM policy][IAM create-policy] using the templates below.
6265

6366
<Tabs>
6467
<TabItem value="IAM user access policy - Full bucket" default>
@@ -75,9 +78,10 @@ See the [AWS blog](https://aws.amazon.com/blogs/security/wheres-my-secret-access
7578
</TabItem>
7679
</Tabs>
7780

78-
4. [Add the policy](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_create-console.html#access_policies_create-json-editor) to AWS.
81+
4. [Attach the policy to the IAM user or role](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_manage-attach-detach.html#add-policies-console).
7982

80-
5. [Attach the policy to the IAM user](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_manage-attach-detach.html#add-policies-console).
83+
[IAM create-user]: https://docs.aws.amazon.com/IAM/latest/UserGuide/id_users_create.html
84+
[IAM create-policy]: https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_create-console.html#access_policies_create-json-editor
8185

8286
## Configuration
8387

@@ -88,62 +92,46 @@ See [connectors](../../../concepts/connectors.md#using-connectors) to learn more
8892

8993
#### Endpoint
9094

91-
| Property | Title | Description | Type | Required/Default |
92-
|---|---|---|---|---|
93-
| `/advanced` | | Options for advanced users. You should not typically need to modify these. | object | |
94-
| `/advanced/ascendingKeys` | Ascending Keys | Improve sync speeds by listing files from the end of the last sync, rather than listing the entire bucket prefix. This requires that you write objects in ascending lexicographic order, such as an RFC-3339 timestamp, so that key ordering matches modification time ordering. If data is not ordered correctly, using ascending keys could cause errors.| boolean | `false` |
95-
| `/advanced/endpoint` | AWS Endpoint | The AWS endpoint URI to connect to. Use if you&#x27;re capturing from a S3-compatible API that isn&#x27;t provided by AWS | string | |
96-
| `/awsAccessKeyId` | AWS Access Key ID | Part of the AWS credentials that will be used to connect to S3. Required unless the bucket is public and allows anonymous listings and reads. | string | |
97-
| `/awsSecretAccessKey` | AWS Secret Access Key | Part of the AWS credentials that will be used to connect to S3. Required unless the bucket is public and allows anonymous listings and reads. | string | |
98-
| **`/bucket`** | Bucket | Name of the S3 bucket | string | Required |
99-
| `/matchKeys` | Match Keys | Filter applied to all object keys under the prefix. If provided, only objects whose absolute path matches this regex will be read. For example, you can use &quot;.&#x2A;&#x5C;.json&quot; to only capture json files. | string | |
100-
| `/parser` | Parser Configuration | Configures how files are parsed (optional, see below) | object | |
101-
| `/parser/compression` | Compression | Determines how to decompress the contents. The default, &#x27;Auto&#x27;, will try to determine the compression automatically. | null, string | `null` |
102-
| `/parser/format` | Format | Determines how to parse the contents. The default, &#x27;Auto&#x27;, will try to determine the format automatically based on the file extension or MIME type, if available. | object | `{"type":"auto"}` |
103-
| `/parser/format/type` | Type | | string | |
104-
| `/prefix` | Prefix | Prefix within the bucket to capture from. Use this to limit the data in your capture. | string | |
105-
| **`/region`** | AWS Region | The name of the AWS region where the S3 bucket is located. &quot;us-east-1&quot; is a popular default you can try, if you&#x27;re unsure what to put here. | string | Required, `"us-east-1"` |
106-
107-
#### Bindings
108-
109-
| Property | Title| Description | Type | Required/Default |
110-
|---|---|---|---|---|
111-
| **`/stream`** | Prefix | Path to dataset in the bucket, formatted as `bucket-name/prefix-name`. | string | Required |
112-
113-
### Sample
114-
115-
```yaml
116-
captures:
117-
${PREFIX}/${CAPTURE_NAME}:
118-
endpoint:
119-
connector:
120-
image: ghcr.io/estuary/source-s3:dev
121-
config:
122-
bucket: "my-bucket"
123-
parser:
124-
compression: zip
125-
format:
126-
type: csv
127-
config:
128-
delimiter: ","
129-
encoding: UTF-8
130-
errorThreshold: 5
131-
headers: [ID, username, first_name, last_name]
132-
lineEnding: "\\r"
133-
quote: "\""
134-
region: "us-east-1"
135-
bindings:
136-
- resource:
137-
stream: my-bucket/${PREFIX}
138-
target: ${PREFIX}/${COLLECTION_NAME}
139-
140-
```
95+
| Property | Title | Description | Type | Required/Default |
96+
| ------------------------- | ----------------------- | ------------------------------------------------------------------------------------------------------------- | ------- | ---------------- |
97+
| **`/region`** | AWS Region | The name of the AWS region where the S3 bucket is located. `us-east-1` is a popular default you can try, if you're unsure what to put here. | string | Required, `"us-east-1"` |
98+
| **`/bucket`** | Bucket | Name of the S3 bucket | string | Required |
99+
| `/prefix` | Prefix | Prefix within the bucket to capture from. Use this to limit the data in your capture. | string | |
100+
| `/matchKeys` | Match Keys | Filter applied to all object keys under the prefix. If provided, only objects whose absolute path matches this regex will be read. For example, you can use `.*\.json` to only capture JSON files. | string | |
101+
| **`/credentials`** | Credentials | Credentials for authentication. | [Credentials](#credentials) | Required |
102+
| `/parser` | Parser Configuration | Configures how files are parsed (optional, see below) | [Parser](#parser) | |
103+
| `/parser/compression` | Compression | Determines how to decompress the contents. The default, 'Auto', will try to determine the compression automatically. | null, string | `null` |
104+
| `/parser/format` | Format | Determines how to parse the contents. The default, 'Auto', will try to determine the format automatically based on the file extension or MIME type, if available. | object | `{"type":"auto"}` |
105+
| `/parser/format/type` | Type | | string | |
106+
| `/advanced` | | Options for advanced users. You should not typically need to modify these. | object | |
107+
| `/advanced/ascendingKeys` | Ascending Keys | Improve sync speeds by listing files from the end of the last sync, rather than listing the entire bucket prefix. This requires that you write objects in ascending lexicographic order, such as an RFC-3339 timestamp, so that key ordering matches modification time ordering. If data is not ordered correctly, using ascending keys could cause errors.| boolean | `false` |
108+
| `/advanced/endpoint` | AWS Endpoint | The AWS endpoint URI to connect to. Use if you're capturing from an S3-compatible API that isn't provided by AWS. | string | |
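
As a rough illustration of how `/prefix`, `/matchKeys`, and `/advanced/ascendingKeys` combine, the fragment below sketches the `config` portion of an endpoint. The bucket name, prefix, and pattern are hypothetical placeholders, and the required `credentials` block is left out for brevity.

```yaml
# Fragment of the connector `config`; all values are illustrative placeholders.
bucket: "my-bucket"
region: "us-east-1"
# Limit the capture to objects under this (hypothetical) prefix.
prefix: "exports/2024/"
# Single quotes keep the backslash literal, so the regex is .*\.json
matchKeys: '.*\.json'
advanced:
  # Only enable this if object keys are written in ascending lexicographic order.
  ascendingKeys: true
```

Single quotes are used for `matchKeys` because a double-quoted YAML string would treat `\.` as an escape sequence.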
109+
110+
#### Credentials
111+
112+
Credentials for authenticating. Use one of the following sets of options:
113+
114+
| Property | Title | Description | Type | Required/Default |
115+
| ---------------------------------------- | ----------------------- | -------------------------------------------------------------- | ------- | ------------------------ |
116+
| **`/credentials/auth_type`** | Auth Type | Use `AWSAccessKey` to authenticate with a user account. | string | Required: `AWSAccessKey` |
117+
| **`/credentials/aws_access_key_id`** | AWS Access Key ID | AWS Access Key ID. | string | Required |
118+
| **`/credentials/aws_secret_access_key`** | AWS Secret Access Key | AWS Secret Access Key. | string | Required |
119+
120+
| Property | Title | Description | Type | Required/Default |
121+
| ---------------------------------------- | ----------------------- | -------------------------------------------------------------- | ------- | ------------------------ |
122+
| **`/credentials/auth_type`** | Auth Type | Use `AWSIAM` to authenticate as an IAM role. | string | Required: `AWSIAM` |
123+
| **`/credentials/aws_role_arn`** | AWS Role ARN | IAM Role to assume. | string | Required |
124+
| **`/credentials/aws_region`** | AWS Region | AWS Region to authenticate in. | string | Required |
125+
126+
| Property | Title | Description | Type | Required/Default |
127+
| ---------------------------------------- | ----------------------- | -------------------------------------------------------------- | ------- | ------------------------ |
128+
| **`/credentials/auth_type`** | Auth Type | Use `AWSAnonymous` for anonymous authentication. | string | Required: `AWSAnonymous` |
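
To make the IAM role option concrete, here is a minimal sketch of a `credentials` block built from the property names in the tables above. The role ARN is a made-up placeholder (the account ID and role name are assumptions), so substitute the ARN of the role you created.

```yaml
# Fragment of the connector `config`; the role ARN below is a hypothetical placeholder.
credentials:
  auth_type: "AWSIAM"
  aws_role_arn: "arn:aws:iam::123456789012:role/example-estuary-capture-role"
  aws_region: "us-east-1"
```

For a public bucket, the tables above suggest the block reduces to the single line `auth_type: "AWSAnonymous"`.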
141129

142130
Your capture definition may be more complex, with additional bindings for different S3 prefixes within the same bucket.
143131

144132
[Learn more about capture definitions.](../../../concepts/captures.md)
145133

146-
### Advanced: Parsing cloud storage data
134+
#### Parser
147135

148136
Cloud storage platforms like S3 can support a wider variety of file types
149137
than other data source systems. For each of these file types, Flow must parse
@@ -230,3 +218,41 @@ but you may need to specify for unusual datasets. These properties are:
230218
* Auto
231219

232220
The sample specification [below](#sample) includes these fields.
221+
222+
#### Bindings
223+
224+
| Property | Title| Description | Type | Required/Default |
225+
|---|---|---|---|---|
226+
| **`/stream`** | Prefix | Path to dataset in the bucket, formatted as `bucket-name/prefix-name`. | string | Required |
227+
228+
### Sample
229+
230+
```yaml
231+
captures:
232+
  ${PREFIX}/${CAPTURE_NAME}:
233+
    endpoint:
234+
      connector:
235+
        image: ghcr.io/estuary/source-s3:dev
236+
        config:
237+
          bucket: "my-bucket"
238+
          region: "us-east-1"
239+
          credentials:
240+
            auth_type: "AWSAccessKey"
241+
            aws_access_key_id: "example-aws-access-key-id"
242+
            aws_secret_access_key: "example-aws-secret-access-key"
243+
          parser:
244+
            compression: zip
245+
            format:
246+
              type: csv
247+
              config:
248+
                delimiter: ","
249+
                encoding: UTF-8
250+
                errorThreshold: 5
251+
                headers: [ID, username, first_name, last_name]
252+
                lineEnding: "\\r"
253+
                quote: "\""
254+
    bindings:
255+
      - resource:
256+
          stream: my-bucket/${PREFIX}
257+
        target: ${PREFIX}/${COLLECTION_NAME}
258+
```
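
The document notes that a capture definition may carry additional bindings for different prefixes within the same bucket. A sketch of that shape follows, assuming two hypothetical prefixes (`orders` and `customers`) in a public bucket. The prefix and collection names are illustrative only and do not come from the connector itself.

```yaml
captures:
  ${PREFIX}/${CAPTURE_NAME}:
    endpoint:
      connector:
        image: ghcr.io/estuary/source-s3:dev
        config:
          bucket: "my-bucket"
          region: "us-east-1"
          credentials:
            auth_type: "AWSAnonymous"
    bindings:
      # One binding per prefix; each stream uses the bucket-name/prefix-name form.
      - resource:
          stream: my-bucket/orders
        target: ${PREFIX}/orders
      - resource:
          stream: my-bucket/customers
        target: ${PREFIX}/customers
```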
