docs/source/guide/security.md
+8-75
@@ -120,92 +120,25 @@ Once Label Studio tasks are created, users can view and edit tasks in their brow
120
120
121
121
#### Source storage behind your VPC
122
122
123
-
!!! warning Google Cloud Storage
124
-
Google Cloud Storage does **not** support IP or VPN restrictions for pre-signed URLs, making this approach infeasible for GCS. As an alternative security measure for GCS, you can use **signed URLs with short lifetimes**.
123
+
To ensure maximum security and isolation of your data behind a VPC, only allow access to the Label Studio backend and users within your internal network. To do this, you can use the following technique — especially effective with Label Studio SaaS (Cloud, `app.humansignal.com`):
125
124
126
-
To ensure maximum security and isolation of your data behind a VPC, only allow access to users within your VPC. To do this, you can use the following technique, which is especially effective with Label Studio SaaS (Cloud, `app.humansignal.com`) and AWS S3:
125
+
1. Set **IP restrictions** for your storage to **allow Label Studio to perform task synchronization and generate pre-signed URLs** for media file serving. IP restrictions enhance security by ensuring that only trusted networks can access your storage. GET (`s3:GetObject` for S3) and LIST (`s3:ListBucket` for S3) permissions are required. <span class="enterprise-only">The IP ranges for `app.humansignal.com` can be found in the documentation [here](saas#IP-range).</span>
127
126
128
-
1. Set **IP restrictions** for your S3 storage to allow Label Studio to perform task synchronization and generate pre-signed URLs for media file serving. IP restrictions enhance security by ensuring that only trusted networks can access your storage. GET (`s3:GetObject`) and LIST (`s3:ListBucket`) permissions are required. <span class="enterprise-only">The IP ranges for `app.humansignal.com` can be found in the documentation [here](saas#IP-range).</span>
127
+
2. **Establish a secure connection** between storage and users' browsers:
128
+
- Configure a VPC private endpoint and route VPN traffic to it so that users' browsers can securely access the S3 bucket using only your Virtual Private Network (VPN).
129
+
- Or limit your storage access to certain IPs or VPCs.
129
130
130
-
2. **Establish your VPC Connection** between S3 Storage and Users' Browsers:
131
-
132
-
Configure your network so that users' browsers can access the S3 bucket securely within your Virtual Private Cloud (VPC). This ensures that data transmission occurs over a private network, enhancing security by preventing exposure to the public internet. Administrators can set up this connection using AWS VPC endpoints or other networking configurations within their infrastructure.
133
-
134
-
**Helpful Resources**:
135
-
- [AWS Documentation: VPC Endpoints for Amazon S3](https://docs.aws.amazon.com/AmazonS3/latest/userguide/privatelink-interface-endpoints.html)
136
-
- [AWS Documentation: How to Configure VPC Endpoints](https://docs.aws.amazon.com/vpc/latest/privatelink/endpoint-services-overview.html)
137
-
138
-
<details>
139
-
<summary>Bucket Policy Example for S3 storage</summary>
140
-
141
-
!!! warning
142
-
These example bucket policies explicitly deny access to any requests outside the allowed IP addresses. Even the user that entered the bucket policy can be denied access to the bucket if the user doesn't meet the conditions. Therefore, make sure to review the bucket policy carefully before saving it. If you get accidentally locked out, see [How to regain access to an Amazon S3 bucket](https://repost.aws/knowledge-center/s3-accidentally-denied-access).
143
-
144
-
Go to your S3 bucket and then **Permissions > Bucket Policy** in the AWS management console. Add the following policy:
//// IP ranges for app.humansignal.com from the documentation
168
-
"x.x.x.x/32",
169
-
"x.x.x.x/32",
170
-
"x.x.x.x/32"
171
-
]
172
-
}
173
-
}
174
-
},
175
-
//// Optional
176
-
{
177
-
"Sid": "DenyAccessUnlessFromVPNForGetObject",
178
-
"Effect": "Deny",
179
-
"Principal": "*",
180
-
"Action": "s3:GetObject",
181
-
"Resource": "arn:aws:s3:::YOUR_BUCKET_NAME/*",
182
-
"Condition": {
183
-
"NotIpAddress": {
184
-
"aws:SourceIp": "YOUR_VPN_SUBNET/32"
185
-
}
186
-
}
187
-
}
188
-
]
189
-
}
190
-
```
191
-
</details>
131
+
**Configuration examples:**
132
+
- [AWS S3 Storage: IP Filtering and VPN for Enhanced Security](storage#IP-Filtering-and-VPN-for-Enhanced-Security-for-S3-storage).
133
+
- [Google Cloud Storage: IP Filtering for Enhanced Security](storage#IP-Filtering-for-Enhanced-Security-for-GCS-storage).
192
134
193
135
<i>This image shows how you can securely configure source cloud storages with Label Studio using your VPC and IP restrictions</i>
194
136
195
137
<img width="49%" style="display: inline-block; margin-right: 5px;" src="/images/storages/cloud-storage-ip-restriction.jpg" alt="Label Studio + Cloud Storage IP Restriction" class="make-intense-zoom" />
196
138
197
139
<img width="49%" style="display: inline-block;" src="/images/storages/cloud-storage-vpn.jpg" alt="Label Studio + Cloud Storage VPC" class="make-intense-zoom" />
198
140
199
-
#### Additional Notes
200
-
201
-
**Google ADC**: If you use Label Studio on-premises with Google Cloud Storage, you can set up [Application Default Credentials](https://cloud.google.com/docs/authentication/provide-credentials-adc) to provide cloud storage authentication globally for all projects, so users do not need to configure credentials manually.
202
-
203
-
**AWS S3 IAM**: In Label Studio Enterprise, you can use an IAM role configured with an external ID to access S3 bucket contents securely. An 'external ID' is a unique identifier that enhances security by ensuring that only trusted entities can assume the role, reducing the risk of unauthorized access. <span class="enterprise-only">See [Set up an S3 connection with IAM role access](storage#Set-up-an-S3-connection-with-IAM-role-access)</span>
204
-
205
-
**Storage Regions**: To minimize latency and improve efficiency, store data in cloud storage buckets that are geographically closer to your team rather than near the Label Studio server.
206
141
207
-
!!! note More details on Cloud Storages
208
-
See more details on [Source storage Sync and URI resolving](storage#Source-storage-Sync-and-URI-resolving).
docs/source/guide/storage.md
+88-2
@@ -27,6 +27,7 @@ When working with an external cloud storage connection, keep the following in mi
27
27
* Label Studio doesn't import the data stored in the bucket, but instead creates *references* to the objects. Therefore, you must have full access control on the data to be synced and shown on the labeling screen.
28
28
* Sync operations with external buckets only go one way. A sync either creates tasks from objects in the bucket (Source storage) or pushes annotations to the output bucket (Target storage). Changing something on the bucket side doesn't guarantee consistency in results.
29
29
* We recommend using a separate bucket folder for each Label Studio project.
30
+
* Storage Regions: To minimize latency and improve efficiency, store data in cloud storage buckets that are geographically closer to your team rather than near the Label Studio server.
30
31
31
32
<div class="opensource-only">
32
33
@@ -282,6 +283,14 @@ After you [configure access to your S3 bucket](#Configure-access-to-your-S3-buck
282
283
283
284
After adding the storage, click **Sync** to collect tasks from the bucket, or make an API call to [sync export storage](https://api.labelstud.io/api-reference/api-reference/export-storage/s-3/sync)
284
285
286
+
<div class="opensource-only">
287
+
288
+
### S3 connection with IAM role access
289
+
290
+
In Label Studio Enterprise, you can use an IAM role configured with an external ID to access S3 bucket contents securely. An external ID is a unique identifier that enhances security by ensuring that only trusted entities can assume the role, reducing the risk of unauthorized access. See how to [Set up an S3 connection with IAM role access](https://docs.humansignal.com/guide/storage#Set-up-an-S3-connection-with-IAM-role-access) in the Enterprise documentation.
291
+
292
+
</div>
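
As a rough illustration of the external ID mechanism described above, the following sketch builds an IAM trust policy that permits `sts:AssumeRole` only when the caller supplies the expected external ID. The helper name `build_trust_policy`, the principal ARN, and the external ID value are hypothetical placeholders, not values from Label Studio. Use the Enterprise documentation for the actual principal and ID.

```python
import json


def build_trust_policy(trusted_principal_arn: str, external_id: str) -> dict:
    """Build a trust policy that requires a matching external ID.

    Both arguments are placeholders in this sketch; the real values
    come from your Label Studio Enterprise setup.
    """
    if not external_id:
        raise ValueError("external_id must not be empty")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": trusted_principal_arn},
                "Action": "sts:AssumeRole",
                # The role can be assumed only if the caller passes this ID.
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical account ID and external ID, for illustration only.
    policy = build_trust_policy(
        "arn:aws:iam::111122223333:root", "EXAMPLE_EXTERNAL_ID"
    )
    print(json.dumps(policy, indent=2))
```

The printed JSON could be pasted into the role's trust relationship in the AWS console after replacing the placeholders.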
293
+
285
294
<div class="enterprise-only">
286
295
287
296
### Set up an S3 connection with IAM role access
@@ -416,6 +425,72 @@ You can also create a storage connection using the Label Studio API.
416
425
- See [Create new import storage](/api#operation/api_storages_s3_create) then [sync the import storage](/api#operation/api_storages_s3_sync_create).
417
426
- See [Create export storage](/api#operation/api_storages_export_s3_create) and after annotating, [sync the export storage](/api#operation/api_storages_export_s3_sync_create).
418
427
428
+
### IP Filtering and VPN for Enhanced Security for S3 storage
429
+
430
+
To maximize security and data isolation behind a VPC, restrict storage access to the Label Studio backend and to users on your internal network. Set IP restrictions on the storage so that only trusted networks can perform task synchronization and generate pre-signed URLs. Additionally, establish a secure connection between storage and users' browsers by configuring a VPC private endpoint or by limiting storage access to specific IPs or VPCs.
431
+
432
+
Read more about [Source storage behind your VPC](security.html#Source-storage-behind-your-VPC).
433
+
434
+
<details>
435
+
<summary>Bucket Policy Example for S3 storage</summary>
436
+
<br>
437
+
438
+
!!! warning
439
+
These example bucket policies explicitly deny access to any requests outside the allowed IP addresses. Even the user that entered the bucket policy can be denied access to the bucket if the user doesn't meet the conditions. Therefore, make sure to review the bucket policy carefully before saving it. If you get accidentally locked out, see [How to regain access to an Amazon S3 bucket](https://repost.aws/knowledge-center/s3-accidentally-denied-access).
440
+
441
+
**Helpful Resources**:
442
+
- [AWS Documentation: VPC Endpoints for Amazon S3](https://docs.aws.amazon.com/AmazonS3/latest/userguide/privatelink-interface-endpoints.html)
443
+
- [AWS Documentation: How to Configure VPC Endpoints](https://docs.aws.amazon.com/vpc/latest/privatelink/endpoint-services-overview.html)
444
+
445
+
Go to your S3 bucket and then **Permissions > Bucket Policy** in the AWS management console. Add the following policy:
//// IP ranges for app.humansignal.com from the documentation
469
+
"x.x.x.x/32",
470
+
"x.x.x.x/32",
471
+
"x.x.x.x/32"
472
+
]
473
+
}
474
+
}
475
+
},
476
+
//// Optional
477
+
{
478
+
"Sid": "DenyAccessUnlessFromVPNForGetObject",
479
+
"Effect": "Deny",
480
+
"Principal": "*",
481
+
"Action": "s3:GetObject",
482
+
"Resource": "arn:aws:s3:::YOUR_BUCKET_NAME/*",
483
+
"Condition": {
484
+
"NotIpAddress": {
485
+
"aws:SourceIp": "YOUR_VPN_SUBNET/32"
486
+
}
487
+
}
488
+
}
489
+
]
490
+
}
491
+
```
492
+
</details>
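
If you prefer to generate a statement like the optional `DenyAccessUnlessFromVPNForGetObject` one programmatically, a minimal sketch could look like the following. It only mirrors the shape of that visible statement. The helper name `deny_unless_from` and the CIDR range `203.0.113.0/24` (a documentation-only range) are assumptions for illustration. Each CIDR is validated with the standard `ipaddress` module so that a typo fails before the policy reaches AWS.

```python
import ipaddress
import json


def deny_unless_from(sid, actions, resources, allowed_cidrs):
    """Build a Deny statement that applies to every source IP
    outside allowed_cidrs (hypothetical helper for this sketch)."""
    if not allowed_cidrs:
        raise ValueError("at least one CIDR range is required")
    # ip_network raises ValueError for malformed input or set host bits.
    cidrs = [str(ipaddress.ip_network(c)) for c in allowed_cidrs]
    return {
        "Sid": sid,
        "Effect": "Deny",
        "Principal": "*",
        "Action": actions,
        "Resource": resources,
        "Condition": {"NotIpAddress": {"aws:SourceIp": cidrs}},
    }


if __name__ == "__main__":
    statement = deny_unless_from(
        "DenyAccessUnlessFromVPNForGetObject",
        "s3:GetObject",
        "arn:aws:s3:::YOUR_BUCKET_NAME/*",
        ["203.0.113.0/24"],  # placeholder for your VPN egress range
    )
    print(json.dumps({"Version": "2012-10-17", "Statement": [statement]}, indent=2))
```

As with the hand-written policy, review the generated JSON carefully before saving it, because a Deny statement with the wrong ranges can lock you out of the bucket.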
493
+
419
494
## Google Cloud Storage
420
495
421
496
Dynamically import tasks and export annotations to Google Cloud Storage (GCS) buckets in Label Studio. For details about how Label Studio secures access to cloud storage, see [Secure access to cloud storage](security.html#Secure-access-to-cloud-storage).
@@ -472,17 +547,21 @@ You can also create a storage connection using the Label Studio API.
472
547
- See [Create export storage](/api#operation/api_storages_export_gcs_create) and after annotating, [sync the export storage](/api#operation/api_storages_export_gcs_sync_create).
473
548
474
549
475
-
### IP Filtering for Enhanced Security
550
+
### IP Filtering for Enhanced Security for GCS storage
476
551
477
552
Google Cloud Storage offers [bucket IP filtering](https://cloud.google.com/storage/docs/ip-filtering-overview) as a powerful security mechanism to restrict access to your data based on source IP addresses. This feature helps prevent unauthorized access and provides fine-grained control over who can interact with your storage buckets.
478
553
554
+
Read more about [Source storage behind your VPC](security.html#Source-storage-behind-your-VPC).
555
+
479
556
**Common Use Cases:**
480
557
- Restrict bucket access to only your organization's IP ranges
481
558
- Allow access only from specific VPC networks in your infrastructure
482
559
- Secure sensitive data by limiting access to known IP addresses
483
560
- Control access for third-party integrations by whitelisting their IPs
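
Before applying a filter, it can help to check locally which client addresses a set of allowed ranges would admit. The sketch below is a conceptual illustration only, not how GCS evaluates its rules. The function name `is_allowed` and the sample ranges (documentation-only addresses) are assumptions.

```python
import ipaddress


def is_allowed(client_ip, allowed_cidrs):
    """Return True if client_ip falls inside any of the allowed CIDR ranges."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in allowed_cidrs)


if __name__ == "__main__":
    # Hypothetical organization ranges, for illustration only.
    org_ranges = ["203.0.113.0/24", "198.51.100.16/28"]
    for ip in ("203.0.113.7", "198.51.100.40"):
        print(ip, "allowed" if is_allowed(ip, org_ranges) else "blocked")
```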
484
561
485
-
**How to Set Up IP Filtering:**
562
+
<details>
563
+
<summary>How to Set Up IP Filtering</summary>
564
+
<br>
486
565
487
566
1. First, create your GCS bucket through the console or CLI
488
567
2. Create a JSON configuration file to define IP filtering rules. You have two options:
[Read more about GCS IP filtering](https://cloud.google.com/storage/docs/ip-filtering-overview)
545
624
625
+
</details>
626
+
627
+
#### Application Default Credentials as Advanced Security Approach
628
+
629
+
**Google ADC**: If you use Label Studio on-premises with Google Cloud Storage, you can set up [Application Default Credentials](https://cloud.google.com/docs/authentication/provide-credentials-adc) to provide cloud storage authentication globally for all projects, so users do not need to configure credentials manually.
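
To see which credential source ADC is likely to pick up on a given host, a small check along these lines may help. It is a sketch under stated assumptions: it inspects only the `GOOGLE_APPLICATION_CREDENTIALS` environment variable and the gcloud user credentials file at its Linux/macOS location, and it does not query the metadata server used for attached service accounts. The function name `find_adc_source` is hypothetical.

```python
import os
from pathlib import Path


def find_adc_source(env=None, home=None):
    """Return ("env", path) or ("gcloud", path) for the first local
    ADC source found, or None if neither exists."""
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)

    # 1. Explicit credentials file named by the environment variable.
    explicit = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if explicit and Path(explicit).is_file():
        return ("env", explicit)

    # 2. File written by `gcloud auth application-default login`
    #    (Linux/macOS location; Windows uses a different directory).
    well_known = home / ".config" / "gcloud" / "application_default_credentials.json"
    if well_known.is_file():
        return ("gcloud", str(well_known))

    # 3. An attached service account would be tried next; not checked here.
    return None


if __name__ == "__main__":
    print(find_adc_source() or "No local ADC file found")
```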
630
+
631
+
546
632
## Microsoft Azure Blob storage
547
633
548
634
Connect your [Microsoft Azure Blob storage](https://docs.microsoft.com/en-us/azure/storage/blobs/storage-blobs-introduction) container with Label Studio. For details about how Label Studio secures access to cloud storage, see [Secure access to cloud storage](security.html#Secure-access-to-cloud-storage).