docs/source/guide/security.md
+112 -9
@@ -86,23 +86,126 @@ All specific object properties that are exposed with a REST API are added to an
86
86
87
87
The PostgreSQL database has SSL mode enabled and requires valid certificates.
88
88
89
-
### Secure access to cloud storage
89
+
### Secure access to cloud storages
90
90
91
-
When using Label Studio, users don't have direct access to cloud storage. Objects are retrieved from and stored in cloud storage buckets according to the [cloud storage settings](/guide/storage.html) for each project.
91
+
Each project in Label Studio can be linked to various cloud storage options such as AWS S3, Google Cloud Storage, and others. Users primarily access files from cloud storage through pre-signed URLs generated by Label Studio. You can configure multiple cloud storage connections per project with different credentials to manage data access. Learn how to set up [cloud storage settings](storage).
92
92
93
-
Label Studio accesses the data stored in remote cloud storage using URLs, so place the data in cloud storage buckets near where your team works, rather than near where you host Label Studio.
93
+
Combine workspaces, projects, users, and roles to configure and secure access to cloud storage.
94
94
95
-
Use workspaces, projects, and roles to further secure access to cloud storage and data accessed using URLs by setting up cloud storage credentials. You can provide cloud storage authentication credentials globally for all projects in Label Studio, or use different credentials for access to different buckets on a per-project basis. Label Studio allows you to configure different cloud storage buckets for different projects, making it easier to manage access to the data. See [Sync data from external storage](/guide/storage.html).
95
+
#### Source storage logic and security
96
96
97
-
<div class="enterprise-only">
97
+
Label Studio's cloud storage integration performs two key operations:
98
+
* **Task sync and import**
99
+
* **Media file serving**
98
100
99
-
In Label Studio Enterprise, if you're using Amazon S3, Label Studio can use an IAM role configured with an external ID to access S3 bucket contents securely. See [Set up an S3 connection with IAM role access](/guide/storage.html#Set-up-an-S3-connection-with-IAM-role-access)
101
+
Below, both are explained from a security perspective.
100
102
101
-
</div>
103
+
##### Task synchronization and import
104
+
105
+
After connecting a storage to a project, you have several options to load tasks into the project. Depending on the option, you need to provide specific permissions:
106
+
107
+
* **Sync media files** (**LIST** permission required): Storage Sync automatically creates Label Studio tasks based on the file list in your storage when **Treat every bucket object as a source file** is enabled. Label Studio does not read the file content; it simply references the files (e.g., `{"image": "s3://bucket/1.jpg"}`).
108
+
109
+
* **Sync JSON task files** (**LIST** and **GET** permissions required): Storage Sync reads Label Studio tasks from JSON files in your bucket and loads the entire JSON content into the Label Studio database when **Treat every bucket object as a source file** is disabled.
110
+
111
+
* **No sync** (no permissions required): You can manually import JSON files containing Label Studio tasks and reference storage URIs (e.g., `{"image": "s3://bucket/1.jpg"}`) inside tasks.
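
As a rough illustration of what a storage URI inside a task refers to, the sketch below splits a URI such as `s3://bucket/1.jpg` into its scheme, bucket, and object key using only the Python standard library. The helper name `split_storage_uri` is hypothetical and is not part of Label Studio.

```python
import json
from urllib.parse import urlparse


def split_storage_uri(uri: str) -> tuple[str, str, str]:
    """Return (scheme, bucket, key) for a URI such as s3://bucket/1.jpg.

    Hypothetical helper for illustration; not a Label Studio function.
    """
    parsed = urlparse(uri)
    if not parsed.scheme or not parsed.netloc:
        raise ValueError(f"not a storage URI: {uri!r}")
    # The bucket is the URI authority; the key is the path without its leading slash.
    return parsed.scheme, parsed.netloc, parsed.path.lstrip("/")


# A task that only references a file in storage, as in the examples above.
task = json.loads('{"image": "s3://bucket/1.jpg"}')
scheme, bucket, key = split_storage_uri(task["image"])
print(scheme, bucket, key)  # prints: s3 bucket 1.jpg
```

The task itself carries only the reference, which is why manually imported tasks need no storage permissions at import time.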
112
+
113
+
##### Media file serving
114
+
115
+
Once Label Studio tasks are created, users can view and edit tasks in their browsers. To access media stored in your bucket, the following steps occur:
116
+
117
+
1. **Pre-signed URL Generation**: Label Studio Backend generates pre-signed URLs for files in the storage bucket. This step requires **GET** permission for pre-signed URL generation, but Label Studio does not download your data.
118
+
119
+
2. **User Browser Downloads**: The user's browser downloads and displays the media when viewing or labeling tasks. This requires the user's browser to access the pre-signed URLs directly.
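
The first step above can be sketched for AWS S3 with the `generate_presigned_url` method of a boto3 S3 client. This is a minimal sketch of the general technique, not Label Studio's actual implementation; the function name `presign_get_url` and the default lifetime are assumptions.

```python
def presign_get_url(s3_client, bucket: str, key: str, expires_in: int = 3600) -> str:
    """Return a time-limited GET URL for one object.

    `s3_client` is expected to be a boto3 S3 client. Signing happens locally
    with the client's credentials; the object itself is not downloaded.
    """
    return s3_client.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires_in,  # URL lifetime in seconds
    )


# Usage (requires boto3 and configured AWS credentials):
#   import boto3
#   url = presign_get_url(boto3.client("s3"), "bucket", "1.jpg", expires_in=600)
```

The browser then fetches the returned URL directly from the bucket, so any network restrictions on the bucket apply to the user's browser as well as to the backend.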
120
+
121
+
#### Source storage behind your VPC
122
+
123
+
!!! warning Google Cloud Storage
124
+
Google Cloud Storage does **not** support IP or VPN restrictions for pre-signed URLs, making this approach infeasible for GCS. As an alternative security measure for GCS, you can use **signed URLs with short lifetimes**.
125
+
126
+
To ensure maximum security and isolation of your data behind a VPC, only allow access to users within your VPC. To do this, you can use the following technique — especially effective with Label Studio SaaS (Cloud, `app.humansignal.com`) and AWS S3:
127
+
128
+
1. Set **IP restrictions** for your S3 storage to allow Label Studio to perform task synchronization and generate pre-signed URLs for media file serving. IP restrictions enhance security by ensuring that only trusted networks can access your storage. GET (`s3:GetObject`) and LIST (`s3:ListBucket`) permissions are required. <span class="enterprise-only">The IP ranges for `app.humansignal.com` can be found in the documentation [here](saas#IP-range).</span>
129
+
130
+
2. **Establish your VPC Connection** between S3 Storage and Users' Browsers:
131
+
132
+
Configure your network so that users' browsers can access the S3 bucket securely within your Virtual Private Cloud (VPC). This ensures that data transmission occurs over a private network, enhancing security by preventing exposure to the public internet. Administrators can set up this connection using AWS VPC endpoints or other networking configurations within their infrastructure.
133
+
134
+
**Helpful Resources**:
135
+
- [AWS Documentation: VPC Endpoints for Amazon S3](https://docs.aws.amazon.com/AmazonS3/latest/userguide/privatelink-interface-endpoints.html)
136
+
- [AWS Documentation: How to Configure VPC Endpoints](https://docs.aws.amazon.com/vpc/latest/privatelink/endpoint-services-overview.html)
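
To reason about which clients an IP restriction would block, the sketch below mimics a `Deny` statement with a `NotIpAddress` condition on `aws:SourceIp`: a request is denied unless its source address falls inside one of the allowed ranges. The function name `is_denied` is hypothetical, and the CIDR values are reserved documentation ranges used as placeholders, not real Label Studio or VPN addresses. Real policy evaluation is performed by AWS and involves more than this check.

```python
from ipaddress import ip_address, ip_network


def is_denied(source_ip: str, allowed_cidrs: list[str]) -> bool:
    """Return True if source_ip is outside every allowed CIDR range.

    Simplified model of Deny + NotIpAddress; hypothetical helper for illustration.
    """
    addr = ip_address(source_ip)
    return not any(addr in ip_network(cidr) for cidr in allowed_cidrs)


# Placeholder ranges (assumptions): one single-host entry and one subnet.
ALLOWED = ["198.51.100.7/32", "203.0.113.0/24"]

print(is_denied("203.0.113.10", ALLOWED))  # prints: False
print(is_denied("192.0.2.1", ALLOWED))     # prints: True
```

Running a check like this against the addresses of your own administrators before saving a restrictive policy can help avoid locking yourself out of the bucket.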
137
+
138
+
<details>
139
+
<summary>Bucket Policy Example for S3 storage</summary>
140
+
141
+
!!! warning
142
+
These example bucket policies explicitly deny access to any requests outside the allowed IP addresses. Even the user that entered the bucket policy can be denied access to the bucket if the user doesn't meet the conditions. Therefore, make sure to review the bucket policy carefully before saving it. If you get accidentally locked out, see [How to regain access to an Amazon S3 bucket](https://repost.aws/knowledge-center/s3-accidentally-denied-access).
143
+
144
+
Go to your S3 bucket and then **Permissions > Bucket Policy** in the AWS management console. Add the following policy:
//// IP ranges for app.humansignal.com from the documentation
168
+
"x.x.x.x/32",
169
+
"x.x.x.x/32",
170
+
"x.x.x.x/32"
171
+
]
172
+
}
173
+
}
174
+
},
175
+
//// Optional
176
+
{
177
+
"Sid": "DenyAccessUnlessFromVPNForGetObject",
178
+
"Effect": "Deny",
179
+
"Principal": "*",
180
+
"Action": "s3:GetObject",
181
+
"Resource": "arn:aws:s3:::YOUR_BUCKET_NAME/*",
182
+
"Condition": {
183
+
"NotIpAddress": {
184
+
"aws:SourceIp": "YOUR_VPN_SUBNET/32"
185
+
}
186
+
}
187
+
}
188
+
]
189
+
}
190
+
```
191
+
</details>
192
+
193
+
<i>This image shows how you can securely configure source cloud storages with Label Studio using your VPC and IP restrictions</i>
194
+
195
+
<img width="49%" style="display: inline-block; margin-right: 5px;" src="/images/storages/cloud-storage-ip-restriction.jpg" alt="Label Studio + Cloud Storage IP Restriction" class="make-intense-zoom" />
196
+
197
+
<img width="49%" style="display: inline-block;" src="/images/storages/cloud-storage-vpn.jpg" alt="Label Studio + Cloud Storage VPC" class="make-intense-zoom" />
198
+
199
+
#### Additional Notes
200
+
201
+
**Google ADC**: If you use Label Studio on-premises with Google Cloud Storage, you can set up [Application Default Credentials](https://cloud.google.com/docs/authentication/provide-credentials-adc) to provide cloud storage authentication globally for all projects, so users do not need to configure credentials manually.
202
+
203
+
**AWS S3 IAM**: In Label Studio Enterprise, you can use an IAM role configured with an external ID to access S3 bucket contents securely. An 'external ID' is a unique identifier that enhances security by ensuring that only trusted entities can assume the role, reducing the risk of unauthorized access. <span class="enterprise-only">See [Set up an S3 connection with IAM role access](storage#Set-up-an-S3-connection-with-IAM-role-access)</span>
102
204
205
+
**Storage Regions**: To minimize latency and improve efficiency, store data in cloud storage buckets that are geographically closer to your team rather than near the Label Studio server.
103
206
104
-
!!! warning Note on securing cloud data
105
-
If you need to secure your data in a way to ensure that it is not touched by Label Studio, see [Source storage Sync and URI resolving](storage#Source-storage-Sync-and-URI-resolving).
207
+
!!! note More details on Cloud Storages
208
+
See more details on [Source storage Sync and URI resolving](storage#Source-storage-Sync-and-URI-resolving).
docs/source/guide/storage.md
+2 -2
@@ -48,7 +48,7 @@ You can add source storage connections to sync data from an external source to a
48
48
49
49
Label Studio does not automatically sync data from source storage. If you upload new data to a connected cloud storage bucket, sync the storage connection using the UI to add the new labeling tasks to Label Studio without restarting. You can also use the API to set up or sync storage connections. See [Label Studio API](https://api.labelstud.io/api-reference/introduction/getting-started) and locate the relevant storage connection type.
50
50
51
-
Task data synced from cloud storage is not stored in Label Studio. Instead, the data is accessed using a URL. You can also secure access to cloud storage using cloud storage credentials. For details, see [Secure access to cloud storage](security.html#Secure-access-to-cloud-storage).
51
+
Task data synced from cloud storage is not stored in Label Studio. Instead, the data is accessed using presigned URLs. You can also secure access to cloud storage using VPC and IP restrictions for your storage. For details, see [Secure access to cloud storage](security.html#Secure-access-to-cloud-storages).
52
52
53
53
#### Source storage permissions
54
54
@@ -682,4 +682,4 @@ For more troubleshooting information, see [Troubleshooting Label Studio](trouble
682
682
683
683
For more troubleshooting information, see [Troubleshooting Import, Export, & Storage](https://support.humansignal.com/hc/en-us/sections/16982163062029-Import-Export-Storage) in the HumanSignal support center.