Self-hosting: 'do it all for me' and 'bring my own infrastructure' setup guidance #574

Open
wants to merge 1 commit into
base: main
Self-hosting: 'do it for me' and 'bring my own infrastructure' setup …
…guidance
Paul-Cornell committed Apr 4, 2025
commit 397958d2c369a1bb79ae91079803811fade6e5ab
95 changes: 91 additions & 4 deletions self-hosted/aws/onboard.mdx
@@ -12,8 +12,10 @@ sidebarTitle: Onboarding
</Note>

After your organization has signed the self-hosting agreement with Unstructured, a member of the Unstructured technical enablement team will reach out to you to begin the
deployment onboarding process. To streamline this process, you are encouraged to begin setting up your target environment as soon as possible. To do this, you
must first set up your AWS account as follows.
deployment onboarding process. To streamline this process, you are encouraged to begin setting up your target environment as soon as possible. Choose one of the following setup options:

- [Do it all for me](#do-it-all-for-me): Have Unstructured set up the required infrastructure in your AWS account and then deploy the Unstructured UI and API into that newly created infrastructure.
- [Bring my own infrastructure](#bring-my-own-infrastructure): Set up the required infrastructure yourself in your AWS account, and then have Unstructured deploy the Unstructured UI and API into your existing infrastructure.

## Questions? Need help?

@@ -22,9 +24,94 @@ email Unstructured Sales at [sales@unstructured.io](mailto:sales@unstructured.io
[contact form](https://unstructured.io/contact) on the Unstructured website, and a member of the Unstructured sales or technical enablement teams
will get back to you as soon as possible.

## Onboarding checklist
## Do it all for me

If you want Unstructured to set up the required infrastructure for you in your AWS account and then deploy the Unstructured UI and API into that newly created infrastructure, then provide your Unstructured sales representative or technical enablement contact with
the access credentials for an IAM user or service principal in your AWS account that has the following required permissions.

### Core networking permissions

For VPC and subnet management:

- `ec2:CreateVpc`
- `ec2:CreateSubnet`
- `ec2:CreateRouteTable`
- `ec2:CreateInternetGateway`
- `ec2:CreateNatGateway`
- `ec2:ModifyVpcAttribute` (for DNS settings)
- `ec2:AssociateRouteTable`, `ec2:CreateRoute` (for public and private route tables)
- `ec2:AllocateAddress` (for Elastic IP assignment to the NAT Gateway)

For security group rules:

- `ec2:AuthorizeSecurityGroupIngress`, `ec2:AuthorizeSecurityGroupEgress` (to configure cluster and node security groups to allow VPC CIDR traffic)
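
As a rough illustration, the networking and security group actions listed above could be collected into a single IAM policy document. This is a minimal sketch, not the policy that Unstructured requires verbatim: the `build_policy` helper, the `Sid` value, and the `"*"` resource are assumptions made for the example.

```python
import json

# Actions copied from the lists above.
NETWORKING_ACTIONS = [
    "ec2:CreateVpc",
    "ec2:CreateSubnet",
    "ec2:CreateRouteTable",
    "ec2:CreateInternetGateway",
    "ec2:CreateNatGateway",
    "ec2:ModifyVpcAttribute",
    "ec2:AssociateRouteTable",
    "ec2:CreateRoute",
    "ec2:AllocateAddress",
    "ec2:AuthorizeSecurityGroupIngress",
    "ec2:AuthorizeSecurityGroupEgress",
]


def build_policy(sid, actions, resources=("*",)):
    """Return an IAM policy document with a single Allow statement.

    Hypothetical helper for this example; the default resource of "*" is an
    assumption and should be narrowed where the action supports it.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": sid,
                "Effect": "Allow",
                "Action": sorted(actions),
                "Resource": list(resources),
            }
        ],
    }


# "UnstructuredCoreNetworking" is an illustrative Sid, not a required name.
networking_policy = build_policy("UnstructuredCoreNetworking", NETWORKING_ACTIONS)
print(json.dumps(networking_policy, indent=2))
```

The printed JSON can be reviewed and then attached to the IAM user or role through whichever tooling your organization already uses.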

### EKS permissions

For the cluster role:

- Attach the managed policies `AmazonEKSClusterPolicy` and `AmazonEKSVPCResourceController` to a role with `sts:AssumeRole` trust for `eks.amazonaws.com`

For the node group role:

Attach these managed policies:

- `AmazonEKSWorkerNodePolicy` (for node operations)
- `AmazonEKS_CNI_Policy` (for networking)
- `AmazonEC2ContainerRegistryReadOnly` (for ECR access)
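
The two roles described above can be sketched as a trust policy plus a list of managed policy ARNs. The cluster role's `eks.amazonaws.com` principal comes from the text above; the `ec2.amazonaws.com` principal for the node group role is an assumption (it is the usual principal for EC2-based node groups, but it is not stated above). The `ROLE_SPECS` structure and helper names are hypothetical.

```python
import json

# AWS managed policies live under this ARN prefix.
AWS_MANAGED_POLICY_PREFIX = "arn:aws:iam::aws:policy/"

ROLE_SPECS = {
    "cluster": {
        "service": "eks.amazonaws.com",
        "managed_policies": [
            "AmazonEKSClusterPolicy",
            "AmazonEKSVPCResourceController",
        ],
    },
    "node_group": {
        # Assumption: EC2-based nodes, so the trusted service is EC2.
        "service": "ec2.amazonaws.com",
        "managed_policies": [
            "AmazonEKSWorkerNodePolicy",
            "AmazonEKS_CNI_Policy",
            "AmazonEC2ContainerRegistryReadOnly",
        ],
    },
}


def trust_policy(service):
    """Return a trust policy that lets one AWS service assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": service},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def managed_policy_arns(names):
    """Expand managed policy names into full ARNs."""
    return [AWS_MANAGED_POLICY_PREFIX + name for name in names]


for role_name, spec in ROLE_SPECS.items():
    print(role_name)
    print(json.dumps(trust_policy(spec["service"]), indent=2))
    print(managed_policy_arns(spec["managed_policies"]))
```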

For OIDC integration:

- `iam:CreateOpenIDConnectProvider` (to associate the EKS cluster with IAM OIDC)
- `iam:CreateRole` + `iam:AttachRolePolicy` (for service accounts in the `recommender`, `etl-operator`, and `data-broker` namespaces)
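
For context, a role created for a Kubernetes service account through IAM OIDC typically carries a trust policy like the one sketched below. The namespaces come from the list above; the account ID, OIDC provider path, and the service account name `example-sa` are placeholders invented for this example, not values from an Unstructured deployment.

```python
import json

NAMESPACES = ["recommender", "etl-operator", "data-broker"]


def irsa_trust_policy(account_id, oidc_provider, namespace, service_account):
    """Return a trust policy that lets one Kubernetes service account assume a role.

    oidc_provider is the cluster's OIDC issuer without the https:// prefix.
    """
    provider_arn = f"arn:aws:iam::{account_id}:oidc-provider/{oidc_provider}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Federated": provider_arn},
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        f"{oidc_provider}:sub": (
                            f"system:serviceaccount:{namespace}:{service_account}"
                        ),
                        f"{oidc_provider}:aud": "sts.amazonaws.com",
                    }
                },
            }
        ],
    }


# Placeholder values for illustration only.
ACCOUNT_ID = "111122223333"
OIDC_PROVIDER = "oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE"

policies = {
    ns: irsa_trust_policy(ACCOUNT_ID, OIDC_PROVIDER, ns, "example-sa")
    for ns in NAMESPACES
}
print(json.dumps(policies["etl-operator"], indent=2))
```

Pinning the `sub` condition to a single namespace and service account keeps each role usable only by the workload it was created for.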

### Storage and database

These permissions:

- `s3:CreateBucket`
- `s3:PutBucketVersioning`
- `s3:PutBucketEncryption`

For these S3 buckets:

- `u10d-*-etl-blob-cache`
- `u10d-*-etl-job-db`
- `u10d-*-etl-job-status`
- `u10d-*-job-files`
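
One way to keep these S3 permissions narrow is to scope the statement to the bucket name patterns above instead of `"*"`. The following is a sketch under that assumption. The `fnmatch` check only approximates IAM wildcard matching and is included to sanity-check candidate bucket names; the bucket name `u10d-acme-etl-job-db` is made up.

```python
import fnmatch
import json

S3_ACTIONS = [
    "s3:CreateBucket",
    "s3:PutBucketVersioning",
    "s3:PutBucketEncryption",
]

BUCKET_PATTERNS = [
    "u10d-*-etl-blob-cache",
    "u10d-*-etl-job-db",
    "u10d-*-etl-job-status",
    "u10d-*-job-files",
]


def bucket_arn(name_or_pattern):
    """Return the ARN for a bucket name or bucket name pattern."""
    return f"arn:aws:s3:::{name_or_pattern}"


s3_statement = {
    "Effect": "Allow",
    "Action": S3_ACTIONS,
    "Resource": [bucket_arn(p) for p in BUCKET_PATTERNS],
}


def is_covered(bucket_name):
    """Approximate check that a bucket name matches one of the patterns."""
    return any(fnmatch.fnmatchcase(bucket_name, p) for p in BUCKET_PATTERNS)


print(json.dumps(s3_statement, indent=2))
print(is_covered("u10d-acme-etl-job-db"))  # hypothetical bucket name
```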

For RDS:

- `rds:CreateDBInstance`
- `rds:CreateDBSubnetGroup`
- `rds:CreateDBSecurityGroup` + `ec2:AuthorizeSecurityGroupIngress` (to allow VPC CIDR access)

### Add-ons and utilities

For the EBS CSI Driver:

- `eks:CreateAddon` with IAM role attachment permissions for the `ebs.csi.aws.com` service account

For the SSH Key:

- `ec2:CreateKeyPair` + `ec2:ExportKeyPair` (for node group remote access)

### Cross-service requirements

- For IAM: `iam:PassRole` (to assign roles to EKS, RDS, and S3)
- For KMS: `kms:CreateKey` (if using CMK for S3 and RDS encryption)
- For CloudFormation: `cloudformation:*`

For least privilege, scope resource ARNs in policies (for example, restrict S3 bucket names with wildcards such as `u10d-*-etl*`).
The EKS Pod Identity Agent requires `eks-auth:AssumeRoleForPodIdentity` permission on node roles when used with IRSA.

## Bring my own infrastructure

If you want to set up the required infrastructure yourself, set things up as follows within your AWS account for Unstructured to deploy the Unstructured UI and API into.

Set up the following infrastructure within your AWS account for Unstructured to deploy the Unstructured UI and API into.
You must also provide your Unstructured sales representative or technical enablement contact with
the access credentials for an IAM user or service principal in your AWS account that has access to the target Amazon Elastic Kubernetes Service (EKS) cluster to deploy the
Unstructured UI and API into.

### VPC and networking

69 changes: 65 additions & 4 deletions self-hosted/azure/onboard.mdx
@@ -12,8 +12,10 @@ sidebarTitle: Onboarding
</Note>

After your organization has signed the self-hosting agreement with Unstructured, a member of the Unstructured technical enablement team will reach out to you to begin the
deployment onboarding process. To streamline this process, you are encouraged to begin setting up your target environment as soon as possible. To do this, you
must first set up your Azure account as follows.
deployment onboarding process. To streamline this process, you are encouraged to begin setting up your target environment as soon as possible. Choose one of the following setup options:

- [Do it all for me](#do-it-all-for-me): Have Unstructured set up the required infrastructure in your Azure account and then deploy the Unstructured UI and API into that newly created infrastructure.
- [Bring my own infrastructure](#bring-my-own-infrastructure): Set up the required infrastructure yourself in your Azure account, and then have Unstructured deploy the Unstructured UI and API into your existing infrastructure.

## Questions? Need help?

@@ -22,9 +24,68 @@ email Unstructured Sales at [sales@unstructured.io](mailto:sales@unstructured.io
[contact form](https://unstructured.io/contact) on the Unstructured website, and a member of the Unstructured sales or technical enablement teams
will get back to you as soon as possible.

## Onboarding checklist
## Do it all for me

If you want Unstructured to set up the required infrastructure for you in your Azure account and then deploy the Unstructured UI and API into that newly created infrastructure, then provide your Unstructured sales representative or technical enablement contact with
the access credentials for a Microsoft Entra ID user or service principal in your Azure account that has the following required permissions.

### Subscription and resource group

- `Microsoft.Resources/subscriptions/resourceGroups/write` (to create the resource group)
- `Microsoft.Resources/subscriptions/resourceGroups/read` (to read the resource group)

### VNet and networking

- `Microsoft.Network/virtualNetworks/write` (to create the VNet)
- `Microsoft.Network/virtualNetworks/read` (to read the VNet)
- `Microsoft.Network/publicIPAddresses/write` (to create the public IPs)
- `Microsoft.Network/publicIPAddresses/read` (to read the public IPs)
- `Microsoft.Network/natGateways/write` (to create the NAT Gateway)
- `Microsoft.Network/natGateways/read` (to read the NAT Gateway)
- `Microsoft.Network/routeTables/write` (to create the route tables)
- `Microsoft.Network/routeTables/read` (to read the route tables)
- `Microsoft.Network/networkSecurityGroups/write` (to create the NSGs)
- `Microsoft.Network/networkSecurityGroups/read` (to read the NSGs)
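
The read and write pairs above follow a regular pattern, so they can be generated and wrapped in an Azure custom role definition. This is a sketch only: the role name, description, and the all-zero subscription ID are placeholders, and the helper functions are hypothetical.

```python
import json

# Resource types taken from the VNet and networking list above.
NETWORK_RESOURCE_TYPES = [
    "virtualNetworks",
    "publicIPAddresses",
    "natGateways",
    "routeTables",
    "networkSecurityGroups",
]


def network_actions(resource_types):
    """Return the write and read action for each Microsoft.Network type."""
    actions = []
    for resource_type in resource_types:
        actions.append(f"Microsoft.Network/{resource_type}/write")
        actions.append(f"Microsoft.Network/{resource_type}/read")
    return actions


def custom_role(name, description, actions, subscription_id):
    """Return a custom role definition scoped to one subscription."""
    return {
        "Name": name,
        "IsCustom": True,
        "Description": description,
        "Actions": actions,
        "NotActions": [],
        "AssignableScopes": [f"/subscriptions/{subscription_id}"],
    }


# Name, description, and subscription ID are illustrative placeholders.
role = custom_role(
    "Unstructured Networking Setup",
    "Create and read the networking resources for a self-hosted deployment.",
    network_actions(NETWORK_RESOURCE_TYPES),
    "00000000-0000-0000-0000-000000000000",
)
print(json.dumps(role, indent=2))
```

Narrowing `AssignableScopes` to a single resource group, where possible, reduces the reach of the role further.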

### AKS cluster

- `Microsoft.ContainerService/managedClusters/write` (to create the AKS cluster)
- `Microsoft.ContainerService/managedClusters/read` (to read the AKS cluster)
- `Microsoft.ContainerService/agentPools/write` (to create the node pools)
- `Microsoft.ContainerService/agentPools/read` (to read the node pools)

### Managed identities and RBAC

- `Microsoft.ManagedIdentity/userAssignedIdentities/write` (to create the managed identities)
- `Microsoft.ManagedIdentity/userAssignedIdentities/read` (to read managed identities)
- Assign built-in roles such as:

  - **Contributor** or scoped **Network Contributor** for the AKS cluster identity
  - **Monitoring Metrics Publisher**, **AcrPull**, and **Storage Blob Data Reader** for the node pool identity
  - **Storage Blob Data Contributor** for workload identities
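
The built-in role assignments above can be laid out as data and turned into `az role assignment create` invocations for review. This sketch assumes the narrower **Network Contributor** option for the cluster identity; the identity keys, principal IDs, and scope are placeholders invented for the example.

```python
import shlex

# Built-in roles from the list above, keyed by a hypothetical identity label.
ROLE_ASSIGNMENTS = {
    "aks-cluster-identity": ["Network Contributor"],
    "node-pool-identity": [
        "Monitoring Metrics Publisher",
        "AcrPull",
        "Storage Blob Data Reader",
    ],
    "workload-identity": ["Storage Blob Data Contributor"],
}


def assignment_commands(assignments, principal_ids, scope):
    """Return one az CLI argument list per (identity, role) pair."""
    commands = []
    for identity, roles in assignments.items():
        for role in roles:
            commands.append(
                [
                    "az", "role", "assignment", "create",
                    "--assignee", principal_ids[identity],
                    "--role", role,
                    "--scope", scope,
                ]
            )
    return commands


# Placeholder principal IDs and scope.
PRINCIPAL_IDS = {
    "aks-cluster-identity": "11111111-1111-1111-1111-111111111111",
    "node-pool-identity": "22222222-2222-2222-2222-222222222222",
    "workload-identity": "33333333-3333-3333-3333-333333333333",
}
SCOPE = "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/example-rg"

commands = assignment_commands(ROLE_ASSIGNMENTS, PRINCIPAL_IDS, SCOPE)
for command in commands:
    print(shlex.join(command))
```

Printing the commands instead of running them leaves room for a reviewer to confirm each scope before anything is assigned.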

### Kubernetes add-ons

Permissions depend on the Helm or YAML installation method, but Azure RBAC integration requires `Microsoft.ContainerService/managedClusters/accessProfiles/*/read` (to access the kubeconfig).

### Storage class

- `Microsoft.Storage/storageAccounts/write` (to create the storage account for CSI driver provisioning)
- `Microsoft.Storage/storageAccounts/read`

### PostgreSQL database

- `Microsoft.DBforPostgreSQL/flexibleServers/write` (to create the PostgreSQL server)
- `Microsoft.DBforPostgreSQL/flexibleServers/read`
- NSG permissions for database access: allow traffic from the VNet CIDR

## Bring my own infrastructure

If you want to set up the required infrastructure yourself, set things up as follows within your Azure account for Unstructured to deploy the Unstructured UI and API into.

Set up the following infrastructure within your Azure account for Unstructured to deploy the Unstructured UI and API into.
You must also provide your Unstructured sales representative or technical enablement contact with
the access credentials for a Microsoft Entra ID user or service principal in your Azure account that has access to the target Azure Kubernetes Service (AKS) cluster to deploy the
Unstructured UI and API into.

### **Azure subscription and resource group**

103 changes: 99 additions & 4 deletions self-hosted/gcp/onboard.mdx
@@ -12,8 +12,10 @@ sidebarTitle: Onboarding
</Note>

After your organization has signed the self-hosting agreement with Unstructured, a member of the Unstructured technical enablement team will reach out to you to begin the
deployment onboarding process. To streamline this process, you are encouraged to begin setting up your target environment as soon as possible. To do this, you
must first set up your GCP account as follows.
deployment onboarding process. To streamline this process, you are encouraged to begin setting up your target environment as soon as possible. Choose one of the following setup options:

- [Do it all for me](#do-it-all-for-me): Have Unstructured set up the required infrastructure in your GCP account and then deploy the Unstructured UI and API into that newly created infrastructure.
- [Bring my own infrastructure](#bring-my-own-infrastructure): Set up the required infrastructure yourself in your GCP account, and then have Unstructured deploy the Unstructured UI and API into your existing infrastructure.

## Questions? Need help?

@@ -22,9 +24,102 @@ email Unstructured Sales at [sales@unstructured.io](mailto:sales@unstructured.io
[contact form](https://unstructured.io/contact) on the Unstructured website, and a member of the Unstructured sales or technical enablement teams
will get back to you as soon as possible.

## Onboarding checklist
## Do it all for me

If you want Unstructured to set up the required infrastructure for you in your GCP account and then deploy the Unstructured UI and API into that newly created infrastructure, then provide your Unstructured sales representative or technical enablement contact with
the access credentials for an IAM user or service account in your GCP account that has the following required permissions:

### Core networking permissions

VPC/subnet management:

- `compute.networks.create`
- `compute.subnetworks.create`
- `compute.routers.create` (for Cloud NAT)
- `compute.addresses.create` (for NAT IPs)
- `compute.firewalls.create` (for intra-cluster traffic rules)
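
If you prefer a custom role over a broad predefined one, the VPC and subnet permissions above could be packaged into a role definition file. This is a sketch: the title, description, and output file name are placeholders, and the `custom_role` helper is hypothetical.

```python
import json

# Permissions copied from the VPC/subnet management list above.
NETWORK_PERMISSIONS = [
    "compute.networks.create",
    "compute.subnetworks.create",
    "compute.routers.create",
    "compute.addresses.create",
    "compute.firewalls.create",
]


def custom_role(title, description, permissions):
    """Return a custom role definition as a plain dictionary."""
    return {
        "title": title,
        "description": description,
        "stage": "GA",
        "includedPermissions": sorted(permissions),
    }


# Title and description are illustrative placeholders.
role = custom_role(
    "Unstructured Network Setup",
    "Create the VPC, subnets, routers, addresses, and firewall rules.",
    NETWORK_PERMISSIONS,
)

# The file name is a placeholder.
with open("network-setup-role.json", "w", encoding="utf-8") as handle:
    json.dump(role, handle, indent=2)

print(json.dumps(role, indent=2))
```

A file like this can then be supplied to `gcloud iam roles create` through its `--file` option, with a role ID and project of your choosing.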

Shared VPC (if used):

- `compute.organizations.admin` (for the host project)
- `compute.networks.use` (for the service project)

### GKE cluster permissions

Control plane:

- `container.clusters.create`
- `container.clusters.update` (for private cluster settings)
- `compute.networks.useExternalIp` (for public endpoint access)

Node pools:

- `compute.instances.create`
- `compute.disks.create` (for node disks)
- `compute.instanceGroups.create` (for autoscaling)

IAM roles:

- For the GKE cluster service account: `roles/container.hostServiceAgentUser`
- For the node service account: `roles/container.nodeServiceAccount`
- For the workload identity service account: `roles/iam.workloadIdentityUser`

### Storage and database

GCS buckets:

- `storage.buckets.create`
- `storage.objects.create` (for versioning)
- `storage.buckets.update` (for encryption/lifecycle rules)

Cloud SQL:

- `cloudsql.instances.create`
- `cloudsql.instances.connect` (for private IPs)
- `vpcaccess.connectors.use` (if using Serverless VPC Access)

Persistent disks (CSI):

- `compute.disks.create` (for `pd.csi.storage.gke.io`)
- `compute.subnetworks.use` (for regional disks)

### Advanced configurations

Workload identity:

- `iam.serviceAccounts.getAccessToken` (for federated access)
- `iam.serviceAccounts.setIamPolicy` (to bind Kubernetes SAs to GCP SAs)
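
Binding a Kubernetes service account to a GCP service account comes down to adding a specially formatted member to the GCP service account's IAM policy. The sketch below shows that member format. The project ID, namespace, and Kubernetes service account name are placeholders.

```python
def workload_identity_member(project_id, namespace, ksa_name):
    """Return the IAM member string for a Kubernetes service account."""
    return f"serviceAccount:{project_id}.svc.id.goog[{namespace}/{ksa_name}]"


def workload_identity_binding(project_id, namespace, ksa_name):
    """Return the binding to add to the GCP service account's IAM policy."""
    return {
        "role": "roles/iam.workloadIdentityUser",
        "members": [workload_identity_member(project_id, namespace, ksa_name)],
    }


# Placeholder project, namespace, and service account names.
binding = workload_identity_binding("example-project", "example-namespace", "example-ksa")
print(binding)
```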

Cloud NAT:

- `compute.routers.update` (for NAT configuration)
- `compute.addresses.use` (for NAT IP allocation)

OS login/SSH:

- `compute.projects.setCommonInstanceMetadata` (for SSH key upload)
- `compute.instances.osAdminLogin`

### Minimum required roles

Project level:

- `roles/editor` (broad access, or scope with custom roles)

Scoped roles:

- `roles/compute.networkAdmin` (for VPC and subnets)
- `roles/container.admin` (for GKE)
- `roles/storage.admin` (for GCS)
- `roles/cloudsql.admin` (for Postgres)
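
To prefer the scoped roles over `roles/editor`, the bindings can be merged into a project's IAM policy one role at a time. The following sketch works on a policy represented as a dictionary. The `add_bindings` helper and the deployer service account email are hypothetical.

```python
import json

# Scoped roles from the list above.
SCOPED_ROLES = [
    "roles/compute.networkAdmin",
    "roles/container.admin",
    "roles/storage.admin",
    "roles/cloudsql.admin",
]


def add_bindings(policy, member, roles):
    """Return a copy of policy with member added to each role, without duplicates."""
    members_by_role = {
        b["role"]: set(b["members"]) for b in policy.get("bindings", [])
    }
    for role in roles:
        members_by_role.setdefault(role, set()).add(member)
    merged = [
        {"role": role, "members": sorted(members)}
        for role, members in sorted(members_by_role.items())
    ]
    return {**policy, "bindings": merged}


# Placeholder member and starting policy.
MEMBER = "serviceAccount:deployer@example-project.iam.gserviceaccount.com"
existing_policy = {
    "bindings": [
        {"role": "roles/storage.admin", "members": ["user:admin@example.com"]},
    ]
}

updated_policy = add_bindings(existing_policy, MEMBER, SCOPED_ROLES)
print(json.dumps(updated_policy, indent=2))
```

Because the merge is idempotent, running it again with the same member leaves the policy unchanged.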

## Bring my own infrastructure

If you want to set up the required infrastructure yourself, set things up as follows within your GCP account for Unstructured to deploy the Unstructured UI and API into.

Set up the following infrastructure within your GCP account for Unstructured to deploy the Unstructured UI and API into.
You must also provide your Unstructured sales representative or technical enablement contact with
the access credentials for an IAM user or service account in your GCP account that has access to the target Google Kubernetes Engine (GKE) cluster to deploy the
Unstructured UI and API into.

### **VPC and networking (GCP equivalent)**