|
| 1 | +--- |
| 2 | +title: Self-hosted on GCP |
| 3 | +sidebarTitle: GCP |
| 4 | +icon: "google" |
| 5 | +--- |
| 6 | + |
| 7 | +When running LangSmith on [Google Cloud Platform (GCP)](https://cloud.google.com/), you can deploy in either [full self-hosted](/langsmith/self-hosted) or [hybrid](/langsmith/hybrid) mode. Full self-hosted mode deploys a complete LangSmith platform with observability functionality as well as the option to create agent deployments. Hybrid mode deploys only the infrastructure needed to run agents in a data plane within your cloud, while our SaaS provides the control plane and observability functionality. |
| 8 | + |
| 9 | +This page provides GCP-specific architecture patterns, service recommendations, and best practices for deploying and operating LangSmith on GCP. |
| 10 | + |
| 11 | +<Note> |
| 12 | +LangChain provides Terraform modules specifically for GCP to help provision infrastructure for LangSmith. These modules can quickly set up GKE clusters, Cloud SQL, Memorystore Redis, Cloud Storage, and networking resources. |
| 13 | + |
| 14 | +View the [GCP Terraform modules](https://github.com/langchain-ai/terraform/tree/main/modules/gcp) for documentation and examples. |
| 15 | +</Note> |
| 16 | + |
| 17 | +## Reference architecture |
| 18 | + |
| 19 | +We recommend using GCP's managed services to provide a scalable, secure, and resilient platform. The following architecture applies to both full self-hosted and hybrid modes and aligns with the [Google Cloud Well-Architected Framework](https://docs.cloud.google.com/architecture/framework): |
| 20 | + |
| 21 | + |
| 22 | + |
| 23 | +- <Icon icon="globe" /> **Ingress & networking:** Requests enter via [Cloud Load Balancing](https://cloud.google.com/load-balancing) within your [VPC](https://cloud.google.com/vpc), secured using [Cloud Armor](https://cloud.google.com/armor) and [IAM](https://cloud.google.com/iam)-based authentication. |
| 24 | +- <Icon icon="cube" /> **Frontend & backend services:** Containers run on [Google Kubernetes Engine (GKE)](https://cloud.google.com/kubernetes-engine), orchestrated behind the load balancer. They route requests to other services within the cluster as necessary. |
| 25 | +- <Icon icon="database" /> **Storage & databases:** |
| 26 | + - [Cloud SQL for PostgreSQL](https://cloud.google.com/sql/docs/postgres): metadata, projects, users, and short-term and long-term memory for deployed agents. LangSmith supports PostgreSQL version 14 or higher. |
| 27 | + - [Memorystore for Redis](https://cloud.google.com/memorystore/docs/redis): caching and job queues. Memorystore can be in single-instance or cluster mode, running Redis OSS version 5 or higher. |
| 28 | + - ClickHouse + [Persistent Disks](https://cloud.google.com/compute/docs/disks): analytics and trace storage. |
| 29 | + - We recommend using an [externally managed ClickHouse solution](/langsmith/self-host-external-clickhouse) unless security or compliance reasons |
| 30 | + prevent you from doing so. |
| 31 | + - ClickHouse is not required for hybrid deployments. |
| 32 | + - [Cloud Storage](https://cloud.google.com/storage): object storage for trace artifacts and telemetry. |
| 33 | + |
| 34 | +- <Icon icon="sparkles" /> **LLM integration:** Optionally proxy requests to [Vertex AI](https://cloud.google.com/vertex-ai) for LLM inference. |
| 35 | +- <Icon icon="chart-line" /> **Monitoring & observability:** Integrate with [Cloud Monitoring](https://cloud.google.com/monitoring) and [Cloud Logging](https://cloud.google.com/logging). |
| 36 | + |
| 37 | + |
| 38 | +## Compute options |
| 39 | + |
| 40 | +LangSmith supports multiple compute options depending on your requirements: |
| 41 | + |
| 42 | +| Compute option | Description | Suitable for | |
| 43 | +|-----------------|-------------|--------------| |
| 44 | +| **Google Kubernetes Engine (preferred)** | Advanced scaling and multi-tenant support | Large enterprises | |
| 45 | +| **Compute Engine-based** | Full control, BYO-infra | Regulated or air-gapped environments | |
| 46 | + |
| 47 | +## Google Cloud Well-Architected best practices |
| 48 | + |
| 49 | +This reference is designed to align with the six pillars of the Google Cloud Well-Architected Framework: |
| 50 | + |
| 51 | +### Operational excellence |
| 52 | + |
| 53 | +- Automate deployments with IaC ([Terraform](https://www.terraform.io/) / [Deployment Manager](https://cloud.google.com/deployment-manager)). |
| 54 | +- Use [Secret Manager](https://cloud.google.com/secret-manager) for configuration and sensitive data. |
| 55 | +- Configure your LangSmith instance to [export telemetry data](/langsmith/export-backend) and continuously monitor via [Cloud Logging](https://cloud.google.com/logging). |
| 56 | +- The preferred method to manage [LangSmith deployments](/langsmith/deployments) is to create a CI process that builds [Agent Server](/langsmith/agent-server) images and pushes them to [Artifact Registry](https://cloud.google.com/artifact-registry). Create a test deployment for pull requests before deploying a new revision to staging or production upon PR merge. |
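The CI flow described in the last item can be sketched as a small helper that a pipeline step might call before `docker build` and `docker push`. This is a minimal illustration, not LangSmith tooling: the function names, the `pr-<number>-<sha>` tag scheme, and the example project and repository names are all assumptions. Only the `LOCATION-docker.pkg.dev/PROJECT/REPOSITORY/IMAGE:TAG` reference format comes from Artifact Registry itself.

```python
from typing import Optional


def image_tag(git_sha: str, pr_number: Optional[int] = None) -> str:
    """Derive an image tag from a commit SHA (hypothetical tagging scheme).

    Pull-request builds get a `pr-<number>-` prefix so test deployments are
    easy to tell apart from revisions built after a merge.
    """
    if not git_sha:
        raise ValueError("git_sha must not be empty")
    short_sha = git_sha[:12]
    return f"pr-{pr_number}-{short_sha}" if pr_number is not None else short_sha


def artifact_registry_image(
    location: str, project: str, repository: str, image: str, tag: str
) -> str:
    """Build an image reference in the Artifact Registry Docker format."""
    parts = {
        "location": location,
        "project": project,
        "repository": repository,
        "image": image,
        "tag": tag,
    }
    for name, value in parts.items():
        if not value:
            raise ValueError(f"{name} must not be empty")
    return f"{location}-docker.pkg.dev/{project}/{repository}/{image}:{tag}"


if __name__ == "__main__":
    # Example values are placeholders, not real resources.
    tag = image_tag("0123456789abcdef0123", pr_number=42)
    print(artifact_registry_image("us-central1", "my-project", "langsmith", "agent-server", tag))
```

A pull-request pipeline could pass its PR number to obtain a distinct tag for the test deployment, while the merge pipeline omits it and deploys the plain commit tag to staging or production.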
| 57 | + |
| 58 | +### Security |
| 59 | + |
| 60 | +- Use [IAM](https://cloud.google.com/iam) roles with least-privilege policies and [Workload Identity](https://cloud.google.com/kubernetes-engine/docs/how-to/workload-identity) for secure pod-to-GCP-service authentication. |
| 61 | +- Enable encryption at rest ([Cloud SQL](https://docs.cloud.google.com/sql/docs/postgres/cmek), [Cloud Storage](https://cloud.google.com/storage/docs/encryption), Persistent Disks) and in transit (TLS 1.2+). |
| 62 | +- Integrate with [Secret Manager](https://cloud.google.com/secret-manager) for credentials. |
| 63 | +- Use [Identity Platform](https://cloud.google.com/identity-platform) or [Workload Identity Federation](https://cloud.google.com/iam/docs/workload-identity-federation) as an identity provider (IdP) in conjunction with LangSmith's built-in authentication and authorization features to secure access to agents and their tools. |
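To make the Workload Identity item above more concrete, the following sketch builds the two pieces that link a Kubernetes service account to a Google service account: the `iam.gke.io/gcp-service-account` annotation on the Kubernetes side, and the IAM member string that is granted `roles/iam.workloadIdentityUser` on the Google side. The namespace, account names, and project ID are placeholders, and the helper functions are hypothetical; the names your LangSmith installation uses may differ.

```python
import json


def workload_identity_member(project_id: str, namespace: str, ksa_name: str) -> str:
    """IAM member that represents a Kubernetes service account under Workload Identity."""
    return f"serviceAccount:{project_id}.svc.id.goog[{namespace}/{ksa_name}]"


def annotated_service_account(namespace: str, ksa_name: str, gsa_email: str) -> dict:
    """Kubernetes ServiceAccount manifest annotated with the Google service account to use."""
    return {
        "apiVersion": "v1",
        "kind": "ServiceAccount",
        "metadata": {
            "name": ksa_name,
            "namespace": namespace,
            "annotations": {"iam.gke.io/gcp-service-account": gsa_email},
        },
    }


if __name__ == "__main__":
    # Placeholder names for illustration only.
    project = "my-project"
    gsa = f"langsmith-backend@{project}.iam.gserviceaccount.com"
    print(json.dumps(annotated_service_account("langsmith", "langsmith-backend", gsa), indent=2))
    # Grant roles/iam.workloadIdentityUser on the Google service account to this member:
    print(workload_identity_member(project, "langsmith", "langsmith-backend"))
```

Keeping one Google service account per workload, each with only the roles that workload needs, is what lets this pattern support the least-privilege policy mentioned above.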
| 64 | + |
| 65 | +### Reliability |
| 66 | + |
| 67 | +- Replicate the LangSmith [data plane](/langsmith/data-plane) across regions: for LangSmith Deployment, deploy identical data planes to Kubernetes clusters in different regions. Within each region, deploy [Cloud SQL](https://cloud.google.com/sql/docs/postgres/high-availability) and [GKE](https://docs.cloud.google.com/kubernetes-engine/docs/concepts/configuration-overview) services across multiple zones. |
| 68 | +- Implement [autoscaling](https://cloud.google.com/kubernetes-engine/docs/concepts/cluster-autoscaler) for backend workers using [Horizontal Pod Autoscaler](https://cloud.google.com/kubernetes-engine/docs/concepts/horizontalpodautoscaler) and [Cluster Autoscaler](https://cloud.google.com/kubernetes-engine/docs/concepts/cluster-autoscaler). |
| 69 | +- Use [Cloud DNS](https://cloud.google.com/dns) health checks and failover policies. |
| 70 | + |
| 71 | +### Performance optimization |
| 72 | + |
| 73 | +- Leverage [Compute Engine](https://cloud.google.com/compute) instances for optimized compute with [machine type selection](https://cloud.google.com/compute/docs/machine-types). |
| 74 | +- Use [Cloud Storage lifecycle policies](https://cloud.google.com/storage/docs/lifecycle) for infrequently accessed trace data, moving to [Nearline](https://cloud.google.com/storage/docs/storage-classes#nearline) or [Coldline](https://cloud.google.com/storage/docs/storage-classes#coldline) storage classes. |
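As an illustration of the lifecycle item above, this sketch generates a Cloud Storage lifecycle configuration that moves objects to Nearline and then Coldline as they age. The 30-day and 90-day thresholds are assumptions chosen for the example, not LangSmith recommendations; choose values that match how often you read older trace data.

```python
import json


def lifecycle_config(nearline_after_days: int = 30, coldline_after_days: int = 90) -> dict:
    """Build a Cloud Storage lifecycle configuration with two SetStorageClass rules.

    The `age` condition counts days since each object was created.
    """
    if not 0 <= nearline_after_days < coldline_after_days:
        raise ValueError("expected 0 <= nearline_after_days < coldline_after_days")
    return {
        "rule": [
            {
                "action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
                "condition": {
                    "age": nearline_after_days,
                    "matchesStorageClass": ["STANDARD"],
                },
            },
            {
                "action": {"type": "SetStorageClass", "storageClass": "COLDLINE"},
                "condition": {
                    "age": coldline_after_days,
                    "matchesStorageClass": ["STANDARD", "NEARLINE"],
                },
            },
        ]
    }


if __name__ == "__main__":
    # Write the result to a file and apply it with your usual tooling,
    # for example: gsutil lifecycle set lifecycle.json gs://YOUR_BUCKET
    print(json.dumps(lifecycle_config(), indent=2))
```

Nearline and Coldline carry minimum storage durations and retrieval charges, so thresholds that are too short can cost more than leaving the data in Standard storage.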
| 75 | + |
| 76 | +### Cost optimization |
| 77 | + |
| 78 | +- Right-size [GKE](https://cloud.google.com/kubernetes-engine) clusters, and lower the cost of steady-state capacity with [Committed Use Discounts](https://cloud.google.com/compute/docs/instances/signing-up-committed-use-discounts) and [Sustained Use Discounts](https://cloud.google.com/compute/docs/sustained-use-discounts). |
| 79 | +- Monitor cost KPIs using [Cloud Billing](https://cloud.google.com/billing/docs) dashboards and [Cost Management](https://cloud.google.com/cost-management) tools. |
| 80 | + |
| 81 | +### Sustainability |
| 82 | + |
| 83 | +- Minimize idle workloads with on-demand compute and [autoscaling](https://cloud.google.com/kubernetes-engine/docs/concepts/cluster-autoscaler). |
| 84 | +- Store telemetry in low-latency, low-cost tiers using [Cloud Storage lifecycle policies](https://cloud.google.com/storage/docs/lifecycle). |
| 85 | +- Enable auto-shutdown for non-prod environments using [scheduled actions](https://cloud.google.com/compute/docs/instances/schedule-instance-start-stop). |
| 86 | + |
| 87 | +## Security and compliance |
| 88 | + |
| 89 | +LangSmith can be configured for: |
| 90 | + |
| 91 | +- [Private Service Connect](https://cloud.google.com/vpc/docs/private-service-connect)-only access (no public internet exposure, besides egress necessary for billing). |
| 92 | +- [Cloud KMS](https://cloud.google.com/kms)-based encryption keys for Cloud Storage, Cloud SQL, and Persistent Disks. |
| 93 | +- Audit logging to [Cloud Logging](https://cloud.google.com/logging) and [Cloud Audit Logs](https://cloud.google.com/logging/docs/audit). |
| 94 | + |
| 95 | +Customers can deploy in [Assured Workloads](https://cloud.google.com/assured-workloads) regions for compliance with ISO, HIPAA, or other regulatory requirements as needed. |
| 96 | + |
| 97 | +## Monitoring and evals |
| 98 | + |
| 99 | +Use LangSmith to: |
| 100 | + |
| 101 | +- Capture traces from LLM apps running on [Vertex AI](https://cloud.google.com/vertex-ai). |
| 102 | +- Evaluate model outputs via [LangSmith datasets](/langsmith/manage-datasets). |
| 103 | +- Track latency, token usage, and success rates. |
| 104 | + |
| 105 | +Integrate with: |
| 106 | + |
| 107 | +- [Cloud Monitoring](https://cloud.google.com/monitoring) dashboards. |
| 108 | +- [OpenTelemetry](https://opentelemetry.io/) and [Prometheus](https://prometheus.io/) exporters. |
| 109 | + |