Commit 888c3d5

DavoCoder authored and katmayb committed
Add GCP self-hosted architecture documentation
1 parent 436856d commit 888c3d5

3 files changed: 111 additions & 1 deletion

src/docs.json

Lines changed: 2 additions & 1 deletion
@@ -1341,7 +1341,8 @@
   "group": "Self-hosted cloud architecture",
   "pages": [
     "langsmith/aws-self-hosted",
-    "langsmith/azure-self-hosted"
+    "langsmith/azure-self-hosted",
+    "langsmith/gcp-self-hosted"
   ]
 },
 {

src/langsmith/gcp-self-hosted.mdx

Lines changed: 109 additions & 0 deletions
@@ -0,0 +1,109 @@
---
title: Self-hosted on GCP
sidebarTitle: GCP
icon: "google"
---

When running LangSmith on [Google Cloud Platform (GCP)](https://cloud.google.com/), you can deploy in either [full self-hosted](/langsmith/self-hosted) or [hybrid](/langsmith/hybrid) mode. Full self-hosted mode deploys the complete LangSmith platform, including observability functionality and the option to create agent deployments. Hybrid mode deploys only the infrastructure needed to run agents in a data plane within your cloud, while our SaaS provides the control plane and observability functionality.

This page provides GCP-specific architecture patterns, service recommendations, and best practices for deploying and operating LangSmith on GCP.

<Note>
LangChain provides Terraform modules specifically for GCP to help provision infrastructure for LangSmith. These modules can quickly set up GKE clusters, Cloud SQL, Memorystore Redis, Cloud Storage, and networking resources.

View the [GCP Terraform modules](https://github.com/langchain-ai/terraform/tree/main/modules/gcp) for documentation and examples.
</Note>

## Reference architecture

We recommend leveraging GCP's managed services to provide a scalable, secure, and resilient platform. The following architecture applies to both full self-hosted and hybrid deployments and aligns with the [Google Cloud Well-Architected Framework](https://docs.cloud.google.com/architecture/framework):

![Architecture diagram showing GCP relations to LangSmith services](/langsmith/images/gcp-architecture-self-hosted.png)

- <Icon icon="globe" /> **Ingress & networking:** Requests enter via [Cloud Load Balancing](https://cloud.google.com/load-balancing) within your [VPC](https://cloud.google.com/vpc), secured using [Cloud Armor](https://cloud.google.com/armor) and [IAM](https://cloud.google.com/iam)-based authentication.
- <Icon icon="cube" /> **Frontend & backend services:** Containers run on [Google Kubernetes Engine (GKE)](https://cloud.google.com/kubernetes-engine), orchestrated behind the load balancer. These services route requests to other services within the cluster as necessary.
- <Icon icon="database" /> **Storage & databases:**
  - [Cloud SQL for PostgreSQL](https://cloud.google.com/sql/docs/postgres): metadata, projects, users, and short-term and long-term memory for deployed agents. LangSmith supports PostgreSQL version 14 or higher.
  - [Memorystore for Redis](https://cloud.google.com/memorystore/docs/redis): caching and job queues. Memorystore can be in single-instance or cluster mode, running Redis OSS version 5 or higher.
  - ClickHouse + [Persistent Disks](https://cloud.google.com/compute/docs/disks): analytics and trace storage.
    - We recommend using an [externally managed ClickHouse solution](/langsmith/self-host-external-clickhouse) unless security or compliance reasons prevent you from doing so.
    - ClickHouse is not required for hybrid deployments.
  - [Cloud Storage](https://cloud.google.com/storage): object storage for trace artifacts and telemetry.

- <Icon icon="sparkles" /> **LLM integration:** Optionally proxy requests to [Vertex AI](https://cloud.google.com/vertex-ai) for LLM inference.
- <Icon icon="chart-line" /> **Monitoring & observability:** Integrate with [Cloud Monitoring](https://cloud.google.com/monitoring) and [Cloud Logging](https://cloud.google.com/logging).
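
The version requirements in the list above (PostgreSQL 14 or higher, Redis OSS 5 or higher) can be checked before provisioning. The following is a minimal sketch, not part of LangSmith or the Terraform modules: the function names and the plain `"15.4"`-style version strings are assumptions made for illustration.

```python
from typing import Dict, List

# Minimum major versions stated in the architecture list above.
MIN_MAJOR_VERSIONS: Dict[str, int] = {"postgresql": 14, "redis": 5}


def major_version(version: str) -> int:
    """Return the leading major component of a version string such as '14.9'."""
    return int(version.strip().split(".")[0])


def unmet_requirements(planned: Dict[str, str]) -> List[str]:
    """List every service that is missing or below the documented minimum."""
    problems: List[str] = []
    for service, minimum in MIN_MAJOR_VERSIONS.items():
        version = planned.get(service)
        if version is None:
            problems.append(f"{service}: no version specified")
        elif major_version(version) < minimum:
            problems.append(
                f"{service}: {version} is below the minimum major version {minimum}"
            )
    return problems


if __name__ == "__main__":
    # Hypothetical planned versions; Redis 4 fails the check.
    for problem in unmet_requirements({"postgresql": "15.4", "redis": "4.0.14"}):
        print(problem)
```

An empty result means the planned versions satisfy both documented minimums.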


## Compute options

LangSmith supports multiple compute options depending on your requirements:

| Compute option | Description | Suitable for |
|-----------------|-------------|--------------|
| **Google Kubernetes Engine (preferred)** | Advanced scaling and multi-tenant support | Large enterprises |
| **Compute Engine-based** | Full control, bring-your-own infrastructure | Regulated or air-gapped environments |

## Google Cloud Well-Architected best practices

This reference architecture is designed to align with the six pillars of the Google Cloud Well-Architected Framework:

### Operational excellence

- Automate deployments with IaC ([Terraform](https://www.terraform.io/) / [Deployment Manager](https://cloud.google.com/deployment-manager)).
- Use [Secret Manager](https://cloud.google.com/secret-manager) for configuration and sensitive data.
- Configure your LangSmith instance to [export telemetry data](/langsmith/export-backend) and continuously monitor via [Cloud Logging](https://cloud.google.com/logging).
- The preferred method to manage [LangSmith deployments](/langsmith/deployments) is to create a CI process that builds [Agent Server](/langsmith/agent-server) images and pushes them to [Artifact Registry](https://cloud.google.com/artifact-registry). Create a test deployment for pull requests before deploying a new revision to staging or production upon PR merge.
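
The CI flow in the last bullet can be sketched as a small decision function: pull requests get a test deployment, and merges promote a new revision to staging or production. This is an illustrative sketch only. The event names, environment names, and the `us-central1`, `example-project`, `langsmith-agents`, and `agent-server` values are hypothetical placeholders; only the `LOCATION-docker.pkg.dev/PROJECT/REPOSITORY/IMAGE:TAG` reference format comes from Artifact Registry.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DeploymentPlan:
    image: str        # fully qualified Artifact Registry image reference
    environment: str  # where the new revision should be deployed


def image_reference(location: str, project: str, repository: str,
                    image: str, tag: str) -> str:
    """Build an Artifact Registry Docker image reference."""
    return f"{location}-docker.pkg.dev/{project}/{repository}/{image}:{tag}"


def plan_deployment(event: str, commit_sha: str,
                    pr_number: Optional[int] = None,
                    merge_environment: str = "staging") -> DeploymentPlan:
    """Map a CI event to the image to push and the environment to deploy to."""
    # Placeholder registry coordinates; substitute your own values.
    image = image_reference("us-central1", "example-project",
                            "langsmith-agents", "agent-server", commit_sha)
    if event == "pull_request":
        if pr_number is None:
            raise ValueError("pull_request events need a PR number")
        # Each pull request gets its own test deployment.
        return DeploymentPlan(image=image, environment=f"test-pr-{pr_number}")
    if event == "merge":
        if merge_environment not in ("staging", "production"):
            raise ValueError(f"unknown environment: {merge_environment}")
        return DeploymentPlan(image=image, environment=merge_environment)
    raise ValueError(f"unsupported CI event: {event}")


if __name__ == "__main__":
    print(plan_deployment("pull_request", "abc1234", pr_number=42))
    print(plan_deployment("merge", "abc1234", merge_environment="production"))
```

Tagging images with the commit SHA keeps the test deployment and the promoted revision tied to the same build.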

### Security

- Use [IAM](https://cloud.google.com/iam) roles with least-privilege policies and [Workload Identity](https://cloud.google.com/kubernetes-engine/docs/how-to/workload-identity) for secure pod-to-GCP-service authentication.
- Enable encryption at rest ([Cloud SQL](https://docs.cloud.google.com/sql/docs/postgres/cmek), [Cloud Storage](https://cloud.google.com/storage/docs/encryption), Persistent Disks) and in transit (TLS 1.2+).
- Integrate with [Secret Manager](https://cloud.google.com/secret-manager) for credentials.
- Use [Identity Platform](https://cloud.google.com/identity-platform) or [Workload Identity Federation](https://cloud.google.com/iam/docs/workload-identity-federation) as an identity provider (IdP) in conjunction with LangSmith's built-in authentication and authorization features to secure access to agents and their tools.

### Reliability

- Replicate the LangSmith [data plane](/langsmith/data-plane) across regions: deploy identical data planes to Kubernetes clusters in different regions for LangSmith Deployment. Deploy [Cloud SQL](https://cloud.google.com/sql/docs/postgres/high-availability) and [GKE](https://docs.cloud.google.com/kubernetes-engine/docs/concepts/configuration-overview) services across multiple zones.
- Implement [autoscaling](https://cloud.google.com/kubernetes-engine/docs/concepts/cluster-autoscaler) for backend workers using [Horizontal Pod Autoscaler](https://cloud.google.com/kubernetes-engine/docs/concepts/horizontalpodautoscaler) and [Cluster Autoscaler](https://cloud.google.com/kubernetes-engine/docs/concepts/cluster-autoscaler).
- Use [Cloud DNS](https://cloud.google.com/dns) health checks and failover policies.
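
As an illustration of the Horizontal Pod Autoscaler recommendation above, the sketch below builds an `autoscaling/v2` manifest as a Python dictionary and prints it as JSON, which `kubectl apply -f` accepts alongside YAML. The deployment name `backend-worker`, the `langsmith` namespace, and the replica and CPU thresholds are hypothetical values, not defaults taken from LangSmith.

```python
import json
from typing import Any, Dict


def hpa_manifest(deployment: str, min_replicas: int, max_replicas: int,
                 cpu_utilization: int, namespace: str = "default") -> Dict[str, Any]:
    """Build a CPU-based HorizontalPodAutoscaler manifest for a Deployment."""
    if not 1 <= min_replicas <= max_replicas:
        raise ValueError("require 1 <= min_replicas <= max_replicas")
    if not 1 <= cpu_utilization <= 100:
        raise ValueError("cpu_utilization is a percentage between 1 and 100")
    return {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": f"{deployment}-hpa", "namespace": namespace},
        "spec": {
            # The workload whose replica count the autoscaler adjusts.
            "scaleTargetRef": {
                "apiVersion": "apps/v1",
                "kind": "Deployment",
                "name": deployment,
            },
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            # Scale to keep average CPU utilization near the target.
            "metrics": [{
                "type": "Resource",
                "resource": {
                    "name": "cpu",
                    "target": {
                        "type": "Utilization",
                        "averageUtilization": cpu_utilization,
                    },
                },
            }],
        },
    }


if __name__ == "__main__":
    # Hypothetical worker deployment and thresholds.
    print(json.dumps(hpa_manifest("backend-worker", 2, 10, 70, "langsmith"), indent=2))
```

The Horizontal Pod Autoscaler adds or removes pods, while Cluster Autoscaler adds or removes nodes when those pods no longer fit, so the two are typically used together.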

### Performance optimization

- Choose [Compute Engine](https://cloud.google.com/compute) [machine types](https://cloud.google.com/compute/docs/machine-types) that match the CPU and memory profile of your workloads.
- Use [Cloud Storage lifecycle policies](https://cloud.google.com/storage/docs/lifecycle) for infrequently accessed trace data, moving to [Nearline](https://cloud.google.com/storage/docs/storage-classes#nearline) or [Coldline](https://cloud.google.com/storage/docs/storage-classes#coldline) storage classes.
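
The lifecycle recommendation above can be expressed as a Cloud Storage lifecycle configuration. The sketch below generates one that moves objects to Nearline and then Coldline as they age. The 30-day and 90-day thresholds are assumptions chosen for illustration, not LangSmith recommendations, so pick values that match how often you read old trace data.

```python
import json
from typing import Any, Dict


def lifecycle_config(nearline_after_days: int, coldline_after_days: int) -> Dict[str, Any]:
    """Build a lifecycle configuration that steps objects down to colder classes."""
    if not 0 <= nearline_after_days < coldline_after_days:
        raise ValueError("require 0 <= nearline_after_days < coldline_after_days")
    return {
        "rule": [
            {
                # Move Standard objects to Nearline once they reach the first age.
                "action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
                "condition": {
                    "age": nearline_after_days,
                    "matchesStorageClass": ["STANDARD"],
                },
            },
            {
                # Move Nearline objects to Coldline once they reach the second age.
                "action": {"type": "SetStorageClass", "storageClass": "COLDLINE"},
                "condition": {
                    "age": coldline_after_days,
                    "matchesStorageClass": ["NEARLINE"],
                },
            },
        ]
    }


if __name__ == "__main__":
    # Example thresholds only.
    print(json.dumps(lifecycle_config(30, 90), indent=2))
```

Saved to a file such as `lifecycle.json`, the output can be applied with `gcloud storage buckets update gs://BUCKET_NAME --lifecycle-file=lifecycle.json`, where `BUCKET_NAME` is a placeholder for your bucket.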

### Cost optimization

- Right-size [GKE](https://cloud.google.com/kubernetes-engine) clusters, and lower the cost of steady workloads with [Committed Use Discounts](https://cloud.google.com/compute/docs/instances/signing-up-committed-use-discounts) and [Sustained Use Discounts](https://cloud.google.com/compute/docs/sustained-use-discounts).
- Monitor cost KPIs using [Cloud Billing](https://cloud.google.com/billing/docs) dashboards and [Cost Management](https://cloud.google.com/cost-management) tools.

### Sustainability

- Minimize idle workloads with on-demand compute and [autoscaling](https://cloud.google.com/kubernetes-engine/docs/concepts/cluster-autoscaler).
- Store telemetry in low-latency, low-cost tiers using [Cloud Storage lifecycle policies](https://cloud.google.com/storage/docs/lifecycle).
- Enable auto-shutdown for non-prod environments using [scheduled actions](https://cloud.google.com/compute/docs/instances/schedule-instance-start-stop).

## Security and compliance

LangSmith can be configured for:

- [Private Service Connect](https://cloud.google.com/vpc/docs/private-service-connect)-only access (no public internet exposure, besides egress necessary for billing).
- [Cloud KMS](https://cloud.google.com/kms)-based encryption keys for Cloud Storage, Cloud SQL, and Persistent Disks.
- Audit logging to [Cloud Logging](https://cloud.google.com/logging) and [Cloud Audit Logs](https://cloud.google.com/logging/docs/audit).

Customers can deploy in [Assured Workloads](https://cloud.google.com/assured-workloads) regions for compliance with ISO, HIPAA, or other regulatory requirements as needed.

## Monitoring and evals

Use LangSmith to:

- Capture traces from LLM apps running on [Vertex AI](https://cloud.google.com/vertex-ai).
- Evaluate model outputs via [LangSmith datasets](/langsmith/manage-datasets).
- Track latency, token usage, and success rates.

Integrate with:

- [Cloud Monitoring](https://cloud.google.com/monitoring) dashboards.
- [OpenTelemetry](https://opentelemetry.io/) and [Prometheus](https://prometheus.io/) exporters.