---
title: Export trace data to BigQuery
sidebarTitle: BigQuery integration
description: Load LangSmith trace data into BigQuery using bulk export to GCS.
---

<Info>
**Plan restrictions apply**

Bulk export is only available on [LangSmith Plus or Enterprise tiers](https://www.langchain.com/pricing-langsmith).
</Info>

LangSmith can export trace data to a Google Cloud Storage (GCS) bucket in Parquet format. From there, you can load it into BigQuery as an external table (queried in place from GCS) or as a native table (copied into BigQuery storage).

This guide covers:

- Setting up a GCS bucket and HMAC credentials for LangSmith.
- Creating a bulk export destination and export job.
- Loading the exported data into BigQuery.

For full details on bulk export configuration options, refer to [Bulk export trace data](/langsmith/data-export) and [Manage bulk export destinations](/langsmith/data-export-destinations).

## Prerequisites

- Data in your LangSmith [Tracing project](https://smith.langchain.com/projects).
- [`gcloud` CLI installed](https://docs.cloud.google.com/sdk/docs/install-sdk). (You can also use the Google Cloud console for setup.)

## 1. Create a GCS bucket

Create a dedicated GCS bucket for LangSmith exports. Using a dedicated bucket makes it easier to grant scoped permissions without affecting other data:

```bash
gcloud storage buckets create gs://YOUR_BUCKET_NAME \
  --location=US \
  --uniform-bucket-level-access
```

Choose a region close to your BigQuery dataset to minimize latency and avoid cross-region egress charges.

## 2. Create a service account and grant access

Create a GCP service account that LangSmith will use to write data to GCS:

```bash
gcloud iam service-accounts create langsmith-bulk-export \
  --display-name="LangSmith Bulk Export"
```

Grant the service account write access to your bucket. The minimum required permission is `storage.objects.create`. Granting `storage.objects.delete` is optional but recommended: LangSmith uses it to clean up a temporary test file created during destination validation. If this permission is absent, a `tmp/` folder may remain in your bucket.

The "Storage Object Admin" predefined role covers all required and recommended permissions:

```bash
gcloud storage buckets add-iam-policy-binding gs://YOUR_BUCKET_NAME \
  --member="serviceAccount:langsmith-bulk-export@YOUR_PROJECT.iam.gserviceaccount.com" \
  --role="roles/storage.objectAdmin"
```

To use a minimal custom role instead, grant only:

- `storage.objects.create` (required)
- `storage.objects.delete` (optional, for test file cleanup)
- `storage.objects.get` (optional but recommended, for file size verification)
- `storage.multipartUploads.create` (optional but recommended, for large file uploads)
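
As a sketch, the custom-role route can be scripted with `gcloud`. The role ID `langsmithExportWriter` is an arbitrary example name, and `YOUR_PROJECT` is assumed to be the project that owns the bucket:

```bash
# Create a custom role carrying only the permissions listed above.
gcloud iam roles create langsmithExportWriter \
  --project=YOUR_PROJECT \
  --title="LangSmith Export Writer" \
  --permissions=storage.objects.create,storage.objects.delete,storage.objects.get,storage.multipartUploads.create

# Bind the custom role to the service account on the export bucket only.
gcloud storage buckets add-iam-policy-binding gs://YOUR_BUCKET_NAME \
  --member="serviceAccount:langsmith-bulk-export@YOUR_PROJECT.iam.gserviceaccount.com" \
  --role="projects/YOUR_PROJECT/roles/langsmithExportWriter"
```

Binding on the bucket rather than the project keeps the grant scoped to the export data.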

## 3. Generate HMAC keys

LangSmith connects to GCS using the S3-compatible XML API, which requires HMAC keys rather than a service account JSON key.

Generate HMAC keys for your service account:

```bash
gcloud storage hmac create \
  langsmith-bulk-export@YOUR_PROJECT.iam.gserviceaccount.com
```

Save the `accessId` and `secret` from the output. You can also generate HMAC keys in the GCP Console under **Cloud Storage → Settings → Interoperability → Create a key for a service account**.

## 4. Create a bulk export destination

Create a destination in LangSmith pointing to your GCS bucket. Set `endpoint_url` to `https://storage.googleapis.com` to use the GCS S3-compatible API.

You will need your [LangSmith API key](/langsmith/create-account-api-key) and [workspace ID](/langsmith/set-up-hierarchy#set-up-a-workspace).

```bash
curl --request POST \
  --url 'https://api.smith.langchain.com/api/v1/bulk-exports/destinations' \
  --header 'Content-Type: application/json' \
  --header 'X-API-Key: YOUR_API_KEY' \
  --header 'X-Tenant-Id: YOUR_WORKSPACE_ID' \
  --data '{
    "destination_type": "s3",
    "display_name": "GCS for BigQuery",
    "config": {
      "bucket_name": "YOUR_BUCKET_NAME",
      "prefix": "YOUR_PREFIX",
      "endpoint_url": "https://storage.googleapis.com"
    },
    "credentials": {
      "access_key_id": "YOUR_HMAC_ACCESS_ID",
      "secret_access_key": "YOUR_HMAC_SECRET"
    }
  }'
```

`prefix` is a path within the bucket where LangSmith will write exported files. For example, `langsmith-exports` or `data/traces`. Choose any value that works for your bucket layout.

LangSmith validates the credentials by performing a test write before saving the destination. If the request returns a `400` error, refer to [Debug destination errors](/langsmith/data-export-destinations#debug-destination-errors).

Save the `id` from the response; you will need it in the next step.
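
If you are scripting the setup, you can capture the returned `id` directly. A sketch assuming `jq` is installed and the JSON payload from above is saved in a local `destination.json` file (a hypothetical filename):

```bash
# POST the destination config and pull the new destination id out of the JSON response.
DESTINATION_ID=$(curl --silent --request POST \
  --url 'https://api.smith.langchain.com/api/v1/bulk-exports/destinations' \
  --header 'Content-Type: application/json' \
  --header 'X-API-Key: YOUR_API_KEY' \
  --header 'X-Tenant-Id: YOUR_WORKSPACE_ID' \
  --data @destination.json | jq -r '.id')
echo "destination id: $DESTINATION_ID"
```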

### Temporary validation file

During destination creation (and [credential rotation](#credential-rotation)), LangSmith writes a temporary `.txt` file to `YOUR_PREFIX/tmp/` to verify write access, then attempts to delete it. The deletion is best-effort: if the service account lacks `storage.objects.delete`, the file is not deleted and the `tmp/` folder remains in your bucket.

The `tmp/` folder does not affect exports, but it will be included in broad GCS URI globs (e.g., `gs://YOUR_BUCKET_NAME/YOUR_PREFIX/*`).
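
If leftover validation files bother you, they can be removed manually. A sketch, assuming the caller has delete permission on the bucket:

```bash
# Recursively delete the leftover validation folder under the export prefix.
gcloud storage rm --recursive gs://YOUR_BUCKET_NAME/YOUR_PREFIX/tmp
```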

## 5. Create a bulk export job

Create an export targeting a specific project. Use `format_version: v2_beta` for BigQuery compatibility: it produces UTC timezone-aware timestamps that BigQuery handles correctly.

You will need the project ID (`session_id`), which you can copy from the project view in the [**Tracing Projects** list](https://smith.langchain.com).

**One-time export:**

```bash
curl --request POST \
  --url 'https://api.smith.langchain.com/api/v1/bulk-exports' \
  --header 'Content-Type: application/json' \
  --header 'X-API-Key: YOUR_API_KEY' \
  --header 'X-Tenant-Id: YOUR_WORKSPACE_ID' \
  --data '{
    "bulk_export_destination_id": "YOUR_DESTINATION_ID",
    "session_id": "YOUR_PROJECT_ID",
    "start_time": "2024-01-01T00:00:00Z",
    "end_time": "2024-02-01T00:00:00Z",
    "format_version": "v2_beta",
    "compression": "snappy"
  }'
```

**Scheduled (recurring) export:**

```bash
curl --request POST \
  --url 'https://api.smith.langchain.com/api/v1/bulk-exports' \
  --header 'Content-Type: application/json' \
  --header 'X-API-Key: YOUR_API_KEY' \
  --header 'X-Tenant-Id: YOUR_WORKSPACE_ID' \
  --data '{
    "bulk_export_destination_id": "YOUR_DESTINATION_ID",
    "session_id": "YOUR_PROJECT_ID",
    "start_time": "2024-01-01T00:00:00Z",
    "interval_hours": 24,
    "format_version": "v2_beta",
    "compression": "snappy"
  }'
```

Snappy compression is fast and widely supported by BigQuery. For all available options, including field filtering and filter expressions, refer to [Bulk export trace data](/langsmith/data-export#2-create-an-export-job).
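
Export jobs run asynchronously, so it can be useful to poll their status. A sketch using the `GET /api/v1/bulk-exports/{export_id}` endpoint; the export `id` comes from the job-creation response, and the headers are assumed to match the POST calls above:

```bash
# Fetch the current status of a bulk export job.
curl --request GET \
  --url 'https://api.smith.langchain.com/api/v1/bulk-exports/YOUR_EXPORT_ID' \
  --header 'X-API-Key: YOUR_API_KEY' \
  --header 'X-Tenant-Id: YOUR_WORKSPACE_ID'
```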

### Output file structure

Exported files land in GCS using a Hive-partitioned path structure:

```
gs://YOUR_BUCKET_NAME/YOUR_PREFIX/export_id=<uuid>/tenant_id=<uuid>/session_id=<uuid>/resource=runs/year=<year>/month=<month>/day=<day>/<filename>.parquet
```

The partition columns in the path (`export_id`, `tenant_id`, `session_id`, `resource`, `year`, `month`, `day`) are available as queryable columns in BigQuery when Hive partition detection is enabled.

## 6. Load data into BigQuery

BigQuery offers two ways to access your exported data. Both require granting the BigQuery service account read access to your GCS bucket first. Choose based on your needs:

- **External table:** data stays in GCS and BigQuery queries it in place. No storage costs in BigQuery, but query performance is slower than native storage. Refer to [Required roles](https://docs.cloud.google.com/bigquery/docs/query-cloud-storage-data#required-roles).
- **Native table:** data is copied into BigQuery storage. Faster queries and full support for BigQuery features, but incurs BigQuery storage costs. Refer to [Required permissions](https://docs.cloud.google.com/bigquery/docs/cloud-storage-transfer#required_permissions).

### Create the table

<Tabs>
  <Tab title="External table">
    An external table queries data directly from GCS without copying it into BigQuery.

    1. In the BigQuery console, expand your project and dataset in the **Explorer** pane.
    1. Click the dataset's **Actions** menu (three dots) and select **Create table**.
    1. Under **Source**:
       - Set **Create table from** to **Google Cloud Storage**.
       - Set the file path to `gs://YOUR_BUCKET_NAME/YOUR_PREFIX/export_id=*`. Using `export_id=*` scopes BigQuery to Hive-partitioned export directories and excludes the `tmp/` folder that LangSmith writes during destination validation (see [Temporary validation file](#temporary-validation-file)).
       - Set **File format** to **Parquet**.
    1. Check **Source data partitioning**, then:
       - Set **Source URI prefix** to `gs://YOUR_BUCKET_NAME/YOUR_PREFIX`.
       - Set **Partition inference mode** to **Automatically infer types**.
    1. Under **Destination**:
       - Select your project and dataset.
       - Enter a table name, for example `langsmith_runs`.
       - Set **Table type** to **External table**.
    1. Under **Schema**, enable **Auto-detect**.
    1. Click **Create table**.

    The partition path columns (`export_id`, `tenant_id`, `session_id`, `resource`, `year`, `month`, `day`) are available as queryable columns. Filter on `year`, `month`, or `day` in your queries to enable partition pruning.
  </Tab>
  <Tab title="Native table">
    A native table transfers the Parquet data into BigQuery storage for full query performance.

    To load on a schedule (or on demand), use the BigQuery Data Transfer Service:

    1. Go to the [Data Transfer page](https://console.cloud.google.com/bigquery/transfers) in the Google Cloud console and select **+ Create transfer**.
    1. For **Source type**, select **Google Cloud Storage**.
    1. Enter a **Transfer name**. You can edit the transfer later if necessary.
    1. Select a **Schedule option**. If you do not want a recurring transfer, select **On demand** and trigger the transfer manually.
    1. Complete the remaining source and destination settings with your GCS path (`gs://YOUR_BUCKET_NAME/YOUR_PREFIX/export_id=*`), the Parquet file format, and your target dataset and table, then save the transfer.

    Alternatively, load the data once with the **Create table** dialog:

    1. In the BigQuery console, expand your project and dataset in the **Explorer** pane.
    1. Click the dataset's **Actions** menu (three dots) and select **Create table**.
    1. Under **Source**:
       - Set **Create table from** to **Google Cloud Storage**.
       - Set the file path to `gs://YOUR_BUCKET_NAME/YOUR_PREFIX/export_id=*`. Using `export_id=*` excludes the `tmp/` folder that LangSmith writes during destination validation (see [Temporary validation file](#temporary-validation-file)).
       - Set **File format** to **Parquet**.
    1. Check **Source data partitioning**, then:
       - Set **Source URI prefix** to `gs://YOUR_BUCKET_NAME/YOUR_PREFIX`.
       - Set **Partition inference mode** to **Automatically infer types**.
    1. Under **Destination**:
       - Select your project and dataset.
       - Enter a table name, for example `langsmith_runs`.
       - Set **Table type** to **Native table**.
    1. Under **Advanced options**, set **Write preference** to **Write if empty** for a new table.
    1. Click **Create table**.

    BigQuery runs a load job to copy the data. The Hive partition columns appear as regular columns in the table. For the full list of available data columns, see [Exportable fields](/langsmith/data-export#exportable-fields).
  </Tab>
</Tabs>
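
Once the table exists, a quick sanity check is to count rows per partition, which also exercises partition pruning. A sketch with the `bq` CLI; the `YOUR_PROJECT.YOUR_DATASET` qualifier and `langsmith_runs` table name are the examples used above, and the year/month filter should match your export window:

```bash
# Count exported runs per day, filtering on the Hive partition columns.
bq query --use_legacy_sql=false \
'SELECT year, month, day, COUNT(*) AS run_count
FROM `YOUR_PROJECT.YOUR_DATASET.langsmith_runs`
WHERE year = 2024 AND month = 1
GROUP BY year, month, day
ORDER BY year, month, day'
```

If the query returns zero rows, confirm the export job has completed and that the file path glob matches the exported partitions.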

## Credential rotation

To rotate your HMAC keys without interrupting active exports:

1. **Generate new HMAC keys** in GCP for the same service account.
2. **Call the PATCH endpoint** with the new credentials:

   ```bash
   curl --request PATCH \
     --url 'https://api.smith.langchain.com/api/v1/bulk-exports/destinations/YOUR_DESTINATION_ID' \
     --header 'Content-Type: application/json' \
     --header 'X-API-Key: YOUR_API_KEY' \
     --header 'X-Tenant-Id: YOUR_WORKSPACE_ID' \
     --data '{
       "credentials": {
         "access_key_id": "NEW_HMAC_ACCESS_ID",
         "secret_access_key": "NEW_HMAC_SECRET"
       }
     }'
   ```

   LangSmith validates the new credentials with a test write before saving. A new `tmp/` file may appear in your bucket during this validation (see [Temporary validation file](#temporary-validation-file)).

3. **Keep old HMAC keys active** until all in-flight export runs complete. Both credential sets are valid simultaneously during the transition window.
4. **Delete the old HMAC keys** in GCP once you have confirmed no in-flight runs are using them.
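
The final step can be done from the CLI as well. A sketch, assuming `OLD_HMAC_ACCESS_ID` is the access ID being retired; deactivating first is reversible, deleting is not:

```bash
# Deactivate the old key first; it can be reactivated if an in-flight run still needs it.
gcloud storage hmac update OLD_HMAC_ACCESS_ID --deactivate

# After confirming exports still succeed with the new key, delete the old one.
gcloud storage hmac delete OLD_HMAC_ACCESS_ID
```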

For full details, see [Rotate destination credentials](/langsmith/data-export-destinations#rotate-destination-credentials).

## Troubleshooting

| Symptom | Likely cause | Fix |
|---------|--------------|-----|
| `400 Access denied` on destination creation | HMAC credentials lack write permission | Verify the service account has `storage.objects.create` on the bucket |
| `400 Key ID you provided does not exist` | HMAC access ID is invalid | Regenerate HMAC keys in GCP |
| `400 Invalid endpoint` | Endpoint URL is malformed | Use exactly `https://storage.googleapis.com` |
| BigQuery table shows no rows | Export not yet complete | Check export status with `GET /api/v1/bulk-exports/{export_id}` |
| BigQuery partition pruning not working | Incorrect source URI prefix | Ensure the source URI prefix ends before the first partition key, e.g. `gs://BUCKET/PREFIX` |
| BigQuery picks up `tmp/` files | Broad file path glob | Use `export_id=*` in your file path instead of `*` |

For additional error codes and export status details, see [Monitor and troubleshoot bulk exports](/langsmith/data-export-monitor).