
Commit 63adc78

feat(source/bigquery): add optional write mode config (googleapis#1157)
Summary

Adds an optional `writeMode` configuration to the BigQuery source. It enhances security by controlling which types of SQL statements can be executed, preventing unauthorized data modification.

Key Changes

Added `writeMode` configuration: a new `writeMode` field on the BigQuery source supports three modes:
- `allowed` (default): Permits all SQL statements.
- `blocked`: Allows only `SELECT` queries.
- `protected`: Enables session-based execution, restricting write operations (such as `CREATE TABLE`) to the session's temporary dataset and thereby protecting permanent datasets.

Note: `protected` mode does not currently work with `useClientOAuth`; this will be fixed in the future. These restrictions apply primarily to the `bigquery-execute-sql` tool, and the session may also be used by other tools.
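The three modes amount to a per-statement gate. The sketch below is a hypothetical illustration, not the enforcement code from this commit (the `bigquery-execute-sql` changes are not shown on this page). It assumes the caller has already dry-run the query and obtained BigQuery's reported statement type (such as `SELECT`, `INSERT`, or `CREATE_TABLE`) plus the dataset the statement writes to; `checkStatement` and the `_session_abc` dataset name are placeholders.

```go
package main

import (
	"fmt"
	"strings"
)

// Mode names, matching the constants added in this commit.
const (
	WriteModeAllowed   = "allowed"
	WriteModeBlocked   = "blocked"
	WriteModeProtected = "protected"
)

// checkStatement is a hypothetical gate. Assumptions: statementType is the
// statement type reported by a dry run; targetDataset is the dataset the
// statement writes to ("" if it writes nothing); sessionDataset is the
// session's temporary dataset ("" if there is no session).
func checkStatement(mode, statementType, targetDataset, sessionDataset string) error {
	isSelect := strings.EqualFold(statementType, "SELECT")
	switch mode {
	case WriteModeAllowed:
		return nil // every statement is permitted
	case WriteModeBlocked:
		if isSelect {
			return nil
		}
		return fmt.Errorf("writeMode %q allows only SELECT, got %s", mode, statementType)
	case WriteModeProtected:
		// Reads are unrestricted; writes must target the session dataset.
		if isSelect || (sessionDataset != "" && targetDataset == sessionDataset) {
			return nil
		}
		return fmt.Errorf("writeMode %q: %s may only write to the session dataset", mode, statementType)
	default:
		return fmt.Errorf("unknown writeMode %q", mode)
	}
}

func main() {
	fmt.Println(checkStatement(WriteModeBlocked, "SELECT", "", ""))                                 // <nil>
	fmt.Println(checkStatement(WriteModeBlocked, "INSERT", "sales", ""))                            // prints an error
	fmt.Println(checkStatement(WriteModeProtected, "CREATE_TABLE", "_session_abc", "_session_abc")) // <nil>
	fmt.Println(checkStatement(WriteModeProtected, "CREATE_TABLE", "sales", "_session_abc"))        // prints an error
}
```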
1 parent 2c4d73b commit 63adc78

14 files changed

Lines changed: 787 additions & 104 deletions


docs/en/resources/sources/bigquery.md

Lines changed: 4 additions & 1 deletion
@@ -119,6 +119,7 @@ sources:
 kind: "bigquery"
 project: "my-project-id"
 # location: "US" # Optional: Specifies the location for query jobs.
+# writeMode: "allowed" # One of: allowed, blocked, protected. Defaults to "allowed".
 # allowedDatasets: # Optional: Restricts tool access to a specific list of datasets.
 # - "my_dataset_1"
 # - "other_project.my_dataset_2"
@@ -133,6 +134,7 @@ sources:
 project: "my-project-id"
 useClientOAuth: true
 # location: "US" # Optional: Specifies the location for query jobs.
+# writeMode: "allowed" # One of: allowed, blocked, protected. Defaults to "allowed".
 # allowedDatasets: # Optional: Restricts tool access to a specific list of datasets.
 # - "my_dataset_1"
 # - "other_project.my_dataset_2"
@@ -145,5 +147,6 @@ sources:
 | kind | string | true | Must be "bigquery". |
 | project | string | true | Id of the Google Cloud project to use for billing and as the default project for BigQuery resources. |
 | location | string | false | Specifies the location (e.g., 'us', 'asia-northeast1') in which to run the query job. This location must match the location of any tables referenced in the query. Defaults to the table's location or 'US' if the location cannot be determined. [Learn More](https://cloud.google.com/bigquery/docs/locations) |
+| writeMode | string | false | Controls the write behavior for tools. `allowed` (default): All queries are permitted. `blocked`: Only `SELECT` statements are allowed for the `bigquery-execute-sql` tool. `protected`: Enables session-based execution where all tools associated with this source instance share the same [BigQuery session](https://cloud.google.com/bigquery/docs/sessions-intro). This allows for stateful operations using temporary tables (e.g., `CREATE TEMP TABLE`). For `bigquery-execute-sql`, `SELECT` statements can be used on all tables, but write operations are restricted to the session's temporary dataset. For tools like `bigquery-sql`, `bigquery-forecast`, and `bigquery-analyze-contribution`, the `writeMode` restrictions do not apply, but they will operate within the shared session. **Note:** The `protected` mode cannot be used with `useClientOAuth: true`. It is also not recommended for multi-user server environments, as all users would share the same session. A session is terminated automatically after 24 hours of inactivity or after 7 days, whichever comes first. A new session is created on the next request, and any temporary data from the previous session will be lost. |
 | allowedDatasets | []string | false | An optional list of dataset IDs that tools using this source are allowed to access. If provided, any tool operation attempting to access a dataset not in this list will be rejected. To enforce this, two types of operations are also disallowed: 1) Dataset-level operations (e.g., `CREATE SCHEMA`), and 2) operations where table access cannot be statically analyzed (e.g., `EXECUTE IMMEDIATE`, `CREATE PROCEDURE`). If a single dataset is provided, it will be treated as the default for prebuilt tools. |
-| useClientOAuth | bool | false | If true, forwards the client's OAuth access token from the "Authorization" header to downstream queries. |
+| useClientOAuth | bool | false | If true, forwards the client's OAuth access token from the "Authorization" header to downstream queries. **Note:** This cannot be used with `writeMode: protected`. |

docs/en/resources/tools/bigquery/bigquery-analyze-contribution.md

Lines changed: 7 additions & 0 deletions
@@ -39,6 +39,13 @@ It's compatible with the following sources:
 insights. Can be `'NO_PRUNING'` or `'PRUNE_REDUNDANT_INSIGHTS'`. Defaults to
 `'PRUNE_REDUNDANT_INSIGHTS'`.

+The behavior of this tool is influenced by the `writeMode` setting on its `bigquery` source:
+
+- **`allowed` (default) and `blocked`:** These modes do not impose any special restrictions on the `bigquery-analyze-contribution` tool.
+- **`protected`:** This mode enables session-based execution. The tool will operate within the same BigQuery session as other
+tools using the same source. This allows the `input_data` parameter to be a query that references temporary resources (e.g.,
+`TEMP` tables) created within that session.
+

 ## Example
docs/en/resources/tools/bigquery/bigquery-execute-sql.md

Lines changed: 10 additions & 1 deletion
@@ -20,8 +20,15 @@ It's compatible with the following sources:
 - **`dry_run`** (optional): If set to `true`, the query is validated but not run,
 returning information about the execution instead. Defaults to `false`.

+The behavior of this tool is influenced by the `writeMode` setting on its `bigquery` source:
+
+- **`allowed` (default):** All SQL statements are permitted.
+- **`blocked`:** Only `SELECT` statements are allowed. Any other type of statement (e.g., `INSERT`, `UPDATE`, `CREATE`) will be rejected.
+- **`protected`:** This mode enables session-based execution. `SELECT` statements can be used on all tables, while write operations are allowed only for the session's temporary dataset (e.g., `CREATE TEMP TABLE ...`). This prevents modifications to permanent datasets while allowing stateful, multi-step operations within a secure session.
+
 The tool's behavior is influenced by the `allowedDatasets` restriction on the
-`bigquery` source:
+`bigquery` source. Similar to `writeMode`, this setting provides an additional layer of security by controlling which datasets can be accessed:
+
 - **Without `allowedDatasets` restriction:** The tool can execute any valid GoogleSQL
 query.
 - **With `allowedDatasets` restriction:** Before execution, the tool performs a dry run
@@ -33,6 +40,8 @@ The tool's behavior is influenced by the `allowedDatasets` restriction on the
 - **Unanalyzable operations** where the accessed tables cannot be determined
 statically (e.g., `EXECUTE IMMEDIATE`, `CREATE PROCEDURE`, `CALL`).

+> **Note:** This tool is intended for developer assistant workflows with human-in-the-loop and shouldn't be used for production agents.
+
 ## Example

 ```yaml

docs/en/resources/tools/bigquery/bigquery-forecast.md

Lines changed: 12 additions & 5 deletions
@@ -33,12 +33,19 @@ query based on the provided parameters:
 - **horizon** (integer, optional): The number of future time steps you want to
 predict. It defaults to 10 if not specified.

-The tool's behavior regarding these parameters is influenced by the `allowedDatasets` restriction on the `bigquery` source:
+The behavior of this tool is influenced by the `writeMode` setting on its `bigquery` source:
+
+- **`allowed` (default) and `blocked`:** These modes do not impose any special restrictions on the `bigquery-forecast` tool.
+- **`protected`:** This mode enables session-based execution. The tool will operate within the same BigQuery session as other
+tools using the same source. This allows the `history_data` parameter to be a query that references temporary resources (e.g.,
+`TEMP` tables) created within that session.
+
+The tool's behavior is also influenced by the `allowedDatasets` restriction on the `bigquery` source:
+
 - **Without `allowedDatasets` restriction:** The tool can use any table or query for the `history_data` parameter.
-- **With `allowedDatasets` restriction:** The tool verifies that the `history_data` parameter only accesses tables
-within the allowed datasets. If `history_data` is a table ID, the tool checks if the table's dataset is in the
-allowed list. If `history_data` is a query, the tool performs a dry run to analyze the query and rejects it
-if it accesses any table outside the allowed list.
+- **With `allowedDatasets` restriction:** The tool verifies that the `history_data` parameter only accesses tables within the allowed datasets.
+- If `history_data` is a table ID, the tool checks if the table's dataset is in the allowed list.
+- If `history_data` is a query, the tool performs a dry run to analyze the query and rejects it if it accesses any table outside the allowed list.

 ## Example
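For the table-ID case of `history_data`, the check reduces to mapping the ID onto a dataset key and looking it up in the source's `AllowedDatasets` set (a `map[string]struct{}` on `Source`). This sketch assumes keys are normalized to `project.dataset`, with bare dataset names such as `my_dataset_1` qualified by the source's project; the real normalization is not part of this diff. `normalizeAllowed`, `datasetKey`, and `tableAllowed` are hypothetical helpers, and backtick-quoted or domain-scoped project IDs (which contain extra dots or colons) are not handled.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeAllowed builds the lookup set, qualifying bare dataset names with
// defaultProject (an assumed normalization, not taken from this commit).
func normalizeAllowed(datasets []string, defaultProject string) map[string]struct{} {
	m := make(map[string]struct{}, len(datasets))
	for _, d := range datasets {
		if !strings.Contains(d, ".") {
			d = defaultProject + "." + d
		}
		m[d] = struct{}{}
	}
	return m
}

// datasetKey turns "project.dataset.table" or "dataset.table" into a
// "project.dataset" key.
func datasetKey(tableID, defaultProject string) (string, error) {
	parts := strings.Split(tableID, ".")
	switch len(parts) {
	case 2:
		return defaultProject + "." + parts[0], nil
	case 3:
		return parts[0] + "." + parts[1], nil
	default:
		return "", fmt.Errorf("unrecognized table ID %q", tableID)
	}
}

// tableAllowed reports whether tableID lives in one of the allowed datasets.
func tableAllowed(tableID, defaultProject string, allowed map[string]struct{}) (bool, error) {
	key, err := datasetKey(tableID, defaultProject)
	if err != nil {
		return false, err
	}
	_, ok := allowed[key]
	return ok, nil
}

func main() {
	allowed := normalizeAllowed([]string{"my_dataset_1", "other_project.my_dataset_2"}, "my-project")
	for _, id := range []string{"my_dataset_1.sales", "other_project.my_dataset_2.events", "my-project.hr.salaries"} {
		ok, err := tableAllowed(id, "my-project", allowed)
		fmt.Println(id, ok, err) // true, true, then false (all with <nil> errors)
	}
}
```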

docs/en/resources/tools/bigquery/bigquery-sql.md

Lines changed: 5 additions & 0 deletions
@@ -15,6 +15,11 @@ the following sources:

 - [bigquery](../../sources/bigquery.md)

+The behavior of this tool is influenced by the `writeMode` setting on its `bigquery` source:
+
+- **`allowed` (default) and `blocked`:** These modes do not impose any restrictions on the `bigquery-sql` tool. The pre-defined SQL statement will be executed as-is.
+- **`protected`:** This mode enables session-based execution. The tool will operate within the same BigQuery session as other tools using the same source, allowing it to interact with temporary resources like `TEMP` tables created within that session.
+
 ### GoogleSQL

 BigQuery uses [GoogleSQL][bigquery-googlesql] for querying data. The integration

internal/sources/bigquery/bigquery.go

Lines changed: 137 additions & 1 deletion
@@ -20,6 +20,7 @@ import (
 	"net/http"
 	"strings"
 	"sync"
+	"time"

 	bigqueryapi "cloud.google.com/go/bigquery"
 	dataplexapi "cloud.google.com/go/dataplex/apiv1"
@@ -36,11 +37,22 @@ import (

 const SourceKind string = "bigquery"

+const (
+	// No write operations are allowed.
+	WriteModeBlocked string = "blocked"
+	// Only protected write operations are allowed in a BigQuery session.
+	WriteModeProtected string = "protected"
+	// All write operations are allowed.
+	WriteModeAllowed string = "allowed"
+)
+
 // validate interface
 var _ sources.SourceConfig = Config{}

 type BigqueryClientCreator func(tokenString string, wantRestService bool) (*bigqueryapi.Client, *bigqueryrestapi.Service, error)

+type BigQuerySessionProvider func(ctx context.Context) (*Session, error)
+
 type DataplexClientCreator func(tokenString string) (*dataplexapi.CatalogClient, error)

 func init() {
@@ -63,6 +75,7 @@ type Config struct {
 	Kind string `yaml:"kind" validate:"required"`
 	Project string `yaml:"project" validate:"required"`
 	Location string `yaml:"location"`
+	WriteMode string `yaml:"writeMode"`
 	AllowedDatasets []string `yaml:"allowedDatasets"`
 	UseClientOAuth bool `yaml:"useClientOAuth"`
 }
@@ -73,6 +86,14 @@ func (r Config) SourceConfigKind() string {
 }

 func (r Config) Initialize(ctx context.Context, tracer trace.Tracer) (sources.Source, error) {
+	if r.WriteMode == "" {
+		r.WriteMode = WriteModeAllowed
+	}
+
+	if r.WriteMode == WriteModeProtected && r.UseClientOAuth {
+		return nil, fmt.Errorf("writeMode 'protected' cannot be used with useClientOAuth 'true'")
+	}
+
 	var client *bigqueryapi.Client
 	var restService *bigqueryrestapi.Service
 	var tokenSource oauth2.TokenSource
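`Initialize` defaults an empty `writeMode` to `allowed` and rejects `protected` together with `useClientOAuth` up front, but the check for an unknown value runs later, after the `Source` value has been built. A minimal sketch of the alternative design choice, assuming the same three constants, consolidates all checks so that a misspelled mode fails before any credential lookup or client creation. `resolveWriteMode` is a hypothetical name; the error messages follow the commit's wording.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	WriteModeAllowed   = "allowed"
	WriteModeBlocked   = "blocked"
	WriteModeProtected = "protected"
)

// resolveWriteMode applies the default and rejects invalid values and
// combinations before any client is created.
func resolveWriteMode(mode string, useClientOAuth bool) (string, error) {
	switch mode {
	case "":
		mode = WriteModeAllowed // documented default
	case WriteModeAllowed, WriteModeBlocked, WriteModeProtected:
		// valid as given (matching is case-sensitive, as in the commit)
	default:
		return "", fmt.Errorf("invalid writeMode %q: must be one of %q, %q, or %q",
			mode, WriteModeAllowed, WriteModeProtected, WriteModeBlocked)
	}
	if mode == WriteModeProtected && useClientOAuth {
		return "", errors.New("writeMode 'protected' cannot be used with useClientOAuth 'true'")
	}
	return mode, nil
}

func main() {
	for _, mode := range []string{"", "blocked", "protected", "Blocked"} {
		resolved, err := resolveWriteMode(mode, true)
		fmt.Printf("%q -> %q, err=%v\n", mode, resolved, err)
	}
}
```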
@@ -133,9 +154,15 @@ func (r Config) Initialize(ctx context.Context, tracer trace.Tracer) (sources.So
 		TokenSource: tokenSource,
 		MaxQueryResultRows: 50,
 		ClientCreator: clientCreator,
+		WriteMode: r.WriteMode,
 		AllowedDatasets: allowedDatasets,
 		UseClientOAuth: r.UseClientOAuth,
 	}
+	s.SessionProvider = s.newBigQuerySessionProvider()
+
+	if r.WriteMode != WriteModeAllowed && r.WriteMode != WriteModeBlocked && r.WriteMode != WriteModeProtected {
+		return nil, fmt.Errorf("invalid writeMode %q: must be one of %q, %q, or %q", r.WriteMode, WriteModeAllowed, WriteModeProtected, WriteModeBlocked)
+	}
 	s.makeDataplexCatalogClient = s.lazyInitDataplexClient(ctx, tracer)
 	return s, nil

@@ -156,7 +183,19 @@ type Source struct {
 	ClientCreator BigqueryClientCreator
 	AllowedDatasets map[string]struct{}
 	UseClientOAuth bool
+	WriteMode string
+	sessionMutex sync.Mutex
 	makeDataplexCatalogClient func() (*dataplexapi.CatalogClient, DataplexClientCreator, error)
+	SessionProvider BigQuerySessionProvider
+	Session *Session
+}
+
+type Session struct {
+	ID string
+	ProjectID string
+	DatasetID string
+	CreationTime time.Time
+	LastUsed time.Time
 }

 func (s *Source) SourceKind() string {
@@ -172,6 +211,103 @@ func (s *Source) BigQueryRestService() *bigqueryrestapi.Service {
 	return s.RestService
 }

+func (s *Source) BigQueryWriteMode() string {
+	return s.WriteMode
+}
+
+func (s *Source) BigQuerySession() BigQuerySessionProvider {
+	return s.SessionProvider
+}
+
+func (s *Source) newBigQuerySessionProvider() BigQuerySessionProvider {
+	return func(ctx context.Context) (*Session, error) {
+		if s.WriteMode != WriteModeProtected {
+			return nil, nil
+		}
+
+		s.sessionMutex.Lock()
+		defer s.sessionMutex.Unlock()
+
+		logger, err := util.LoggerFromContext(ctx)
+		if err != nil {
+			return nil, fmt.Errorf("failed to get logger from context: %w", err)
+		}
+
+		if s.Session != nil {
+			// Absolute 7-day lifetime check.
+			const sessionMaxLifetime = 7 * 24 * time.Hour
+			// This assumes a single task will not exceed 30 minutes, preventing it from failing mid-execution.
+			const refreshThreshold = 30 * time.Minute
+			if time.Since(s.Session.CreationTime) > (sessionMaxLifetime - refreshThreshold) {
+				logger.DebugContext(ctx, "Session is approaching its 7-day maximum lifetime. Creating a new one.")
+			} else {
+				job := &bigqueryrestapi.Job{
+					Configuration: &bigqueryrestapi.JobConfiguration{
+						DryRun: true,
+						Query: &bigqueryrestapi.JobConfigurationQuery{
+							Query: "SELECT 1",
+							UseLegacySql: new(bool),
+							ConnectionProperties: []*bigqueryrestapi.ConnectionProperty{{Key: "session_id", Value: s.Session.ID}},
+						},
+					},
+				}
+				_, err := s.RestService.Jobs.Insert(s.Project, job).Do()
+				if err == nil {
+					s.Session.LastUsed = time.Now()
+					return s.Session, nil
+				}
+				logger.DebugContext(ctx, "Session validation failed (likely expired), creating a new one.", "error", err)
+			}
+		}
+
+		// Create a new session if one doesn't exist, it has passed its 7-day lifetime,
+		// or it failed the validation dry run.
+
+		creationTime := time.Now()
+		job := &bigqueryrestapi.Job{
+			JobReference: &bigqueryrestapi.JobReference{
+				ProjectId: s.Project,
+				Location: s.Location,
+			},
+			Configuration: &bigqueryrestapi.JobConfiguration{
+				DryRun: true,
+				Query: &bigqueryrestapi.JobConfigurationQuery{
+					Query: "SELECT 1",
+					CreateSession: true,
+				},
+			},
+		}
+
+		createdJob, err := s.RestService.Jobs.Insert(s.Project, job).Do()
+		if err != nil {
+			return nil, fmt.Errorf("failed to create new session: %w", err)
+		}
+
+		var sessionID, sessionDatasetID, projectID string
+		if createdJob.Statistics != nil && createdJob.Statistics.SessionInfo != nil {
+			sessionID = createdJob.Statistics.SessionInfo.SessionId
+		} else {
+			return nil, fmt.Errorf("failed to get session ID from new session job")
+		}
+
+		if createdJob.Configuration != nil && createdJob.Configuration.Query != nil && createdJob.Configuration.Query.DestinationTable != nil {
+			sessionDatasetID = createdJob.Configuration.Query.DestinationTable.DatasetId
+			projectID = createdJob.Configuration.Query.DestinationTable.ProjectId
+		} else {
+			return nil, fmt.Errorf("failed to get session dataset ID from new session job")
+		}
+
+		s.Session = &Session{
+			ID: sessionID,
+			ProjectID: projectID,
+			DatasetID: sessionDatasetID,
+			CreationTime: creationTime,
+			LastUsed: creationTime,
+		}
+		return s.Session, nil
+	}
+}
+
 func (s *Source) UseClientAuthorization() bool {
 	return s.UseClientOAuth
 }
@@ -257,7 +393,7 @@ func initBigQueryConnection(
 	ctx, span := sources.InitConnectionSpan(ctx, tracer, SourceKind, name)
 	defer span.End()

-	cred, err := google.FindDefaultCredentials(ctx, bigqueryapi.Scope)
+	cred, err := google.FindDefaultCredentials(ctx, "https://www.googleapis.com/auth/cloud-platform")
 	if err != nil {
 		return nil, nil, nil, fmt.Errorf("failed to find default Google Cloud credentials with scope %q: %w", bigqueryapi.Scope, err)
 	}

internal/sources/bigquery/bigquery_test.go

Lines changed: 22 additions & 2 deletions
@@ -37,14 +37,34 @@ func TestParseFromYamlBigQuery(t *testing.T) {
 my-instance:
 kind: bigquery
 project: my-project
-location: us
+`,
+want: server.SourceConfigs{
+"my-instance": bigquery.Config{
+Name: "my-instance",
+Kind: bigquery.SourceKind,
+Project: "my-project",
+Location: "",
+WriteMode: "",
+},
+},
+},
+{
+desc: "all fields specified",
+in: `
+sources:
+my-instance:
+kind: bigquery
+project: my-project
+location: asia
+writeMode: blocked
 `,
 want: server.SourceConfigs{
 "my-instance": bigquery.Config{
 Name: "my-instance",
 Kind: bigquery.SourceKind,
 Project: "my-project",
-Location: "us",
+Location: "asia",
+WriteMode: "blocked",
 UseClientOAuth: false,
 },
 },
