| `databricks_aws_region` | Optional. The AWS region for the S3 object store. E.g. `us-west-2`. |
| `databricks_aws_access_key_id` | The access key ID for the S3 object store. |
| `databricks_aws_secret_access_key` | The secret access key for the S3 object store. |
| `databricks_aws_endpoint` | Optional. The endpoint for the S3 object store. E.g. `s3.us-west-2.amazonaws.com`. |
| `databricks_aws_allow_http` | Optional. Enables insecure HTTP connections to `databricks_aws_endpoint`. Defaults to `false`. |
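As a minimal sketch, a dataset that reads a Delta Lake table stored in S3 might combine `mode: delta_lake` with the S3 parameters above. The table path `my_catalog.my_schema.my_table`, the secret names, and the `databricks_endpoint`/`databricks_token` workspace parameters are illustrative assumptions, not values documented in this section.

```yaml
datasets:
  # Hypothetical Unity Catalog table (catalog.schema.table).
  - from: databricks:my_catalog.my_schema.my_table
    name: my_table
    params:
      mode: delta_lake
      # Assumed workspace connection parameters.
      databricks_endpoint: dbc-a12cd3e4-56f7.cloud.databricks.com
      databricks_token: ${secrets:my_databricks_token}
      # S3 object store parameters from the table above.
      databricks_aws_region: us-west-2
      databricks_aws_access_key_id: ${secrets:aws_access_key_id}
      databricks_aws_secret_access_key: ${secrets:aws_secret_access_key}
```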
### Azure Blob
- When using `mode: spark_connect`, correlated scalar subqueries can only be used in filters, aggregations, projections, and UPDATE/MERGE/DELETE commands. [Spark Docs](https://spark.apache.org/docs/latest/sql-error-conditions-unsupported-subquery-expression-category-error-class.html#unsupported_correlated_scalar_subquery)

:::warning[Memory Considerations]

When using the Databricks Data Connector with `mode: delta_lake` and without acceleration, data is loaded into memory during query execution. Ensure sufficient memory is available, including overhead for queries and the runtime, especially when running concurrent queries.

Memory limitations can be mitigated by storing acceleration data on disk, which is supported by the [`duckdb`](../data-accelerators/duckdb.md) and [`sqlite`](../data-accelerators/sqlite.md) accelerators when `mode: file` is specified.

- The Databricks Connector (`mode: spark_connect`) does not yet support streaming query results from Spark.

description: 'Instructions for using Databricks Mosaic AI Models'
sidebar_label: 'Databricks'
sidebar_position: 8
---
To use a language model deployed to [Databricks Mosaic AI Model Serving](https://docs.databricks.com/aws/en/machine-learning/model-serving/), specify the model endpoint name prefixed with `databricks:` in the `from` field and include the required parameters in the `params` section.
|`databricks_endpoint`| The Databricks workspace endpoint, e.g., `dbc-a12cd3e4-56f7.cloud.databricks.com`. |
|`databricks_token`| The Databricks API token to authenticate with the Unity Catalog API. Use the [secret replacement syntax](../secret-stores/index.md) to reference a secret, e.g., `${secrets:my_databricks_token}`. |

Refer to the [Mosaic AI Model Serving documentation](https://docs.databricks.com/aws/en/machine-learning/model-serving/) for more details on available models and configurations.
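A minimal `models` entry following the description above might look like the sketch below. The serving endpoint name `my-serving-endpoint` and the model name `my_databricks_model` are hypothetical; the workspace endpoint and secret reference reuse the example values from the parameter table.

```yaml
models:
  # `my-serving-endpoint` is a hypothetical Model Serving endpoint name.
  - from: databricks:my-serving-endpoint
    name: my_databricks_model
    params:
      databricks_endpoint: dbc-a12cd3e4-56f7.cloud.databricks.com
      databricks_token: ${secrets:my_databricks_token}
```

With this sketch, requests to the OpenAI-compatible API would presumably reference the model by its `name` (here `my_databricks_model`).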
|[`databricks`][databricks]| Models deployed to Databricks Mosaic AI | Alpha | - | OpenAI-compatible HTTP endpoint |

[file]: /components/embeddings/local.md
[hf]: ./huggingface.md
[azure]: ./azure.md
[ant]: ./anthropic.md
[xai]: ./xai.md
[databricks]: ./databricks.md

Spice also tests and evaluates common models and grades their ability to integrate with Spice. See the [Models Grade Report](/docs/reference/models.md).

description: 'Detailed documentation for workers in the Spice runtime.'
sidebar_label: 'Workers Overview'
sidebar_position: 8
---
Workers in the Spice runtime represent configurable units of compute that help coordinate and manage interactions between models and tools. Each worker is defined as a component in the `spicepod.yaml` file, specifying its behavior and interaction logic.
## Configuration
Workers are configured in the `workers` section of the `spicepod.yaml` file. Each worker definition includes a name, description, and a list of models or tools it encapsulates.

**Example `spicepod.yaml` configuration:**

```yaml
workers:
  - name: round-robin
    description: |
      Distributes requests between 'foo' and 'bar' models in a round-robin fashion.
    models:
      - from: foo
      - from: bar
  - name: fallback
    description: |
      Attempts 'bar' first, then 'foo', then 'baz' if previous models fail.
    models:
      - from: foo
        order: 2
      - from: bar
        order: 1
      - from: baz
        order: 3
```
## Use-Cases
Workers currently help implement:
- Model fallback and error handling
- Load balancing across multiple models
## Usage
Workers can be invoked using the same API endpoints as individual models. For example, to call a worker named `fallback` using the OpenAI-compatible HTTP API:
```bash
curl http://localhost:8090/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "fallback",
    "messages": [{ "role": "user", "content": "Tell me a joke" }]
  }'
```
## Roadmap
The vision for workers includes support for dynamic serverless compute, enabling user-defined functions to run within the Spice runtime. This would let developers define custom logic and orchestration patterns directly in the worker configuration, supporting more advanced workflows and automation. Further details and implementation timelines will be provided in future updates; for ongoing progress, refer to the project repository and documentation.
## Further Reading
For a complete specification of worker configuration, routing rules, and available options, refer to the [Spicepod Workers Reference](/docs/reference/spicepod/workers.md).

Spice.ai uses DataFusion as its query execution engine. By default, DataFusion does not enforce strict memory limits, which can lead to unbounded usage. Spice.ai addresses this through:
- **Memory Limit**: The `runtime.memory_limit` parameter defines the maximum memory available for query execution. Once the memory limit is reached, supported query operations spill data to disk, helping prevent out-of-memory errors and maintain query stability. See [Spicepod Configuration](spicepod/index.md#memory-limit) for details.
- **Memory Budgeting**: Limits memory per query execution. Queries exceeding the limit return an error. See [Spicepod Configuration](spicepod/index.md) for details.
- **Spill-to-Disk**: Operators such as Sort, Join, and GroupByHash spill intermediate results to disk when memory limits are exceeded, preventing out-of-memory errors.

DataFusion supports spilling for several operators, but not all operations are currently supported. Notably, the following operations do not support spilling:
- ExternalSorterMerge (no current tracking issue; previously discussed in the context of SortMergeJoin)
- RepartitionMerge (spilling support has been proposed but may depend on HashJoin support; see [issue](https://github.com/apache/arrow-datafusion/issues/1047))
## Embedded Data Accelerators
Spice.ai integrates with embedded accelerators like [SQLite](/docs/components/data-accelerators/sqlite.md) and [DuckDB](/docs/components/data-accelerators/duckdb.md), each with unique memory considerations:

This configuration permits requests only from the `https://example.com` origin.
## `runtime.memory_limit`
The `memory_limit` parameter sets a memory usage cap for the Spice runtime query engine. This limit applies **only** to the query engine and should be used in addition to other memory configuration options, such as `duckdb_memory_limit`. When `memory_limit` is specified, the value of `runtime.temp_directory` determines the directory DataFusion uses for spilling intermediate data to disk.
```yaml
runtime:
  memory_limit: 4GiB
```
Specify the value as a size, for example `4GiB` or `1024MiB`.

For detailed memory information, see [Memory](/docs/reference/memory.md).
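As a sketch, `memory_limit` can be paired with `runtime.temp_directory`, which, as noted above, determines where DataFusion writes spilled intermediate data once the limit is reached. The path `/tmp/spice` is a hypothetical example; choose a location with enough free disk space for spill files.

```yaml
runtime:
  # Cap memory for the query engine; supported operators spill to disk beyond this.
  memory_limit: 4GiB
  # Hypothetical directory for spill files.
  temp_directory: /tmp/spice
```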
## `runtime.temp_directory`
The path to a temporary directory that Spice uses for query and acceleration operations that spill to disk. For more details, see the [Managing Memory Usage documentation](../memory.md) and the [DuckDB Data Accelerator documentation](../../components/data-accelerators/duckdb.md).

Workers in the Spice runtime represent configurable units of compute that help coordinate and manage interactions between models and tools. Currently, workers define how one or more [LLMs](../models.md) can be combined into a single logical model.
## `workers`
The `workers` section in your configuration specifies one or more workers.

Example:

```yaml
workers:
  - name: round-robin
    description: |
      Distributes requests between 'foo' and 'bar' models in a round-robin fashion.
    models:
      - from: foo
      - from: bar
  - name: fallback
    description: |
      Attempts 'bar' first, then 'foo', then 'baz' if previous models fail.
    models:
      - from: foo
        order: 2
      - from: bar
        order: 1
      - from: baz
        order: 3
  - name: weighted
    description: |
      Routes 80% of traffic to 'foo'.
    models:
      - from: foo
        order: 4
      - from: bar
        order: 1
```
### `name`
A unique identifier for this worker component.
### `description`
Additional details about the worker, useful for displaying to users and for providing as context to LLMs.
### `models` {#models}
A list of model configurations that define how the model worker behaves.

The structure of the list elements determines the worker's routing algorithm. All elements in the list should be of the same type.
| from | String | The `model.name` of a defined `model` spicepod component. |
| order | Integer, positive | The priority of the model. Models are used in ascending `order`, starting with the lowest value. The relative order of models with equal `order` values is undefined. |
#### Worker with round-robin routing across models
Example
```yaml
workers:
  - name: round-robin
    description: |
      Call models 'foo' & 'bar' in round robin.
    models:
      - from: foo
      - from: bar
```
The worker selects each model in turn for subsequent requests.
#### Worker with fallback model routing
Example
```yaml
workers:
  - name: fallback
    description: |
      Call 'bar'. On error, call 'foo'. Failing that 'baz'.
    models:
      - from: foo
        order: 2
      - from: bar
        order: 1
      - from: baz
        order: 3
```
The worker uses the models in increasing order, returning the first result that is not an error.