Commit 4373cbf

feat: Add Couchbase Columnar as an Offline Store (feast-dev#5025)
* Add Couchbase Columnar and Sync Deps Signed-off-by: Elliot Scribner <[email protected]>
* Couchbase Columnar Offline Store Signed-off-by: Elliot Scribner <[email protected]>
* Testing Config Signed-off-by: Elliot Scribner <[email protected]>
* Warnings for Experimental Store Signed-off-by: Elliot Scribner <[email protected]>
* Initial Template Signed-off-by: Elliot Scribner <[email protected]>
* Temp Timeout Fix and Lint Signed-off-by: Elliot Scribner <[email protected]>
* Initial Docs Signed-off-by: Elliot Scribner <[email protected]>
* Fixing Template Signed-off-by: Elliot Scribner <[email protected]>
* Protos Signed-off-by: Elliot Scribner <[email protected]>
* Make build-sphinx Signed-off-by: Elliot Scribner <[email protected]>
* Add info on Columnar setup to docs Signed-off-by: Elliot Scribner <[email protected]>
* Docs Adjustment Signed-off-by: Elliot Scribner <[email protected]>
* Lint Fix Signed-off-by: Elliot Scribner <[email protected]>
* Dispatch Timeouts Signed-off-by: Elliot Scribner <[email protected]>
* Cleanup Steps for Test Resources Signed-off-by: Elliot Scribner <[email protected]>
* Remove unneccesary cleanup util file Signed-off-by: Elliot Scribner <[email protected]>
* Refactor `couchbase` to `couchbase.offline` Signed-off-by: Elliot Scribner <[email protected]>
* Refactor `couchbase` online literal to `couchbase.online` Signed-off-by: Elliot Scribner <[email protected]>
* Test cleanup Signed-off-by: Elliot Scribner <[email protected]>
* Added `couchbase.offline` to operator types Signed-off-by: Elliot Scribner <[email protected]>
* Add Couchbase to Roadmap for Offline, Online, and Data Source Signed-off-by: Elliot Scribner <[email protected]>
* Add `couchbase-columnar` to `pyproject.toml` Signed-off-by: Elliot Scribner <[email protected]>
---------
Signed-off-by: Elliot Scribner <[email protected]>
1 parent 40ea7a9 commit 4373cbf

47 files changed, +2153 -73 lines changed

Makefile

+28-1
@@ -402,9 +402,36 @@ test-python-universal-qdrant-online:
 			-k "test_retrieve_online_documents" \
 			sdk/python/tests/integration/online_store/test_universal_online.py

+# To use Couchbase as an offline store, you need to create a Couchbase Capella Columnar cluster on cloud.couchbase.com.
+# Modify environment variables COUCHBASE_COLUMNAR_CONNECTION_STRING, COUCHBASE_COLUMNAR_USER, and COUCHBASE_COLUMNAR_PASSWORD
+# with the details from your Couchbase Columnar Cluster.
+test-python-universal-couchbase-offline:
+	PYTHONPATH='.' \
+		FULL_REPO_CONFIGS_MODULE=sdk.python.feast.infra.offline_stores.contrib.couchbase_columnar_repo_configuration \
+		PYTEST_PLUGINS=feast.infra.offline_stores.contrib.couchbase_offline_store.tests \
+		COUCHBASE_COLUMNAR_CONNECTION_STRING=couchbases://<connection_string> \
+		COUCHBASE_COLUMNAR_USER=username \
+		COUCHBASE_COLUMNAR_PASSWORD=password \
+		python -m pytest -n 8 --integration \
+			-k "not test_historical_retrieval_with_validation and \
+			not test_historical_features_persisting and \
+			not test_universal_cli and \
+			not test_go_feature_server and \
+			not test_feature_logging and \
+			not test_reorder_columns and \
+			not test_logged_features_validation and \
+			not test_lambda_materialization_consistency and \
+			not test_offline_write and \
+			not test_push_features_to_offline_store and \
+			not gcs_registry and \
+			not s3_registry and \
+			not test_snowflake and \
+			not test_universal_types" \
+		sdk/python/tests
+
 test-python-universal-couchbase-online:
 	PYTHONPATH='.' \
-		FULL_REPO_CONFIGS_MODULE=sdk.python.feast.infra.online_stores.contrib.couchbase_repo_configuration \
+		FULL_REPO_CONFIGS_MODULE=sdk.python.feast.infra.online_stores.couchbase_online_store.couchbase_repo_configuration \
 		PYTEST_PLUGINS=sdk.python.tests.integration.feature_repos.universal.online_store.couchbase \
 		python -m pytest -n 8 --integration \
 			-k "not test_universal_cli and \
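
The offline-store target above depends on three environment variables holding the real cluster details. A pre-flight check could look roughly like the sketch below. It assumes the connection string should start with `couchbases://`, as the placeholder in the target suggests; the helper name `check_columnar_env` is hypothetical and not part of Feast or this commit.

```python
import os

# Variable names taken from the Makefile target in this commit.
REQUIRED_VARS = (
    "COUCHBASE_COLUMNAR_CONNECTION_STRING",
    "COUCHBASE_COLUMNAR_USER",
    "COUCHBASE_COLUMNAR_PASSWORD",
)


def check_columnar_env(env=None):
    """Return a list of problems; an empty list means the check passed.

    Hypothetical helper for illustration only.
    """
    env = os.environ if env is None else env
    # Report every required variable that is missing or empty.
    problems = [f"{name} is not set" for name in REQUIRED_VARS if not env.get(name)]
    conn = env.get("COUCHBASE_COLUMNAR_CONNECTION_STRING", "")
    # Assumption: Columnar connection strings use the couchbases:// scheme.
    if conn and not conn.startswith("couchbases://"):
        problems.append("connection string should start with couchbases://")
    return problems
```

Calling `check_columnar_env()` with no argument inspects the current process environment; passing a dict makes the check easy to exercise in isolation.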

README.md

+3
@@ -163,6 +163,7 @@ The list below contains the functionality that contributors are planning to deve
 * [x] [Hive (community plugin)](https://github.com/baineng/feast-hive)
 * [x] [Postgres (contrib plugin)](https://docs.feast.dev/reference/data-sources/postgres)
 * [x] [Spark (contrib plugin)](https://docs.feast.dev/reference/data-sources/spark)
+* [x] [Couchbase (contrib plugin)](https://docs.feast.dev/reference/data-sources/couchbase)
 * [x] Kafka / Kinesis sources (via [push support into the online store](https://docs.feast.dev/reference/data-sources/push))
 * **Offline Stores**
 * [x] [Snowflake](https://docs.feast.dev/reference/offline-stores/snowflake)
@@ -173,6 +174,7 @@ The list below contains the functionality that contributors are planning to deve
 * [x] [Postgres (contrib plugin)](https://docs.feast.dev/reference/offline-stores/postgres)
 * [x] [Trino (contrib plugin)](https://github.com/Shopify/feast-trino)
 * [x] [Spark (contrib plugin)](https://docs.feast.dev/reference/offline-stores/spark)
+* [x] [Couchbase (contrib plugin)](https://docs.feast.dev/reference/offline-stores/couchbase)
 * [x] [In-memory / Pandas](https://docs.feast.dev/reference/offline-stores/file)
 * [x] [Custom offline store support](https://docs.feast.dev/how-to-guides/customizing-feast/adding-a-new-offline-store)
 * **Online Stores**
@@ -188,6 +190,7 @@ The list below contains the functionality that contributors are planning to deve
 * [x] [Postgres (contrib plugin)](https://docs.feast.dev/reference/online-stores/postgres)
 * [x] [Cassandra / AstraDB (contrib plugin)](https://docs.feast.dev/reference/online-stores/cassandra)
 * [x] [ScyllaDB (contrib plugin)](https://docs.feast.dev/reference/online-stores/scylladb)
+* [x] [Couchbase (contrib plugin)](https://docs.feast.dev/reference/online-stores/couchbase)
 * [x] [Custom online store support](https://docs.feast.dev/how-to-guides/customizing-feast/adding-support-for-a-new-online-store)
 * **Feature Engineering**
 * [x] On-demand Transformations (On Read) (Beta release. See [RFC](https://docs.google.com/document/d/1lgfIw0Drc65LpaxbUu49RCeJgMew547meSJttnUqz7c/edit#))

docs/SUMMARY.md

+1
@@ -94,6 +94,7 @@
 * [BigQuery](reference/offline-stores/bigquery.md)
 * [Redshift](reference/offline-stores/redshift.md)
 * [DuckDB](reference/offline-stores/duckdb.md)
+* [Couchbase Columnar (contrib)](reference/offline-stores/couchbase.md)
 * [Spark (contrib)](reference/offline-stores/spark.md)
 * [PostgreSQL (contrib)](reference/offline-stores/postgres.md)
 * [Trino (contrib)](reference/offline-stores/trino.md)

docs/reference/data-sources/README.md

+4
@@ -34,6 +34,10 @@ Please see [Data Source](../../getting-started/concepts/data-ingestion.md) for a
 [kinesis.md](kinesis.md)
 {% endcontent-ref %}

+{% content-ref url="couchbase.md" %}
+[couchbase.md](couchbase.md)
+{% endcontent-ref %}
+
 {% content-ref url="spark.md" %}
 [spark.md](spark.md)
 {% endcontent-ref %}
+37
@@ -0,0 +1,37 @@
+# Couchbase Columnar source (contrib)
+
+## Description
+
+Couchbase Columnar data sources are [Couchbase Capella Columnar](https://docs.couchbase.com/columnar/intro/intro.html) collections that can be used as a source for feature data. **Note that Couchbase Columnar is available through [Couchbase Capella](https://cloud.couchbase.com/).**
+
+## Disclaimer
+
+The Couchbase Columnar data source does not achieve full test coverage.
+Please do not assume complete stability.
+
+## Examples
+
+Defining a Couchbase Columnar source:
+
+```python
+from feast.infra.offline_stores.contrib.couchbase_offline_store.couchbase_source import (
+    CouchbaseColumnarSource,
+)
+
+driver_stats_source = CouchbaseColumnarSource(
+    name="driver_hourly_stats_source",
+    query="SELECT * FROM Default.Default.`feast_driver_hourly_stats`",
+    database="Default",
+    scope="Default",
+    collection="feast_driver_hourly_stats",
+    timestamp_field="event_timestamp",
+    created_timestamp_column="created",
+)
+```
+
+The full set of configuration options is available [here](https://rtd.feast.dev/en/master/#feast.infra.offline_stores.contrib.couchbase_offline_store.couchbase_source.CouchbaseColumnarSource).
+
+## Supported Types
+
+Couchbase Capella Columnar data sources support `BOOLEAN`, `STRING`, `BIGINT`, and `DOUBLE` primitive types.
+For a comparison against other batch data sources, please see [here](overview.md#functionality-matrix).
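
The `query` in the example above repeats the same `database`, `scope`, and `collection` values that are also passed as separate arguments. As a minimal sketch, the query string could be derived from those three values so they cannot drift apart. The helper `build_columnar_query` is hypothetical and not part of Feast, and only the collection name is backtick-quoted, mirroring the example.

```python
def build_columnar_query(database: str, scope: str, collection: str) -> str:
    """Build a SELECT over a fully qualified Columnar collection.

    Hypothetical helper for illustration; it only formats a string.
    """
    # Backticks quote the collection name, matching the documented example.
    return f"SELECT * FROM {database}.{scope}.`{collection}`"


query = build_columnar_query("Default", "Default", "feast_driver_hourly_stats")
print(query)
```

The resulting string can then be passed as the `query` argument shown in the source definition.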

docs/reference/data-sources/overview.md

+11-11
@@ -18,14 +18,14 @@ Details for each specific data source can be found [here](README.md).

 Below is a matrix indicating which data sources support which types.

-| | File | BigQuery | Snowflake | Redshift | Postgres | Spark | Trino |
-| :-------------------------------- | :-- | :-- |:----------| :-- | :-- | :-- | :-- |
-| `bytes` | yes | yes | yes | yes | yes | yes | yes |
-| `string` | yes | yes | yes | yes | yes | yes | yes |
-| `int32` | yes | yes | yes | yes | yes | yes | yes |
-| `int64` | yes | yes | yes | yes | yes | yes | yes |
-| `float32` | yes | yes | yes | yes | yes | yes | yes |
-| `float64` | yes | yes | yes | yes | yes | yes | yes |
-| `bool` | yes | yes | yes | yes | yes | yes | yes |
-| `timestamp` | yes | yes | yes | yes | yes | yes | yes |
-| array types | yes | yes | yes | no | yes | yes | no |
+| | File | BigQuery | Snowflake | Redshift | Postgres | Spark | Trino | Couchbase |
+| :-------------------------------- | :-- | :-- |:----------| :-- | :-- | :-- | :-- |:----------|
+| `bytes` | yes | yes | yes | yes | yes | yes | yes | yes |
+| `string` | yes | yes | yes | yes | yes | yes | yes | yes |
+| `int32` | yes | yes | yes | yes | yes | yes | yes | yes |
+| `int64` | yes | yes | yes | yes | yes | yes | yes | yes |
+| `float32` | yes | yes | yes | yes | yes | yes | yes | yes |
+| `float64` | yes | yes | yes | yes | yes | yes | yes | yes |
+| `bool` | yes | yes | yes | yes | yes | yes | yes | yes |
+| `timestamp` | yes | yes | yes | yes | yes | yes | yes | yes |
+| array types | yes | yes | yes | no | yes | yes | no | no |

docs/reference/offline-stores/README.md

+4
@@ -26,6 +26,10 @@ Please see [Offline Store](../../getting-started/components/offline-store.md) fo
 [duckdb.md](duckdb.md)
 {% endcontent-ref %}

+{% content-ref url="couchbase.md" %}
+[couchbase.md](couchbase.md)
+{% endcontent-ref %}
+
 {% content-ref url="spark.md" %}
 [spark.md](spark.md)
 {% endcontent-ref %}
+79
@@ -0,0 +1,79 @@
+# Couchbase Columnar offline store (contrib)
+
+## Description
+
+The Couchbase Columnar offline store provides support for reading [CouchbaseColumnarSources](../data-sources/couchbase.md). **Note that Couchbase Columnar is available through [Couchbase Capella](https://cloud.couchbase.com/).**
+* Entity dataframes can be provided as a SQL++ query or can be provided as a Pandas dataframe. A Pandas dataframe will be uploaded to Couchbase Capella Columnar as a collection.
+
+## Disclaimer
+
+The Couchbase Columnar offline store does not achieve full test coverage.
+Please do not assume complete stability.
+
+## Getting started
+
+In order to use this offline store, you'll need to run `pip install 'feast[couchbase]'`. You can get started by then running `feast init -t couchbase`.
+
+To get started with Couchbase Capella Columnar:
+1. Sign up for a [Couchbase Capella](https://cloud.couchbase.com/) account
+2. [Deploy a Columnar cluster](https://docs.couchbase.com/columnar/admin/prepare-project.html)
+3. [Create an Access Control Account](https://docs.couchbase.com/columnar/admin/auth/auth-data.html)
+   - This account should be able to read and write.
+   - For testing purposes, it is recommended to assign all roles to avoid any permission issues.
+4. [Configure allowed IP addresses](https://docs.couchbase.com/columnar/admin/ip-allowed-list.html)
+   - You must allow the IP address of the machine running Feast.
+
+
+## Example
+
+{% code title="feature_store.yaml" %}
+```yaml
+project: my_project
+registry: data/registry.db
+provider: local
+offline_store:
+    type: couchbase.offline
+    connection_string: COUCHBASE_COLUMNAR_CONNECTION_STRING # Copied from Settings > Connection String page in Capella Columnar console, starts with couchbases://
+    user: COUCHBASE_COLUMNAR_USER # Couchbase cluster access name from Settings > Access Control page in Capella Columnar console
+    password: COUCHBASE_COLUMNAR_PASSWORD # Couchbase password from Settings > Access Control page in Capella Columnar console
+    timeout: 120 # Timeout in seconds for Columnar operations, optional
+online_store:
+    path: data/online_store.db
+```
+{% endcode %}
+
+Note that `timeout` is an optional parameter.
+The full set of configuration options is available in [CouchbaseColumnarOfflineStoreConfig](https://rtd.feast.dev/en/master/#feast.infra.offline_stores.contrib.couchbase_offline_store.couchbase.CouchbaseColumnarOfflineStoreConfig).
+
+
+## Functionality Matrix
+
+The set of functionality supported by offline stores is described in detail [here](overview.md#functionality).
+Below is a matrix indicating which functionality is supported by the Couchbase Columnar offline store.
+
+| | Couchbase Columnar |
+| :----------------------------------------------------------------- |:-------------------|
+| `get_historical_features` (point-in-time correct join) | yes |
+| `pull_latest_from_table_or_query` (retrieve latest feature values) | yes |
+| `pull_all_from_table_or_query` (retrieve a saved dataset) | yes |
+| `offline_write_batch` (persist dataframes to offline store) | no |
+| `write_logged_features` (persist logged features to offline store) | no |
+
+Below is a matrix indicating which functionality is supported by `CouchbaseColumnarRetrievalJob`.
+
+| | Couchbase Columnar |
+| ----------------------------------------------------- |--------------------|
+| export to dataframe | yes |
+| export to arrow table | yes |
+| export to arrow batches | no |
+| export to SQL | yes |
+| export to data lake (S3, GCS, etc.) | yes |
+| export to data warehouse | yes |
+| export as Spark dataframe | no |
+| local execution of Python-based on-demand transforms | yes |
+| remote execution of Python-based on-demand transforms | no |
+| persist results in the offline store | yes |
+| preview the query plan before execution | yes |
+| read partitioned data | yes |
+
+To compare this set of functionality against other offline stores, please see the full [functionality matrix](overview.md#functionality-matrix).
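
The functionality matrix above lists `get_historical_features` as a point-in-time correct join. The sketch below illustrates that idea in plain Python: each entity row receives the most recent feature value observed at or before its own timestamp, never a later one. This is not how the Couchbase Columnar store implements it (the store runs SQL++ against the cluster), and it ignores TTLs and created-timestamp tie-breaking. The `driver_id` and `conv_rate` names are illustrative assumptions.

```python
from datetime import datetime


def point_in_time_join(entity_rows, feature_rows, entity_key="driver_id"):
    """Attach to each entity row the latest feature value at or before its timestamp."""
    joined = []
    for entity in entity_rows:
        # Keep only feature rows for the same entity that are not from the future.
        candidates = [
            f for f in feature_rows
            if f[entity_key] == entity[entity_key]
            and f["event_timestamp"] <= entity["event_timestamp"]
        ]
        # The most recent eligible row wins; None if there is no eligible row.
        latest = max(candidates, key=lambda f: f["event_timestamp"], default=None)
        joined.append({**entity, "conv_rate": latest["conv_rate"] if latest else None})
    return joined


features = [
    {"driver_id": 1001, "event_timestamp": datetime(2024, 1, 1, 10), "conv_rate": 0.2},
    {"driver_id": 1001, "event_timestamp": datetime(2024, 1, 1, 12), "conv_rate": 0.5},
]
entities = [
    {"driver_id": 1001, "event_timestamp": datetime(2024, 1, 1, 11)},
    {"driver_id": 1001, "event_timestamp": datetime(2024, 1, 1, 9)},
]
result = point_in_time_join(entities, features)
```

Here the 11:00 entity row picks up the 10:00 feature value rather than the 12:00 one, and the 09:00 row gets `None` because no feature value existed yet.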

docs/reference/offline-stores/overview.md

+21-21
@@ -31,28 +31,28 @@ Details for each specific offline store, such as how to configure it in a `featu

 Below is a matrix indicating which offline stores support which methods.

-| | Dask | BigQuery | Snowflake | Redshift | Postgres | Spark | Trino |
-| :-------------------------------- | :-- | :-- | :-- | :-- | :-- | :-- | :-- |
-| `get_historical_features` | yes | yes | yes | yes | yes | yes | yes |
-| `pull_latest_from_table_or_query` | yes | yes | yes | yes | yes | yes | yes |
-| `pull_all_from_table_or_query` | yes | yes | yes | yes | yes | yes | yes |
-| `offline_write_batch` | yes | yes | yes | yes | no | no | no |
-| `write_logged_features` | yes | yes | yes | yes | no | no | no |
+| | Dask | BigQuery | Snowflake | Redshift | Postgres | Spark | Trino | Couchbase |
+| :-------------------------------- | :-- | :-- | :-- | :-- | :-- | :-- | :-- | :-- |
+| `get_historical_features` | yes | yes | yes | yes | yes | yes | yes | yes |
+| `pull_latest_from_table_or_query` | yes | yes | yes | yes | yes | yes | yes | yes |
+| `pull_all_from_table_or_query` | yes | yes | yes | yes | yes | yes | yes | yes |
+| `offline_write_batch` | yes | yes | yes | yes | no | no | no | no |
+| `write_logged_features` | yes | yes | yes | yes | no | no | no | no |


 Below is a matrix indicating which `RetrievalJob`s support what functionality.

-| | Dask | BigQuery | Snowflake | Redshift | Postgres | Spark | Trino | DuckDB |
-| --------------------------------- | --- | --- | --- | --- | --- | --- | --- | --- |
-| export to dataframe | yes | yes | yes | yes | yes | yes | yes | yes |
-| export to arrow table | yes | yes | yes | yes | yes | yes | yes | yes |
-| export to arrow batches | no | no | no | yes | no | no | no | no |
-| export to SQL | no | yes | yes | yes | yes | no | yes | no |
-| export to data lake (S3, GCS, etc.) | no | no | yes | no | yes | no | no | no |
-| export to data warehouse | no | yes | yes | yes | yes | no | no | no |
-| export as Spark dataframe | no | no | yes | no | no | yes | no | no |
-| local execution of Python-based on-demand transforms | yes | yes | yes | yes | yes | no | yes | yes |
-| remote execution of Python-based on-demand transforms | no | no | no | no | no | no | no | no |
-| persist results in the offline store | yes | yes | yes | yes | yes | yes | no | yes |
-| preview the query plan before execution | yes | yes | yes | yes | yes | yes | yes | no |
-| read partitioned data | yes | yes | yes | yes | yes | yes | yes | yes |
+| | Dask | BigQuery | Snowflake | Redshift | Postgres | Spark | Trino | DuckDB | Couchbase |
+| --------------------------------- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
+| export to dataframe | yes | yes | yes | yes | yes | yes | yes | yes | yes |
+| export to arrow table | yes | yes | yes | yes | yes | yes | yes | yes | yes |
+| export to arrow batches | no | no | no | yes | no | no | no | no | no |
+| export to SQL | no | yes | yes | yes | yes | no | yes | no | yes |
+| export to data lake (S3, GCS, etc.) | no | no | yes | no | yes | no | no | no | yes |
+| export to data warehouse | no | yes | yes | yes | yes | no | no | no | yes |
+| export as Spark dataframe | no | no | yes | no | no | yes | no | no | no |
+| local execution of Python-based on-demand transforms | yes | yes | yes | yes | yes | no | yes | yes | yes |
+| remote execution of Python-based on-demand transforms | no | no | no | no | no | no | no | no | no |
+| persist results in the offline store | yes | yes | yes | yes | yes | yes | no | yes | yes |
+| preview the query plan before execution | yes | yes | yes | yes | yes | yes | yes | no | yes |
+| read partitioned data | yes | yes | yes | yes | yes | yes | yes | yes | yes |

docs/reference/online-stores/couchbase.md

+1-1
@@ -38,7 +38,7 @@ project: my_feature_repo
 registry: data/registry.db
 provider: local
 online_store:
-    type: couchbase
+    type: couchbase.online
     connection_string: couchbase://127.0.0.1 # Couchbase connection string, copied from 'Connect' page in Couchbase Capella console
     user: Administrator # Couchbase username from access credentials
     password: password # Couchbase password from access credentials
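
The hunk above renames the online store type literal from `couchbase` to `couchbase.online`. An existing `feature_store.yaml` that still uses the old literal could be updated with something like the sketch below. It rewrites the text with a regular expression instead of parsing YAML, and assumes that a bare `type: couchbase` line only ever refers to the online store; the helper name `migrate_online_store_type` is hypothetical.

```python
import re

# Matches a line whose value is exactly `couchbase`, optionally followed by a comment.
# Values such as `couchbase.online` or `couchbase.offline` do not match.
_OLD_TYPE = re.compile(r"^([ \t]*type:[ \t]*)couchbase([ \t]*(?:#.*)?)$", re.MULTILINE)


def migrate_online_store_type(yaml_text: str) -> str:
    """Replace the old `couchbase` type literal with `couchbase.online`."""
    return _OLD_TYPE.sub(r"\1couchbase.online\2", yaml_text)


old_config = "online_store:\n    type: couchbase\n    user: Administrator\n"
new_config = migrate_online_store_type(old_config)
```

Running the function on text that already uses the new literal leaves it unchanged, so it is safe to apply more than once.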

docs/roadmap.md

+3
@@ -16,6 +16,7 @@ The list below contains the functionality that contributors are planning to deve
 * [x] [Hive (community plugin)](https://github.com/baineng/feast-hive)
 * [x] [Postgres (contrib plugin)](https://docs.feast.dev/reference/data-sources/postgres)
 * [x] [Spark (contrib plugin)](https://docs.feast.dev/reference/data-sources/spark)
+* [x] [Couchbase (contrib plugin)](https://docs.feast.dev/reference/data-sources/couchbase)
 * [x] Kafka / Kinesis sources (via [push support into the online store](https://docs.feast.dev/reference/data-sources/push))
 * **Offline Stores**
 * [x] [Snowflake](https://docs.feast.dev/reference/offline-stores/snowflake)
@@ -26,6 +27,7 @@ The list below contains the functionality that contributors are planning to deve
 * [x] [Postgres (contrib plugin)](https://docs.feast.dev/reference/offline-stores/postgres)
 * [x] [Trino (contrib plugin)](https://github.com/Shopify/feast-trino)
 * [x] [Spark (contrib plugin)](https://docs.feast.dev/reference/offline-stores/spark)
+* [x] [Couchbase (contrib plugin)](https://docs.feast.dev/reference/offline-stores/couchbase)
 * [x] [In-memory / Pandas](https://docs.feast.dev/reference/offline-stores/file)
 * [x] [Custom offline store support](https://docs.feast.dev/how-to-guides/customizing-feast/adding-a-new-offline-store)
 * **Online Stores**
@@ -41,6 +43,7 @@ The list below contains the functionality that contributors are planning to deve
 * [x] [Postgres (contrib plugin)](https://docs.feast.dev/reference/online-stores/postgres)
 * [x] [Cassandra / AstraDB (contrib plugin)](https://docs.feast.dev/reference/online-stores/cassandra)
 * [x] [ScyllaDB (contrib plugin)](https://docs.feast.dev/reference/online-stores/scylladb)
+* [x] [Couchbase (contrib plugin)](https://docs.feast.dev/reference/online-stores/couchbase)
 * [x] [Custom online store support](https://docs.feast.dev/how-to-guides/customizing-feast/adding-support-for-a-new-online-store)
 * **Feature Engineering**
 * [x] On-demand Transformations (On Read) (Beta release. See [RFC](https://docs.google.com/document/d/1lgfIw0Drc65LpaxbUu49RCeJgMew547meSJttnUqz7c/edit#))

go/internal/test/flexible_coyote/feature_repo/data/online_store_for_pg.db

Whitespace-only changes.
