| description | MongoDB Data Connector Documentation |
|---|
MongoDB is an open-source NoSQL database that stores data in flexible, JSON-like documents, providing dynamic schemas and easy scalability.
The MongoDB Data Connector enables federated/accelerated SQL queries on data stored in MongoDB databases.
datasets:
- from: mongodb:mytable
name: my_dataset
params:
mongodb_host: localhost
mongodb_port: 27017
mongodb_db: my_database
mongodb_user: my_user
mongodb_pass: ${secrets:mongodb_pass}
mongodb_pool_min: 0
mongodb_pool_max: 10The from field takes the form mongodb:{table_name} where table_name is the table identifer in the MongoDB server to read from.
{% hint style="info" %}
Unquoted identifiers are normalized to lowercase. To reference a collection with mixed-case characters, wrap it in double quotes: mongodb:"MixedCaseCollection". See Identifier Case Sensitivity.
{% endhint %}
datasets:
- from: mongodb:mytable
name: my_dataset
params:
mongodb_db: my_database
...The dataset name. This will be used as the table name within Spice.
Example:
datasets:
- from: mongodb:my_dataset
name: cool_dataset
params: ...SELECT COUNT(*) FROM cool_dataset;+----------+
| count(*) |
+----------+
| 6001215 |
+----------+The MongoDB data connector can be configured by providing the following params. Use the secret replacement syntax to load the secret from a secret store, e.g. ${secrets:my_mongodb_conn_string}.
| Parameter Name | Description |
|---|---|
mongodb_connection_string |
The connection string to use to connect to the MongoDB server. This can be used instead of providing individual connection parameters. |
mongodb_user |
The MongoDB username. |
mongodb_pass |
The password to connect with. |
mongodb_host |
The hostname of the MongoDB server. |
mongodb_port |
The port of the MongoDB server. |
mongodb_db |
The name of the database to connect to. |
mongodb_sslmode |
Optional. Specifies the SSL/TLS behavior for the connection, supported values: required (default) requires an SSL connection, preferred tries SSL but falls back to insecure if unavailable, disabled does not use SSL. |
mongodb_sslrootcert |
Optional parameter specifying the path to a custom PEM certificate that the connector will trust. |
mongodb_time_zone |
Optional. Specifies connection time zone. Default is UTC. Accepts fixed offsets (e.g., +02:00) or IANA time zone names (e.g., America/Los_Angeles). |
mongodb_auth_source |
Optional. Authentication source database. Overrides the default auth source in the connection string. |
mongodb_unnest_depth |
Optional. Maximum nesting depth for unnesting embedded documents into a flattened structure. Higher values expand deeper nested fields. Default: 0 |
mongodb_num_docs_to_infer_schema |
Optional. Number of documents to use to infer the schema. Defaults to 400. |
mongodb_pool_min |
The minimum number of connections to keep open in the pool, lazily created when requested. Default: 0 |
mongodb_pool_max |
The maximum number of connections in the pool. Default: 10 |
The table below shows the MongoDB data types supported, along with the type mapping to Apache Arrow types in Spice.
| MongoDB Type | Arrow Type |
|---|---|
String |
Utf8 |
Boolean |
Boolean |
Int32 |
Int32 |
Int64 |
Int64 |
Double |
Float64 |
Decimal128 |
Decimal128 |
Binary |
Binary |
Datetime without time |
Date32 |
Datetime with time |
Timestamp(Millisecond, <Timezone>) |
Timestamp |
Timestamp(Millisecond, None) |
Array |
List<Utf8> |
Null |
Null |
Undefined |
Null |
RegularExpression |
Utf8 |
JavaScriptCode |
Utf8 |
JavaScriptCodeWithScope |
Utf8 |
Symbol |
Utf8 |
MaxKey |
Utf8 |
MinKey |
Utf8 |
DbPointer |
Utf8 |
ObjectId |
Utf8 |
Document |
See unnesting section |
{% hint style="info" %}
The MongoDB Datetime value is retrieved as a UTC time value by default. Use the mongodb_time_zone configuration parameter to specify the desired time zone for interpreting TIMESTAMP values during data retrieval.
{% endhint %}
Consider the following document:
{
"a": 1,
"b": {
"x": 2,
"y": {
"z": 3
}
}
}Using mongodb_unnest_depth you can control the unnesting behavior. Here are the examples:
sql> select * from test_table;
+-----------+---------------------+
| a (Int32) | b (Utf8) |
+-----------+---------------------+
| 1 | {"x":2,"y":{"z":3}} |
+---+-----------------------------+sql> select * from test_table;
+-----------+-------------+------------+
| a (Int32) | b.x (Int32) | b.y (Utf8) |
+-----------+-------------+------------+
| 1 | 2 | {"z":3} |
+-----------+-------------+------------+sql> select * from test_table;
+-----------+-------------+---------------+
| a (Int32) | b.x (Int32) | b.y.z (Int32) |
+-----------+-------------+---------------+
| 1 | 2 | 3 |
+-----------+-------------+---------------+datasets:
- from: mongodb:my_dataset
name: my_dataset
params:
mongodb_host: localhost
mongodb_port: 27017
mongodb_db: my_database
mongodb_user: my_user
mongodb_pass: ${secrets:mongodb_pass}
mongodb_auth_source: admindatasets:
- from: mongodb:my_dataset
name: my_dataset
params:
mongodb_host: localhost
mongodb_port: 27017
mongodb_db: my_database
mongodb_user: my_user
mongodb_pass: ${secrets:mongodb_pass}
mongodb_sslmode: preferred
mongodb_sslrootcert: ./custom_cert.pemdatasets:
- from: mongodb:my_dataset
name: my_dataset
params:
mongodb_connection_string: mongodb://${secrets:my_user}:${secrets:my_password}@localhost:27017/my_db?authSource=admindatasets:
- from: mongodb:my_dataset
name: my_dataset
params:
mongodb_host: localhost
mongodb_port: 27017
mongodb_db: my_database
mongodb_user: my_user
mongodb_pass: ${secrets:mongodb_pass}
mongodb_pool_min: 5
mongodb_pool_max: 10