Skip to content

Latest commit

 

History

History
227 lines (190 loc) · 13.6 KB

File metadata and controls

227 lines (190 loc) · 13.6 KB
description MongoDB Data Connector Documentation

MongoDB Data Connector

MongoDB is an open-source NoSQL database that stores data in flexible, JSON-like documents, providing dynamic schemas and easy scalability.

The MongoDB Data Connector enables federated/accelerated SQL queries on data stored in MongoDB databases.

datasets:
  - from: mongodb:mytable
    name: my_dataset
    params:
      mongodb_host: localhost
      mongodb_port: 27017
      mongodb_db: my_database
      mongodb_user: my_user
      mongodb_pass: ${secrets:mongodb_pass}
      mongodb_pool_min: 0
      mongodb_pool_max: 10

Configuration

from

The from field takes the form mongodb:{table_name} where table_name is the table identifer in the MongoDB server to read from.

{% hint style="info" %} Unquoted identifiers are normalized to lowercase. To reference a collection with mixed-case characters, wrap it in double quotes: mongodb:"MixedCaseCollection". See Identifier Case Sensitivity. {% endhint %}

datasets:
  - from: mongodb:mytable
    name: my_dataset
    params:
      mongodb_db: my_database
      ...

name

The dataset name. This will be used as the table name within Spice.

Example:

datasets:
  - from: mongodb:my_dataset
    name: cool_dataset
    params: ...
SELECT COUNT(*) FROM cool_dataset;
+----------+
| count(*) |
+----------+
| 6001215  |
+----------+

params

The MongoDB data connector can be configured by providing the following params. Use the secret replacement syntax to load the secret from a secret store, e.g. ${secrets:my_mongodb_conn_string}.

Parameter Name Description
mongodb_connection_string The connection string to use to connect to the MongoDB server. This can be used instead of providing individual connection parameters.
mongodb_user The MongoDB username.
mongodb_pass The password to connect with.
mongodb_host The hostname of the MongoDB server.
mongodb_port The port of the MongoDB server.
mongodb_db The name of the database to connect to.
mongodb_sslmode Optional. Specifies the SSL/TLS behavior for the connection, supported values: required (default) requires an SSL connection, preferred tries SSL but falls back to insecure if unavailable, disabled does not use SSL.
mongodb_sslrootcert Optional parameter specifying the path to a custom PEM certificate that the connector will trust.
mongodb_time_zone Optional. Specifies connection time zone. Default is UTC. Accepts fixed offsets (e.g., +02:00) or IANA time zone names (e.g., America/Los_Angeles).
mongodb_auth_source Optional. Authentication source database. Overrides the default auth source in the connection string.
mongodb_unnest_depth Optional. Maximum nesting depth for unnesting embedded documents into a flattened structure. Higher values expand deeper nested fields. Default: 0
mongodb_num_docs_to_infer_schema Optional. Number of documents to use to infer the schema. Defaults to 400.
mongodb_pool_min The minimum number of connections to keep open in the pool, lazily created when requested. Default: 0
mongodb_pool_max The maximum number of connections in the pool. Default: 10

Types

The table below shows the MongoDB data types supported, along with the type mapping to Apache Arrow types in Spice.

MongoDB Type Arrow Type
String Utf8
Boolean Boolean
Int32 Int32
Int64 Int64
Double Float64
Decimal128 Decimal128
Binary Binary
Datetime without time Date32
Datetime with time Timestamp(Millisecond, <Timezone>)
Timestamp Timestamp(Millisecond, None)
Array List<Utf8>
Null Null
Undefined Null
RegularExpression Utf8
JavaScriptCode Utf8
JavaScriptCodeWithScope Utf8
Symbol Utf8
MaxKey Utf8
MinKey Utf8
DbPointer Utf8
ObjectId Utf8
Document See unnesting section

{% hint style="info" %} The MongoDB Datetime value is retrieved as a UTC time value by default. Use the mongodb_time_zone configuration parameter to specify the desired time zone for interpreting TIMESTAMP values during data retrieval. {% endhint %}

Unnesting

Consider the following document:

{
  "a": 1,
  "b": {
    "x": 2,
    "y": {
      "z": 3
    }
  }
}

Using mongodb_unnest_depth you can control the unnesting behavior. Here are the examples:

mongodb_unnest_depth: 0

sql> select * from test_table;
+-----------+---------------------+
| a (Int32) | b (Utf8)            |
+-----------+---------------------+
| 1         | {"x":2,"y":{"z":3}} |
+---+-----------------------------+

mongodb_unnest_depth: 1

sql> select * from test_table;
+-----------+-------------+------------+
| a (Int32) | b.x (Int32) | b.y (Utf8) |
+-----------+-------------+------------+
| 1         | 2           | {"z":3}    | 
+-----------+-------------+------------+

mongodb_unnest_depth: 2

sql> select * from test_table;
+-----------+-------------+---------------+
| a (Int32) | b.x (Int32) | b.y.z (Int32) |
+-----------+-------------+---------------+
| 1         | 2           | 3             |
+-----------+-------------+---------------+

Examples

Connecting using username and password and custom auth table

datasets:
  - from: mongodb:my_dataset
    name: my_dataset
    params:
      mongodb_host: localhost
      mongodb_port: 27017
      mongodb_db: my_database
      mongodb_user: my_user
      mongodb_pass: ${secrets:mongodb_pass}
      mongodb_auth_source: admin

Connecting using SSL

datasets:
  - from: mongodb:my_dataset
    name: my_dataset
    params:
      mongodb_host: localhost
      mongodb_port: 27017
      mongodb_db: my_database
      mongodb_user: my_user
      mongodb_pass: ${secrets:mongodb_pass}
      mongodb_sslmode: preferred
      mongodb_sslrootcert: ./custom_cert.pem

Connecting using a Connection String

datasets:
  - from: mongodb:my_dataset
    name: my_dataset
    params:
      mongodb_connection_string: mongodb://${secrets:my_user}:${secrets:my_password}@localhost:27017/my_db?authSource=admin

With custom connection pool settings

datasets:
  - from: mongodb:my_dataset
    name: my_dataset
    params:
      mongodb_host: localhost
      mongodb_port: 27017
      mongodb_db: my_database
      mongodb_user: my_user
      mongodb_pass: ${secrets:mongodb_pass}
      mongodb_pool_min: 5
      mongodb_pool_max: 10