feat(new source): Add ODBC source implementation#24044

Open
powerumc wants to merge 121 commits into vectordotdev:master from powerumc:powerumc/odbc-source

@powerumc (Contributor) commented Oct 22, 2025

Summary

This PR implements a new ODBC (Open Database Connectivity) source.

Vector configuration

Manual configuration example

[sources.odbc]
type = "odbc"
connection_string = "driver={MariaDB Unicode};server=127.0.0.1;port=3306;database=vector_db;uid=vector;pwd=vector;"
statement = "SELECT * FROM odbc_table WHERE id > ? LIMIT 1;"
schedule = "*/5 * * * * *"
schedule_timezone = "UTC"
last_run_metadata_path = "odbc_tracking.json"
tracking_columns = ["id"]
statement_init_params = { id = "0" }

[sinks.console]
type = "console"
inputs = ["odbc"]
encoding.codec = "json"
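The ? placeholder in statement is bound from the tracking column values (from statement_init_params on a first run), so the placeholder count has to line up with tracking_columns. A minimal sketch of a config-time check, assuming each ? is bound, in order, from one tracking column; count_placeholders and validate are hypothetical helpers, not functions from this PR, and the scanner is deliberately naive (it only skips single-quoted literals).

```rust
/// Hypothetical helper: count `?` placeholders in a SQL statement, skipping any
/// inside single-quoted string literals. Deliberately naive: comments and
/// double-quoted identifiers are not handled.
fn count_placeholders(statement: &str) -> usize {
    let mut in_literal = false;
    let mut count = 0;
    for ch in statement.chars() {
        match ch {
            // An escaped quote ('') toggles twice, so the state stays correct.
            '\'' => in_literal = !in_literal,
            '?' if !in_literal => count += 1,
            _ => {}
        }
    }
    count
}

/// Hypothetical config check, assuming each `?` is bound, in order, from one
/// tracking column and that the first run starts without a saved checkpoint.
fn validate(
    statement: &str,
    tracking_columns: &[&str],
    init_params: &[(&str, &str)],
) -> Result<(), String> {
    let placeholders = count_placeholders(statement);
    if placeholders != tracking_columns.len() {
        return Err(format!(
            "statement has {placeholders} placeholder(s) but {} tracking column(s)",
            tracking_columns.len()
        ));
    }
    // Without a checkpoint, the first query can only be bound from init params.
    for column in tracking_columns {
        if !init_params.iter().any(|(name, _)| name == column) {
            return Err(format!("no statement_init_params value for {column}"));
        }
    }
    Ok(())
}

fn main() {
    let statement = "SELECT * FROM odbc_table WHERE id > ? LIMIT 1;";
    assert!(validate(statement, &["id"], &[("id", "0")]).is_ok());
    assert!(validate(statement, &["id", "datetime"], &[("id", "0")]).is_err());
}
```

Failing at config load surfaces a mismatch once, instead of on every scheduled tick.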

Create an example SQL file

cat << EOF > example.sql
DROP TABLE IF EXISTS odbc_table;
CREATE TABLE odbc_table
(
    id int auto_increment primary key,
    name varchar(255) null,
    datetime datetime null
);

INSERT INTO odbc_table (name, datetime) VALUES
('test1', now()),
('test2', now()),
('test3', now()),
('test4', now()),
('test5', now());
EOF

Configure unixODBC and the MariaDB driver on macOS

brew install unixodbc
brew install mariadb-connector-odbc

cat << EOF >> "$(brew --prefix)/etc/odbcinst.ini"
[MariaDB Unicode]
Description=MariaDB Connector/ODBC v.3.0
Driver=$(brew --prefix mariadb-connector-odbc)/lib/mariadb/libmaodbc.dylib
EOF

Run a MariaDB Docker container

docker run \
  --rm \
  --name mariadb \
  -e MYSQL_ROOT_PASSWORD=vector \
  -e MYSQL_USER=vector \
  -e MYSQL_PASSWORD=vector \
  -e MYSQL_DATABASE=vector_db \
  -v $(pwd)/example.sql:/docker-entrypoint-initdb.d/example.sql:ro \
  -p 3306:3306 \
  mariadb:latest

How did you test this PR?

I tested it with integration tests against two databases: MariaDB and PostgreSQL.
(Running a MySQL container in the integration tests on ARM64 is not straightforward, so I used MariaDB there and tested MySQL manually on my local machine.)

cargo vdev integration test odbc-mariadb
cargo vdev integration test odbc-postgresql

Change Type

  • Bug fix
  • New feature
  • Non-functional (chore, refactoring, docs)
  • Performance

Is this a breaking change?

  • Yes
  • No

Does this PR include user facing changes?

  • Yes. Please add a changelog fragment based on our guidelines.
  • No. A maintainer will apply the no-changelog label to this PR.

References

Notes

  • Please read our Vector contributor resources.
  • Do not hesitate to use @vectordotdev/vector to reach out to us regarding this PR.
  • Some CI checks run only after we manually approve them.
    • We recommend adding a pre-push hook, please see this template.
    • Alternatively, we recommend running the following locally before pushing to the remote branch:
      • make fmt
      • make check-clippy (if there are failures it's possible some of them can be fixed with make clippy-fix)
      • make test
  • After a review is requested, please avoid force pushes to help us review incrementally.
    • Feel free to push as many commits as you want. They will be squashed into one before merging.
    • For example, you can run git merge origin/master and git push.
  • If this PR introduces changes to Vector dependencies (modifies Cargo.lock), please
    run make build-licenses to regenerate the license inventory and commit the changes (if any). More details here.

powerumc added 29 commits July 24, 2025 20:53
# Conflicts:
#	Cargo.lock
#	Cargo.toml
#	scripts/integration/Dockerfile
#	src/internal_events/mod.rs
#	src/sources/mod.rs
github-actions bot added the domain: sources label on Oct 22, 2025

chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5c798594fc

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/sources/odbc/config.rs Outdated
Comment thread src/internal_events/odbc_metrics.rs Outdated
Comment thread src/sources/odbc/client.rs
Comment thread src/sources/odbc/client.rs
Comment thread src/sources/odbc/config.rs

chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3a9e08785f

Comment thread src/sources/odbc/client.rs
Comment thread src/sources/odbc/client.rs Outdated
Comment thread src/sources/odbc/client.rs Outdated
Comment thread src/sources/odbc/client.rs

chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 26419942f7

Comment thread src/sources/odbc/config.rs Outdated
Comment thread src/sources/odbc/client.rs Outdated

chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c08668179a

Comment thread Cargo.toml
"sources-logstash",
"sources-mqtt",
"sources-nats",
"sources-odbc",

P1: Install unixODBC in release runtimes

Adding sources-odbc to the default log sources pulls odbc-api/odbc-sys into the base release binary, so Linux images now need the system ODBC driver manager at load time. I checked the release Dockerfiles: the Debian image only installs ca-certificates tzdata systemd libsasl2-2, and the distroless-libc image only stages zlib, so their vector --version smoke tests and container startup can fail with a missing libodbc unless the runtime images/package dependencies include unixODBC or ODBC stays out of the default feature set.

Comment thread tests/integration/odbc-mariadb/config/compose.yaml
Comment thread src/sources/odbc/config.rs

chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 07cc01a7ce

#[tokio::test]
async fn query_executed_with_init_params() {
    const LAST_RUN_METADATA_PATH: &str = "odbc_tracking-integration-tests.json";

P2: Isolate ODBC integration test state

When cargo vdev int test odbc-* invokes cargo nextest run (see vdev/src/testing/runner.rs) with the new ODBC test.yaml filter and no --test-threads=1, these #[tokio::test]s can run concurrently. This checkpoint path is also used by query_executed_with_filepath, and both tests drop/recreate odbc_table, so one test can delete or advance the other's table/checkpoint while its scheduled source is still polling, making the integration suite flaky; use per-test table/file names or force these tests to run serially.
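For the per-test naming option, a minimal sketch that derives a table name and a checkpoint path from the test name. isolated_names is a hypothetical helper, and putting the checkpoint in the OS temp directory is an assumption, not what the current tests do. The other option the comment mentions, running these tests serially (for example with --test-threads=1), avoids the helper at the cost of a slower suite.

```rust
use std::path::PathBuf;

/// Hypothetical helper: derive a table name and a checkpoint path that belong
/// to a single test, so concurrently running tests never share either one.
fn isolated_names(test_name: &str) -> (String, PathBuf) {
    // Keep ASCII letters/digits (lowercased) and replace everything else with '_',
    // which is safe in both SQL identifiers and file names.
    let slug: String = test_name
        .chars()
        .map(|c| if c.is_ascii_alphanumeric() { c.to_ascii_lowercase() } else { '_' })
        .collect();
    let table = format!("odbc_table_{slug}");
    let checkpoint = std::env::temp_dir().join(format!("odbc_tracking-{slug}.json"));
    (table, checkpoint)
}

fn main() {
    let (table_a, path_a) = isolated_names("query_executed_with_init_params");
    let (table_b, path_b) = isolated_names("query_executed_with_filepath");
    assert_ne!(table_a, table_b);
    assert_ne!(path_a, path_b);
    println!("{table_a} -> {}", path_a.display());
}
```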

Comment on lines +440 to +443
.filter_map(|col| {
    let value = map.get(col)?;
    value_to_sql_parameter(value).map(|param| param.into_parameter())
})

P2: Reject missing tracking parameter values

When tracking_columns is set, this filter_map silently drops declared columns that are absent from statement_init_params/metadata/last result row, or whose value is Null/unconvertible. For a query whose ? placeholders correspond to those columns, the next scheduled execution binds fewer ODBC parameters than the SQL expects and fails every tick instead of surfacing the bad tracking state when it is loaded or saved; return an error if any declared tracking column cannot be converted to a parameter.
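A sketch of the stricter behavior: map every declared tracking column to a parameter and collect into a Result, so the first missing or unconvertible column becomes an error instead of a silently shorter parameter list. TrackedValue, to_param and tracking_params are simplified, hypothetical stand-ins for the PR's Value handling and value_to_sql_parameter.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified stand-in for the values stored per tracking column.
#[derive(Debug, Clone, PartialEq)]
enum TrackedValue {
    Null,
    Int(i64),
    Text(String),
}

/// Hypothetical conversion: `None` means "cannot be bound as a parameter".
fn to_param(value: &TrackedValue) -> Option<String> {
    match value {
        TrackedValue::Null => None,
        TrackedValue::Int(i) => Some(i.to_string()),
        TrackedValue::Text(s) => Some(s.clone()),
    }
}

/// Build one parameter per tracking column, failing on the first column that
/// is missing or unconvertible instead of skipping it like `filter_map` does.
fn tracking_params(
    columns: &[String],
    state: &HashMap<String, TrackedValue>,
) -> Result<Vec<String>, String> {
    columns
        .iter()
        .map(|col| -> Result<String, String> {
            let value = state
                .get(col)
                .ok_or_else(|| format!("tracking column {col} has no saved value"))?;
            to_param(value)
                .ok_or_else(|| format!("tracking column {col} holds {value:?}, which cannot be bound"))
        })
        .collect()
}

fn main() {
    let columns = vec!["id".to_string(), "name".to_string()];
    let mut state = HashMap::new();
    state.insert("id".to_string(), TrackedValue::Int(5));
    // "name" is absent, so this is an error rather than a one-element list.
    assert!(tracking_params(&columns, &state).is_err());

    state.insert("name".to_string(), TrackedValue::Text("test5".into()));
    assert_eq!(tracking_params(&columns, &state).unwrap(), vec!["5", "test5"]);
}
```

Calling this when the tracking state is loaded or saved, rather than only at bind time, reports the bad state where it originates.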

Comment thread src/sources/odbc/client.rs
Comment thread src/sources/odbc/config.rs
Comment thread src/sources/odbc/client.rs Outdated
@pront added the meta: awaiting author label on Jun 17, 2026
github-actions bot removed the meta: awaiting author label on Jun 17, 2026

chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9b79856da3

Comment on lines +241 to +245
let latest_result = extract_and_save_tracking(
    cfg.last_run_metadata_path.as_deref(),
    last.clone(),
    tracking_columns,
)

P2: Validate metadata path before sending rows

When last_run_metadata_path points to a missing parent directory or an unwritable file, this checkpoint write runs only after send_batch has accepted the rows. The write error is returned, prev_result is not advanced, and the next scheduled poll falls back to the same old metadata/init params, so the source re-emits the same database rows every tick until the path is fixed; validate or prepare the checkpoint path before sending rows, or avoid sending rows that cannot be checkpointed.
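One way to act on this is a pre-flight check on last_run_metadata_path that runs before send_batch (for example at startup and before each poll). A minimal sketch using only the standard library; ensure_checkpoint_writable and the .probe sibling file are hypothetical, not part of the PR, and the check is best-effort, since permissions can still change between the probe and the real write.

```rust
use std::fs::{self, OpenOptions};
use std::io;
use std::path::Path;

/// Hypothetical pre-flight check: fail before any rows are sent if the
/// checkpoint at `path` could not be written afterwards.
fn ensure_checkpoint_writable(path: &Path) -> io::Result<()> {
    if path.exists() {
        // Existing checkpoint: open for writing without truncating, so the saved
        // tracking state is left untouched.
        OpenOptions::new().write(true).open(path)?;
    } else {
        // No checkpoint yet: prove the directory exists and is writable by
        // creating and immediately deleting a sibling probe file.
        let probe = path.with_extension("probe");
        OpenOptions::new().write(true).create(true).open(&probe)?;
        fs::remove_file(&probe)?;
    }
    Ok(())
}

fn main() -> io::Result<()> {
    let dir = std::env::temp_dir();
    ensure_checkpoint_writable(&dir.join("odbc_tracking_demo.json"))?;
    // A missing parent directory is reported up front instead of after send_batch.
    let result = ensure_checkpoint_writable(&dir.join("no_such_dir/odbc_tracking.json"));
    assert!(result.is_err());
    Ok(())
}
```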

classes: {
  commonly_used: false
  delivery: "at_least_once"

P2: Don't advertise at-least-once delivery

The component metadata says ODBC is at_least_once, but the source intentionally disables acknowledgements and src/sources/odbc/config.rs:345 documents at-most-once behavior when last_run_metadata_path is used. Users reading the generated reference will expect rows to survive crashes or downstream sink failures, while this source can advance its checkpoint once rows enter the topology; remove the at-least-once claim or mark the weaker delivery guarantee.

cols.insert(key, value);
}

rows.push(Value::Object(cols))

P2: Stream large ODBC result sets in bounded chunks

When a scheduled query returns a large result set, this loop keeps appending every fetched row to rows until the cursor is exhausted before decoding or sending anything. odbc_batch_size only limits the driver fetch buffer, so a broad query can still build one unbounded in-memory JSON payload and exhaust Vector memory; send/decode each fetched batch or enforce a configured result limit instead of accumulating the entire result set.
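A sketch of the bounded alternative: forward rows as they are fetched, in chunks of at most max_rows, so the function holds one chunk rather than the whole result set. The rows iterator stands in for the ODBC cursor's fetch loop and send stands in for decoding plus send_batch; both, and send_in_chunks itself, are hypothetical. If the tracking checkpoint were saved after each chunk, a failure midway could resume after the last sent chunk instead of re-reading the whole query.

```rust
/// Hypothetical helper: forward `rows` in chunks of at most `max_rows`.
/// Returns how many rows were handed to `send`, or the first error from `send`.
fn send_in_chunks<I, T, F, E>(rows: I, max_rows: usize, mut send: F) -> Result<usize, E>
where
    I: IntoIterator<Item = T>,
    F: FnMut(Vec<T>) -> Result<(), E>,
{
    assert!(max_rows > 0, "max_rows must be positive");
    let mut buffer = Vec::with_capacity(max_rows);
    let mut sent = 0;
    for row in rows {
        buffer.push(row);
        if buffer.len() == max_rows {
            // Swap in a fresh buffer so only one full chunk is buffered at a time.
            let chunk = std::mem::replace(&mut buffer, Vec::with_capacity(max_rows));
            send(chunk)?;
            sent += max_rows;
        }
    }
    // Flush the final, partially filled chunk.
    if !buffer.is_empty() {
        let len = buffer.len();
        send(buffer)?;
        sent += len;
    }
    Ok(sent)
}

fn main() {
    let mut chunk_sizes = Vec::new();
    let total = send_in_chunks(1..=10, 4, |chunk: Vec<i32>| -> Result<(), String> {
        chunk_sizes.push(chunk.len());
        Ok(())
    })
    .unwrap();
    assert_eq!(total, 10);
    assert_eq!(chunk_sizes, vec![4, 4, 2]);
}
```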

return Value::Null;
};

naive_local_to_timestamp_value(NaiveDateTime::new(NaiveDate::default(), time), tz, s)

P2: Preserve TIME values when binding tracking params

When tracking_columns includes a SQL TIME column, this maps the time-only value into a Timestamp anchored to the default date, and the next poll formats timestamps as a full YYYY-MM-DD HH:MM:SS parameter. Databases such as PostgreSQL will compare a TIME placeholder against 1970-01-01 15:30:00 instead of 15:30:00, which can fail the query or advance tracking incorrectly; keep time-only values as their original text for checkpoint/bind purposes.
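A minimal sketch of the fix the comment suggests: give time-only values their own variant that keeps the fetched text, and format only true timestamps as YYYY-MM-DD HH:MM:SS when binding. TimeCheckpoint is a hypothetical, standard-library-only stand-in for the PR's value handling (which goes through chrono's NaiveDateTime), not its actual type.

```rust
use std::fmt;

/// Hypothetical checkpoint value for date/time tracking columns. Keeping TIME
/// as its own variant (holding the driver's original text) means it is never
/// widened into a timestamp anchored to a default date.
#[derive(Debug, Clone, PartialEq)]
enum TimeCheckpoint {
    /// SQL TIMESTAMP/DATETIME: (year, month, day, hour, minute, second).
    Timestamp(i32, u32, u32, u32, u32, u32),
    /// SQL TIME: the text exactly as fetched, e.g. "15:30:00".
    Time(String),
}

impl fmt::Display for TimeCheckpoint {
    /// Formats the value the way it would be bound back into the `?` placeholder.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            TimeCheckpoint::Timestamp(y, mo, d, h, mi, s) => {
                write!(f, "{y:04}-{mo:02}-{d:02} {h:02}:{mi:02}:{s:02}")
            }
            TimeCheckpoint::Time(text) => f.write_str(text),
        }
    }
}

fn main() {
    let time = TimeCheckpoint::Time("15:30:00".to_string());
    let ts = TimeCheckpoint::Timestamp(2025, 10, 22, 15, 30, 0);
    assert_eq!(time.to_string(), "15:30:00"); // not "1970-01-01 15:30:00"
    assert_eq!(ts.to_string(), "2025-10-22 15:30:00");
}
```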

@pront added the meta: awaiting author label on Jun 17, 2026
@pront (Member) commented Jun 17, 2026

Hello @powerumc, we are very interested in this PR. There are a few merge conflicts and unresolved threads, though. Ping us if you need any assistance.

Labels

  • docs review on hold (The documentation team reviews PRs only after a PR is approved by the COSE team)
  • domain: ci (Anything related to Vector's CI environment)
  • domain: external docs (Anything related to Vector's external, public documentation)
  • domain: sources (Anything related to Vector's sources)
  • meta: awaiting author (Pull requests that are awaiting their author)
  • source: new (Request or implementation of a new source)

4 participants