feat(new source): Add ODBC source implementation #24044
…ata_path properties tests
# Conflicts: # Cargo.lock # Cargo.toml # scripts/integration/Dockerfile # src/internal_events/mod.rs # src/sources/mod.rs
…ostgreSQL services
…ion for an ODBC source
…for MariaDB usage
… preserve exact values
…r file in OdbcConfig
…eturn Result type
…lue_to_sql_parameter function for improved SQL binding
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5c798594fc
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
… OdbcConfig and enforce non-empty statement requirement
…ze errors and improve metric tracking
💡 Codex Review
Reviewed commit: 3a9e08785f
💡 Codex Review
Reviewed commit: 26419942f7
…to_timestamp_value function for improved DST and fallback management
…or and allow indefinite wait
💡 Codex Review
Reviewed commit: c08668179a
"sources-logstash",
"sources-mqtt",
"sources-nats",
"sources-odbc",
Install unixODBC in release runtimes
Adding sources-odbc to the default log sources pulls odbc-api/odbc-sys into the base release binary, so Linux images now need the system ODBC driver manager at load time. I checked the release Dockerfiles: the Debian image only installs ca-certificates, tzdata, systemd, and libsasl2-2, and the distroless-libc image only stages zlib. Their vector --version smoke tests and container startup can therefore fail with a missing libodbc, unless the runtime images/package dependencies include unixODBC or ODBC stays out of the default feature set.
💡 Codex Review
Reviewed commit: 07cc01a7ce
#[tokio::test]
async fn query_executed_with_init_params() {
    const LAST_RUN_METADATA_PATH: &str = "odbc_tracking-integration-tests.json";
Isolate ODBC integration test state
When cargo vdev int test odbc-* invokes cargo nextest run (see vdev/src/testing/runner.rs) with the new ODBC test.yaml filter and no --test-threads=1, these #[tokio::test]s can run concurrently. This checkpoint path is also used by query_executed_with_filepath, and both tests drop/recreate odbc_table, so one test can delete or advance the other's table/checkpoint while its scheduled source is still polling, making the integration suite flaky; use per-test table/file names or force these tests to run serially.
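A minimal sketch of the per-test naming option, assuming a hypothetical `fixture_for` helper that is not part of the PR. Each test derives its own table name and checkpoint file from its function name, so concurrently running tests never share `odbc_table` or a single metadata path:

```rust
use std::path::PathBuf;

/// Hypothetical per-test fixture: one table and one checkpoint file per test.
#[derive(Debug)]
pub struct TestFixture {
    pub table: String,
    pub metadata_path: PathBuf,
}

/// Derive isolated resource names from the test's name (assumption: test
/// names are unique within the integration suite).
pub fn fixture_for(test_name: &str) -> TestFixture {
    // Keep only characters that are safe in an unquoted SQL identifier.
    let suffix: String = test_name
        .chars()
        .map(|c| {
            if c.is_ascii_alphanumeric() {
                c.to_ascii_lowercase()
            } else {
                '_'
            }
        })
        .collect();
    TestFixture {
        table: format!("odbc_table_{suffix}"),
        metadata_path: std::env::temp_dir().join(format!("odbc_tracking-{suffix}.json")),
    }
}
```

The alternative mentioned above, forcing serial execution (for example with nextest's `--test-threads=1`), avoids touching the tests but slows the whole suite; unique names keep the tests parallel-safe.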
.filter_map(|col| {
    let value = map.get(col)?;
    value_to_sql_parameter(value).map(|param| param.into_parameter())
})
Reject missing tracking parameter values
When tracking_columns is set, this filter_map silently drops declared columns that are absent from statement_init_params/metadata/last result row, or whose value is Null/unconvertible. For a query whose ? placeholders correspond to those columns, the next scheduled execution binds fewer ODBC parameters than the SQL expects and fails every tick instead of surfacing the bad tracking state when it is loaded or saved; return an error if any declared tracking column cannot be converted to a parameter.
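A sketch of the suggested fix, replacing `filter_map` with a fallible `map` collected into a `Result`, so the first missing or unconvertible column aborts with an error. The `Value` enum and `value_to_param` below are simplified, hypothetical stand-ins for the PR's value type and `value_to_sql_parameter`:

```rust
use std::collections::HashMap;

/// Simplified stand-in for the row/metadata value type (assumption).
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Null,
    Integer(i64),
    Text(String),
}

/// Hypothetical conversion to the text bound to an ODBC `?` placeholder.
fn value_to_param(value: &Value) -> Option<String> {
    match value {
        Value::Null => None,
        Value::Integer(i) => Some(i.to_string()),
        Value::Text(s) => Some(s.clone()),
    }
}

/// Build exactly one parameter per declared tracking column, returning an
/// error instead of silently dropping a column.
pub fn tracking_params(
    map: &HashMap<String, Value>,
    tracking_columns: &[String],
) -> Result<Vec<String>, String> {
    tracking_columns
        .iter()
        .map(|col| -> Result<String, String> {
            let value = map
                .get(col)
                .ok_or_else(|| format!("tracking column `{col}` is missing"))?;
            value_to_param(value)
                .ok_or_else(|| format!("tracking column `{col}` has no bindable value"))
        })
        // Stops at the first Err, so the parameter count always matches.
        .collect()
}
```

Because the result is either a full `Vec` or an error, the bound parameter count can never silently fall below the number of declared tracking columns.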
…loss when last_run_metadata_path is set
💡 Codex Review
Reviewed commit: 9b79856da3
let latest_result = extract_and_save_tracking(
    cfg.last_run_metadata_path.as_deref(),
    last.clone(),
    tracking_columns,
)
Validate metadata path before sending rows
When last_run_metadata_path points to a missing parent directory or an unwritable file, this checkpoint write runs only after send_batch has accepted the rows. The write error is returned, prev_result is not advanced, and the next scheduled poll falls back to the same old metadata/init params, so the source re-emits the same database rows every tick until the path is fixed; validate or prepare the checkpoint path before sending rows, or avoid sending rows that cannot be checkpointed.
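One way to "validate or prepare the checkpoint path before sending rows" is a pre-flight check run before `send_batch`. This is a minimal sketch, assuming a hypothetical `ensure_checkpoint_writable` helper; it uses only `std::fs::OpenOptions`. An existing checkpoint is opened for writing without truncation. A missing file is probed with `create_new` and then removed, so a later load still sees "no checkpoint yet" rather than an empty file:

```rust
use std::fs::OpenOptions;
use std::io;
use std::path::Path;

/// Hypothetical pre-flight check: fail *before* rows are sent if the
/// checkpoint at `path` could not be written afterwards.
pub fn ensure_checkpoint_writable(path: &Path) -> io::Result<()> {
    if path.exists() {
        // Opening for write without truncation leaves the current checkpoint intact.
        OpenOptions::new().write(true).truncate(false).open(path)?;
    } else {
        // Fails with NotFound if the parent directory is missing, or with
        // PermissionDenied if the directory is not writable.
        OpenOptions::new().write(true).create_new(true).open(path)?;
        // Remove the probe so no empty checkpoint file is left behind.
        std::fs::remove_file(path)?;
    }
    Ok(())
}
```

This only narrows the window: the real write after `send_batch` can still fail (for example on a full disk), so the error path there still needs handling.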
classes: {
    commonly_used: false
    delivery: "at_least_once"
Don't advertise at-least-once delivery
The component metadata says ODBC is at_least_once, but the source intentionally disables acknowledgements and src/sources/odbc/config.rs:345 documents at-most-once behavior when last_run_metadata_path is used. Users reading the generated reference will expect rows to survive crashes or downstream sink failures, while this source can advance its checkpoint once rows enter the topology; remove the at-least-once claim or mark the weaker delivery guarantee.
    cols.insert(key, value);
}

rows.push(Value::Object(cols))
Stream large ODBC result sets in bounded chunks
When a scheduled query returns a large result set, this loop keeps appending every fetched row to rows until the cursor is exhausted before decoding or sending anything. odbc_batch_size only limits the driver fetch buffer, so a broad query can still build one unbounded in-memory JSON payload and exhaust Vector memory; send/decode each fetched batch or enforce a configured result limit instead of accumulating the entire result set.
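A minimal sketch of bounded forwarding, assuming a hypothetical `forward_in_chunks` helper. Rows are handed to an `emit` callback as soon as `max_rows` have accumulated, so at most one chunk is buffered at a time. In the source, the iterator would stand in for rows pulled from the ODBC cursor and `emit` for decode-and-`send_batch`. Both mappings are assumptions, not the PR's code:

```rust
/// Forward `rows` to `emit` in chunks of at most `max_rows`, returning the
/// total number of rows forwarded. Stops at the first error from `emit`.
pub fn forward_in_chunks<T, I, F, E>(rows: I, max_rows: usize, mut emit: F) -> Result<usize, E>
where
    I: IntoIterator<Item = T>,
    F: FnMut(Vec<T>) -> Result<(), E>,
{
    assert!(max_rows > 0, "max_rows must be positive");
    let mut buffer = Vec::with_capacity(max_rows);
    let mut total = 0;
    for row in rows {
        buffer.push(row);
        if buffer.len() == max_rows {
            total += buffer.len();
            // Hand off the full chunk and continue with a fresh, empty buffer.
            emit(std::mem::replace(&mut buffer, Vec::with_capacity(max_rows)))?;
        }
    }
    // Flush the final partial chunk, if any.
    if !buffer.is_empty() {
        total += buffer.len();
        emit(buffer)?;
    }
    Ok(total)
}
```

With `last_run_metadata_path`, the tracking checkpoint would then naturally advance per emitted chunk instead of once per full result set.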
    return Value::Null;
};

naive_local_to_timestamp_value(NaiveDateTime::new(NaiveDate::default(), time), tz, s)
Preserve TIME values when binding tracking params
When tracking_columns includes a SQL TIME column, this maps the time-only value into a Timestamp anchored to the default date, and the next poll formats timestamps as a full YYYY-MM-DD HH:MM:SS parameter. Databases such as PostgreSQL will compare a TIME placeholder against 1970-01-01 15:30:00 instead of 15:30:00, which can fail the query or advance tracking incorrectly; keep time-only values as their original text for checkpoint/bind purposes.
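A sketch of "keep time-only values as their original text", assuming a hypothetical `TrackedValue` type. It also assumes the column's SQL type name is available from the result-set metadata when the value is captured. TIME values are stored and re-bound verbatim instead of being widened onto `NaiveDate::default()`:

```rust
/// Hypothetical checkpoint value that remembers whether it was time-only.
#[derive(Debug, Clone, PartialEq)]
pub enum TrackedValue {
    /// Date-and-time value as text, e.g. `2024-03-10 02:30:00`.
    Timestamp(String),
    /// SQL `TIME` value exactly as the driver returned it, e.g. `15:30:00`.
    Time(String),
}

impl TrackedValue {
    /// Text bound to the `?` placeholder on the next poll; a TIME value is
    /// never prefixed with a synthetic date.
    pub fn as_bind_text(&self) -> &str {
        match self {
            TrackedValue::Timestamp(s) | TrackedValue::Time(s) => s,
        }
    }
}

/// Classify a raw value by its column's SQL type name (assumed available).
pub fn tracked_from_sql(sql_type: &str, raw: &str) -> TrackedValue {
    if sql_type.eq_ignore_ascii_case("time") {
        TrackedValue::Time(raw.to_string())
    } else {
        TrackedValue::Timestamp(raw.to_string())
    }
}
```

Keeping the variant in the checkpoint means a reloaded value still knows it was time-only, so the next poll binds `15:30:00` rather than `1970-01-01 15:30:00`.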
Hello @powerumc, we are very interested in this PR. There are a few merge conflicts and unresolved threads, though. Ping us if you need any assistance.
Summary
This PR implements a new ODBC (Open Database Connectivity) source.
Vector configuration
Manual configuration example
Create example sql file
Configure ODBC and the MariaDB driver on macOS
Run MariaDB docker container
docker run \
  --rm \
  --name mariadb \
  -e MYSQL_ROOT_PASSWORD=vector \
  -e MYSQL_USER=vector \
  -e MYSQL_PASSWORD=vector \
  -e MYSQL_DATABASE=vector_db \
  -v $(pwd)/example.sql:/docker-entrypoint-initdb.d/example.sql:ro \
  -p 3306:3306 \
  mariadb:latest
How did you test this PR?
I tested it with integration tests against two databases: MariaDB and PostgreSQL.
(Running the MySQL container integration tests on an ARM64 machine is not straightforward, so I used MariaDB for the integration tests and tested MySQL manually on my local machine instead.)
Change Type
Is this a breaking change?
Does this PR include user-facing changes?
no-changelog label to this PR.
References
Notes
@vectordotdev/vector to reach out to us regarding this PR.
pre-push hook, please see this template.
make fmt
make check-clippy (if there are failures it's possible some of them can be fixed with make clippy-fix)
make test
git merge origin master and git push.
Cargo.lock), please run make build-licenses to regenerate the license inventory and commit the changes (if any). More details here.