From 1a31873544e935c0a905dec8a669c1d1ea5571de Mon Sep 17 00:00:00 2001 From: Artem Ermoshkin Date: Tue, 9 Jun 2026 16:24:30 +0300 Subject: [PATCH 1/2] add cpp rules --- docs/authoring.md | 1 + promptfooconfig.yaml | 8 + skills/ydb-core/SKILL.md | 1 + skills/ydb-table/SKILL.md | 8 +- skills/ydb-table/references/embed/cpp.md | 159 ++++++++++++++++++ skills/ydb-table/rules/embed/cpp.md | 144 ++++++++++++++++ tests/routing/09-cpp-audit-table.yaml | 31 ++++ tests/routing/descriptions.md | 2 +- .../cpp-rule-cpp-01-unbounded-read.yaml | 43 +++++ .../cpp-rule-cpp-02-closure-mutation.yaml | 45 +++++ .../cpp-rule-cpp-03-missing-idempotent.yaml | 41 +++++ .../cpp-rule-cpp-04-outer-retry-loop.yaml | 46 +++++ .../cpp-rule-cpp-05-custom-sleep-retrier.yaml | 39 +++++ .../cpp-rule-cpp-06-driver-per-call.yaml | 55 ++++++ .../cpp-rule-cpp-07-string-built-yql.yaml | 33 ++++ ...cpp-rule-cpp-08-explicit-begin-commit.yaml | 48 ++++++ .../cpp-rule-cpp-09-stream-duplicates.yaml | 59 +++++++ ...pp-rule-cpp-10-insert-with-idempotent.yaml | 51 ++++++ .../cpp-rule-cpp-11-ddl-in-retry.yaml | 61 +++++++ 19 files changed, 871 insertions(+), 4 deletions(-) create mode 100644 skills/ydb-table/references/embed/cpp.md create mode 100644 skills/ydb-table/rules/embed/cpp.md create mode 100644 tests/routing/09-cpp-audit-table.yaml create mode 100644 tests/ydb-table/cpp-rule-cpp-01-unbounded-read.yaml create mode 100644 tests/ydb-table/cpp-rule-cpp-02-closure-mutation.yaml create mode 100644 tests/ydb-table/cpp-rule-cpp-03-missing-idempotent.yaml create mode 100644 tests/ydb-table/cpp-rule-cpp-04-outer-retry-loop.yaml create mode 100644 tests/ydb-table/cpp-rule-cpp-05-custom-sleep-retrier.yaml create mode 100644 tests/ydb-table/cpp-rule-cpp-06-driver-per-call.yaml create mode 100644 tests/ydb-table/cpp-rule-cpp-07-string-built-yql.yaml create mode 100644 tests/ydb-table/cpp-rule-cpp-08-explicit-begin-commit.yaml create mode 100644 tests/ydb-table/cpp-rule-cpp-09-stream-duplicates.yaml create mode 100644 
tests/ydb-table/cpp-rule-cpp-10-insert-with-idempotent.yaml create mode 100644 tests/ydb-table/cpp-rule-cpp-11-ddl-in-retry.yaml diff --git a/docs/authoring.md b/docs/authoring.md index 6c6246b..a5cd526 100644 --- a/docs/authoring.md +++ b/docs/authoring.md @@ -101,6 +101,7 @@ Rule IDs have the shape `RULE--`. Prefixes are **not pre-allocated** |--------|-------|---------------| | JV | Java SDK / JDBC / Hibernate / Spring Data anti-patterns | skills/ydb-table/rules/embed/java.md | | GO | Go SDK (`ydb-go-sdk/v3`) — driver, sessions, query/table services, retry, transactions | skills/ydb-table/rules/embed/go.md | +| CPP | C++ SDK (`ydb-cpp-sdk`) — query/table clients, retry, transactions, parameterization | skills/ydb-table/rules/embed/cpp.md | ### Severity labels diff --git a/promptfooconfig.yaml b/promptfooconfig.yaml index 6b6ffbe..e6da448 100644 --- a/promptfooconfig.yaml +++ b/promptfooconfig.yaml @@ -86,6 +86,12 @@ _messages: &messages --- ydb-table / rules / embed / go.md --- {{ ydb_table_rules_go }} + + --- ydb-table / references / embed / cpp.md --- + {{ ydb_table_refs_cpp }} + + --- ydb-table / rules / embed / cpp.md --- + {{ ydb_table_rules_cpp }} - role: user content: "{{ user_prompt }}" @@ -223,6 +229,8 @@ defaultTest: ydb_table_rules_java: file://skills/ydb-table/rules/embed/java.md ydb_table_refs_go: file://skills/ydb-table/references/embed/go.md ydb_table_rules_go: file://skills/ydb-table/rules/embed/go.md + ydb_table_refs_cpp: file://skills/ydb-table/references/embed/cpp.md + ydb_table_rules_cpp: file://skills/ydb-table/rules/embed/cpp.md # Hard pre-filter: any output that came back as the defensive # "[provider error] ..." stub is a fail regardless of what the grader says. 
# Without this the grader will happily pass a 400/401 error as a "valid diff --git a/skills/ydb-core/SKILL.md b/skills/ydb-core/SKILL.md index 4dc7411..8b8cb3f 100644 --- a/skills/ydb-core/SKILL.md +++ b/skills/ydb-core/SKILL.md @@ -55,6 +55,7 @@ SDKs, all official under https://github.com/ydb-platform/: | Python | ydb-python-sdk | PyPI `ydb` | ✅ | ✅ | ✅ | | Java | ydb-java-sdk | Maven `tech.ydb:ydb-sdk-bom` + `ydb-sdk-query` / `ydb-sdk-topic` / `ydb-sdk-coordination` | ✅ | ✅ | ✅ | | JS/TS | ydb-js-sdk | npm `@ydbjs/core`, `@ydbjs/query`, `@ydbjs/topic`, `@ydbjs/coordination` | ✅ | ✅ | ✅ | +| C++ | ydb-cpp-sdk | CMake `find_package(ydb-cpp-sdk)` / Debian `libydb-cpp-dev`; link `YDB-CPP-SDK::Driver` + `Query` / `Table` / `Topic` / `Coordination` | ✅ | ✅ | ✅ | Q = queries, T = topics, C = coordination. diff --git a/skills/ydb-table/SKILL.md b/skills/ydb-table/SKILL.md index fe2a09c..5782118 100644 --- a/skills/ydb-table/SKILL.md +++ b/skills/ydb-table/SKILL.md @@ -1,6 +1,6 @@ --- name: ydb-table -description: Writing and auditing code that runs YQL against YDB tables. Use when the user writes a query, designs a table or primary key, reads an `EXPLAIN`, or asks to review Java (ydb-java-sdk, ydb-jdbc-driver, Hibernate, Spring Data JPA) or Go (`ydb-go-sdk/v3`) application code that talks to YDB. 
Triggers on YQL keywords (`UPSERT`, `SELECT`, `DECLARE`, `AS_TABLE`, `VIEW `, `CREATE TABLE`, `ALTER TABLE`, `EXPLAIN`), on the `BulkUpsert` SDK API, on JDBC / Hibernate / Spring symbols (`JpaRepository`, `findAllById`, `saveAll`, `deleteAllByIdInBatch`, `hibernate.jdbc.batch_size`, `@Version`, `@Retryable`, `SQLRecoverableException`, `SQLTransientException`), on `ydb-go-sdk/v3` symbols (`ydb.Open`, `db.Query().Do`, `db.Query().DoTx`, `db.Table().Do`, `query.WithIdempotent`, `query.WithCommit`, `query.WithStatsMode`, `query.Stats`, `query.StatsModeBasic`, `result.Close`, `ydb.WithLazyTx`, `ydb.ParamsBuilder`, `s.BeginTransaction`, `table.TxControl`, `table.BeginTx`, `BulkUpsertDataRows`, `balancers.PreferLocalDC`, `balancers.PreferNearestDC`), on flaky empty/zero query stats right after `Query` returns, on YDB transaction-mode names (`SerializableRW`, `SnapshotRO`), and on PostgreSQL / MySQL → YDB conversion prompts. For other SDKs (Python, C++, C#) this skill covers only the YQL / schema / transaction-mode side; SDK-specific guidance for those languages is not in this skill yet — say so and point at upstream docs. +description: Writing and auditing code that runs YQL against YDB tables. Use when the user writes a query, designs a table or primary key, reads an `EXPLAIN`, or asks to review Java (ydb-java-sdk, ydb-jdbc-driver, Hibernate, Spring Data JPA), Go (`ydb-go-sdk/v3`), or C++ (`ydb-cpp-sdk`) application code that talks to YDB. 
Triggers on YQL keywords (`UPSERT`, `SELECT`, `DECLARE`, `AS_TABLE`, `VIEW `, `CREATE TABLE`, `ALTER TABLE`, `EXPLAIN`), on the `BulkUpsert` SDK API, on JDBC / Hibernate / Spring symbols (`JpaRepository`, `findAllById`, `saveAll`, `deleteAllByIdInBatch`, `hibernate.jdbc.batch_size`, `@Version`, `@Retryable`, `SQLRecoverableException`, `SQLTransientException`), on `ydb-go-sdk/v3` symbols (`ydb.Open`, `db.Query().Do`, `db.Query().DoTx`, `db.Table().Do`, `query.WithIdempotent`, `query.WithCommit`, `query.WithStatsMode`, `query.Stats`, `query.StatsModeBasic`, `result.Close`, `ydb.WithLazyTx`, `ydb.ParamsBuilder`, `s.BeginTransaction`, `table.TxControl`, `table.BeginTx`, `BulkUpsertDataRows`, `balancers.PreferLocalDC`, `balancers.PreferNearestDC`), on `ydb-cpp-sdk` symbols (`#include — do not reproduce the spec from memory | ## Content rules -- Always parameterize: bind values through the SDK's typed parameter API (e.g. `ydb.ParamsBuilder()` in Go, `PreparedStatement` in JDBC), do not concatenate them into the query text. Plan-cache reuse depends on it; concatenated literals miss the cache. A leading `DECLARE` block in the query body is optional in modern YDB — scalar parameter types are inferred from the bound values — and earns its place on compound shapes (`List>`) or as an explicit caller contract. +- Always parameterize: bind values through the SDK's typed parameter API (e.g. `ydb.ParamsBuilder()` in Go, `TParamsBuilder` in C++, `PreparedStatement` in JDBC), do not concatenate them into the query text. Plan-cache reuse depends on it; concatenated literals miss the cache. A leading `DECLARE` block in the query body is optional in modern YDB — scalar parameter types are inferred from the bound values — and earns its place on compound shapes (`List>`) or as an explicit caller contract. - Prefer the Query Service over the deprecated Table Service for new code. 
- When converting from another SQL dialect, surface where YDB diverges — primary keys are partition keys, no `SERIAL` / `AUTO_INCREMENT`, JOIN behavior and built-in function names differ — rather than producing code that happens to parse. - Don't fabricate YQL syntax, built-in names, or SDK symbols. If the loaded sources don't cover the question, link the relevant page under and state the uncertainty. diff --git a/skills/ydb-table/references/embed/cpp.md b/skills/ydb-table/references/embed/cpp.md new file mode 100644 index 0000000..c818990 --- /dev/null +++ b/skills/ydb-table/references/embed/cpp.md @@ -0,0 +1,159 @@ +# Embedding YDB in C++ applications + +## Stack + +The official C++ SDK is **`ydb-cpp-sdk`** (). Public API lives in namespace `NYdb::inline V3` under `#include `. + +Two application surfaces for table work: + +- **`NYdb::NQuery::TQueryClient`** — Query Service. Preferred for new YQL-centric code. Supports `TTxControl::NoTx()` for DDL, `StreamExecuteQuery` for large reads, `ReadCommittedRW` transaction mode. +- **`NYdb::NTable::TTableClient`** — Table / KQP Service. Use for `PrepareDataQuery`, `BulkUpsert`, `StreamExecuteScanQuery`, schema builders (`TTableBuilder`, `CreateTable`). `NTable::TTxControl` has no `NoTx()` — DDL via `CreateTable` / `ExecuteSchemeQuery`. + +Open one **`NYdb::TDriver`** per process; stack surface clients on top. APIs return `NThreading::TFuture`; examples block with `.GetValueSync()` / `.ExtractValueSync()`. + +Build: C++20, static libraries. CMake consumer pattern from upstream README: + +```cmake +find_package(ydb-cpp-sdk REQUIRED COMPONENTS Driver Query Table) +target_link_libraries(myapp PRIVATE YDB-CPP-SDK::Driver YDB-CPP-SDK::Query) +``` + +Debian packages: `libydb-cpp-dev` (core), optional `libydb-cpp-iam-dev`. After install, pass `-DCMAKE_PREFIX_PATH=/usr/share/yandex`. + +Worked examples: . 
+ +## Connection + +See [`../../../ydb-core/SKILL.md#connecting`](../../../ydb-core/SKILL.md#connecting) for connection-string shape and auth env vars. + +```cpp +#include + +auto cfg = NYdb::TDriverConfig() + .SetEndpoint("grpc://localhost:2136") + .SetDatabase("/local") + .SetAuthToken(std::getenv("YDB_TOKEN") ? std::getenv("YDB_TOKEN") : ""); +NYdb::TDriver driver(cfg); +// ... work ... +driver.Stop(true); +``` + +`TDriverConfig` also accepts a connection string: `grpc://host:port/?database=/path` or `grpcs://...`. Credentials factories: `CreateOAuthCredentialsProviderFactory`, `CreateInsecureCredentialsProviderFactory` — `include/ydb-cpp-sdk/client/types/credentials/credentials.h`. + +For production code, prefer letting the SDK pick credentials from the standard `YDB_*` environment variables via `NYdb::CreateFromEnvironment(connectionString)` in `include/ydb-cpp-sdk/client/helpers/helpers.h` — it returns a ready `TDriverConfig` honouring `YDB_SERVICE_ACCOUNT_KEY_FILE_CREDENTIALS`, `YDB_ACCESS_TOKEN_CREDENTIALS`, `YDB_METADATA_CREDENTIALS`, `YDB_OAUTH2_KEY_FILE`, and `YDB_ANONYMOUS_CREDENTIALS`. The full env-var list is in [`../../../ydb-core/SKILL.md#connecting`](../../../ydb-core/SKILL.md#connecting). + +## Query execution + +Canonical Query Service pattern — `RetryQuerySync` wraps a lambda; the lambda is the retry unit: + +```cpp +#include +#include + +using namespace NYdb::NQuery; + +static TStatus SelectUserById(TSession session) { + auto params = TParamsBuilder() + .AddParam("$id").Uint64(42).Build() + .Build(); + return session.ExecuteQuery( + "SELECT name FROM users WHERE id = $id", + TTxControl::BeginTx(TTxSettings::SerializableRW()).CommitTx(), + params + ).GetValueSync(); +} + +ThrowOnError(client.RetryQuerySync( + SelectUserById, + NYdb::NRetry::TRetryOperationSettings().Idempotent(true))); +``` + +Three load-bearing pieces: + +- **`TRetryOperationSettings().Idempotent(true)`** on `RetryQuerySync` / `RetryOperationSync` declares replay-safe work. 
Required for reads and for writes keyed on a client-generated id. Omit on non-idempotent writes (counter increment, unkeyed `INSERT`). +- **`TParamsBuilder`** binds values — do not concatenate them into the query text. A leading `DECLARE` block is optional for scalars; use it for `List>` and other compound shapes. +- **Build results inside the lambda.** Assign to outer variables only on the success path (the returned `TStatus` is success). Mutations to outer state mid-lambda survive across retry attempts. + +Source: `examples/basic_example/basic_example.cpp`. + +## Transactions + +**Single-statement** — fuse begin and commit in `TTxControl`: + +```cpp +TTxControl::BeginTx(TTxSettings::SerializableRW()).CommitTx() +``` + +**Multi-step client logic** — first query opens the tx (no `CommitTx`), second commits: + +```cpp +auto result = session.ExecuteQuery(query1, TTxControl::BeginTx(TTxSettings::SerializableRW()), params1) + .GetValueSync(); +auto tx = *result.GetTransaction(); +auto result2 = session.ExecuteQuery(query2, TTxControl::Tx(tx).CommitTx(), params2).GetValueSync(); +``` + +Canonical multi-step shape: `examples/basic_example/basic_example.cpp` `MultiStep()`. + +For transaction modes and optimistic-locking consequences, see [`../working-with-data.md`](../working-with-data.md). + +## Retries + +YDB uses optimistic concurrency — application code that talks to YDB must run inside SDK retriers, not as bare one-shot RPCs. + +`RetryQuerySync` / `RetryOperationSync` classify errors internally (`src/client/impl/internal/retry/retry.h`): + +- **Always retried**: `ABORTED`, `OVERLOADED`, `CLIENT_RESOURCE_EXHAUSTED`, `UNAVAILABLE`, `BAD_SESSION`, `SESSION_BUSY` (session reset). +- **Retried only when `.Idempotent(true)`**: `UNDETERMINED`, `TRANSPORT_UNAVAILABLE`. +- **Non-retryable**: schema/semantic failures — propagated to caller. + +No outer `for` loop or hand-rolled `Sleep` backoff around SDK calls. 
Tune via `TRetryOperationSettings` (`MaxRetries`, `FastBackoffSettings`, `SlowBackoffSettings`). + +## Result parsing + +```cpp +TResultSetParser parser(result.GetResultSet(0)); +while (parser.TryNextRow()) { + auto id = parser.ColumnParser("id").GetOptionalUint64(); +} +``` + +## Bulk upsert + +Non-transactional ingest via Table client: + +```cpp +NYdb::TValueBuilder rows; +rows.BeginList().AddListItem().BeginStruct() + .AddMember("id").Uint64(1) + .AddMember("payload").Utf8("x") + .EndStruct().EndList(); + +struct TBulkUpsertOp { + std::string TablePath; + NYdb::TValue Rows; + TStatus operator()(NYdb::NTable::TTableClient& tableClient) const { + return tableClient.BulkUpsert(TablePath, Rows).GetValueSync(); + } +}; + +client.RetryOperationSync( + TBulkUpsertOp{tablePath, rows.Build()}, + NYdb::NTable::TRetryOperationSettings().Idempotent(true).MaxRetries(20)); +``` + +`BulkUpsert` is UPSERT-keyed (insert-or-overwrite by primary key), so replaying the same chunk converges to the same final state — that's why `.Idempotent(true)` is safe here and is the conventional setting. Each `BulkUpsert` call is its own non-transactional batch, not part of a surrounding `TTxControl`. + +When bulk is appropriate vs `AS_TABLE` in a transaction — see [`../working-with-data.md`](../working-with-data.md). + +Source: `examples/bulk_upsert_simple/main.cpp`. + +## Large reads + +Pick one structural path: + +- **Query Service streaming** — `client.StreamExecuteQuery(...)` returns `TExecuteQueryIterator`; iterate with `ReadNext()`. The SDK may replay the stream on retry — consumers must tolerate duplicate rows or dedupe (see `StreamQuerySelect` comment in `basic_example.cpp`). +- **Table Service scan** — `session.StreamExecuteScanQuery(...)` for unbounded scans without the Table `ExecuteDataQuery` result cap. +- **Keyset pagination** — outer loop with cursor predicate over the primary key; each page is its own `RetryQuerySync` call. 
See [`../working-with-data.md`](../working-with-data.md) and `examples/pagination/pagination.cpp`. + +If using `ExecuteDataQuery`, check `TResultSet::Truncated()` — a `true` value means the result was cut off and the read must be continued (pagination or streaming). diff --git a/skills/ydb-table/rules/embed/cpp.md b/skills/ydb-table/rules/embed/cpp.md new file mode 100644 index 0000000..f8d6fdf --- /dev/null +++ b/skills/ydb-table/rules/embed/cpp.md @@ -0,0 +1,144 @@ +# C++ SDK (`ydb-cpp-sdk`) — anti-patterns + +Audit rules for application code talking to YDB through the C++ SDK. Each rule is self-contained: the surface skill must produce correct audit output on its own. For positive patterns, see [`../../references/embed/cpp.md`](../../references/embed/cpp.md). + +### RULE-CPP-01: Reading "all matching rows" through Table Service `ExecuteDataQuery` without pagination + +**Severity**: Critical + +**What to look for**: `session.ExecuteDataQuery(...)` on `NYdb::NTable::TSession` where the read is **intended to exhaust a result set** — an unbounded `SELECT` (no key-equality `WHERE`), a range predicate over many rows, anything followed by a `TResultSetParser` loop that processes the full match — and there is no check of `TResultSet::Truncated()`, no outer keyset-pagination loop, and no `StreamExecuteScanQuery`. A bounded point read (`WHERE id = $id` with a single key) or a small explicit `LIMIT` where the caller cannot accept more rows by construction is not the target. `StreamExecuteScanQuery` and Query Service `StreamExecuteQuery` are *not* targets — they are the legitimate streaming paths. + +**Problem**: Table Service `ExecuteDataQuery` caps its result set; `TResultSet::Truncated()` signals the match was cut off. Code that assumes one call returns everything matching will under-process in production once the match exceeds the cap. Ignoring `Truncated()` or never paginating is the same structural failure as silently dropping rows. 
(Query Service `StreamExecuteQuery` does not set `Truncated()` the same way — it streams `ReadNext()` parts. This rule is scoped to `ExecuteDataQuery`; the streaming path has its own concern, RULE-CPP-09.) + +**Fix** — pick one of three structural paths: + +- **Switch to Query Service streaming**: rewrite through `TQueryClient::StreamExecuteQuery` and iterate `ReadNext()` parts (design for possible duplicate rows on retry — see RULE-CPP-09). +- **Keyset-paginate**: wrap the call in an outer loop with a cursor predicate and `ORDER BY` over the table's primary key; terminate when a page returns zero rows. The loop continuation is driven by rows / cursor, not by retry status — see RULE-CPP-04. +- **Table scan stream**: use `StreamExecuteScanQuery` when staying on the Table client. + +**Source**: `ydb-platform/ydb-cpp-sdk` — `TResultSet::Truncated()` in (backed by `Ydb::ResultSet::truncated()` in ). + +### RULE-CPP-02: External state mutation from inside the retry lambda + +**Severity**: High + +**What to look for**: mutations of state outside the `RetryQuerySync` / `RetryOperationSync` lambda *while the lambda is still running* — `outerVec.push_back(row)` mid-iteration, `outerMap[k] = v` after each row, capturing a per-attempt result handle into an outer reference before the lambda returns success, emitting side effects (RPC, log, charge) from inside the lambda body. The single allowed pattern is the final `outerVar = local` assignment on the success path, immediately before the lambda returns a successful `TStatus` — that one is what the Fix prescribes and must not be flagged. + +**Problem**: the retry lambda is the unit of work — the SDK invokes it again on every retryable error (`ABORTED`, `UNAVAILABLE`, `BAD_SESSION`, etc.). Mutations to external state survive across attempts and produce wrong values: `push_back` duplicates on retry, an outer result reference may hold a stream from a failed attempt, an outer map accumulates entries from partial reads. 
Build the result as a per-attempt local; assign to the outer variable only when returning success. + +**Fix**: build the result inside the lambda as a per-attempt local; assign to the outer variable only on the path that returns a successful `TStatus`. The lambda owns all data processing; only the success decision crosses the boundary. + +**Source**: `ydb-platform/ydb-cpp-sdk` — retry loop in (`RetryQuerySync`); session-pool retry in . + +### RULE-CPP-03: Missing `.Idempotent(true)` on `RetryQuerySync` / `RetryOperationSync` + +**Severity**: High + +**What to look for**: a mismatch between the lambda's idempotency and the setting in either direction. + +- **Missing flag on safe-to-replay work**: `RetryQuerySync` / `RetryOperationSync` whose lambda body is replay-safe (a read; an `UPSERT` keyed on a value the caller already has; a write guarded by an idempotency key) but no `NYdb::NRetry::TRetryOperationSettings().Idempotent(true)` passed as the settings argument. Fix: add `.Idempotent(true)`. +- **Flag set on non-idempotent work**: retry call carrying `.Idempotent(true)` while the lambda performs a non-idempotent write (counter increment, money transfer, raw `INSERT` of a generated row). Fix: remove the flag *and* rework the write to be idempotent before opting back in. + +**Problem**: `GetNextStep` in the SDK retry context classifies `UNDETERMINED` and `TRANSPORT_UNAVAILABLE` as retryable only when `Settings_.Idempotent_` is true. These are transport-class failures where the server may have already committed the write before the client saw the failure. The SDK cannot infer idempotency from the API surface — only the developer knows. Setting the flag on a non-idempotent write causes double effect; omitting it on an idempotent write makes the program propagate errors it could have absorbed. + +**Fix**: pass `NYdb::NRetry::TRetryOperationSettings().Idempotent(true)` (or `NYdb::NTable::TRetryOperationSettings().Idempotent(true)`) when the inner work is idempotent. 
For non-idempotent writes, do not set the flag; make the write idempotent first (client-generated request id, dedup guard) before opting in. + +**Source**: `ydb-platform/ydb-cpp-sdk` — `GetNextStep` in (`UNDETERMINED`, `TRANSPORT_UNAVAILABLE` branches). + +### RULE-CPP-04: `for` loop wrapping `RetryQuerySync` / `RetryOperationSync` for retry purposes + +**Severity**: High + +**What to look for**: an outer `for` / `while` block whose body calls `RetryQuerySync` or `RetryOperationSync` and **decides whether to repeat based on the returned `TStatus`** (success-or-error). Common shapes: `for (int attempt = 0; attempt < N; ++attempt) { status = client.RetryQuerySync(...); if (status.IsSuccess()) break; }`, or a `while (!status.IsSuccess())` wrapper. + +**Not a target**: a keyset-pagination outer loop whose continuation depends on rows returned, a cursor advancing, or an `EOS` flag — even though it also wraps `RetryQuerySync` (see RULE-CPP-01 fix). The signal is what drives the next iteration, not the loop syntax. + +**Problem**: `RetryQuerySync` / `RetryOperationSync` already retry the lambda internally with classified backoff (`retry.h` `GetNextStep`). A status-driven outer loop multiplies the backoff schedule, re-runs work on non-retryable errors the SDK has correctly decided not to retry, and silently inflates the retry budget the caller thinks they configured. + +**Fix**: remove the outer status-driven loop. Express tuning through `TRetryOperationSettings` (`MaxRetries`, backoff settings), not by wrapping the SDK retrier. + +**Source**: `ydb-platform/ydb-cpp-sdk` — (`RetryQuerySync` implementation); classification in (`GetNextStep`). 
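The budget inflation described in RULE-CPP-04 can be sketched with a self-contained model. Everything below is a stand-in written for illustration: `EFakeStatus`, `TFakeRetrySettings`, and `FakeRetrySync` are hypothetical names, not `ydb-cpp-sdk` symbols, and the fake retrier only approximates the idea of "replay on a retryable status, stop on a non-retryable one".

```cpp
#include <cassert>
#include <functional>

// Stand-in types for illustration only; these are NOT the ydb-cpp-sdk API.
enum class EFakeStatus { Success, Aborted, SchemeError };

struct TFakeRetrySettings {
    int MaxRetries = 3;  // hypothetical knob mirroring the idea of MaxRetries
};

// Models an SDK-style retrier: replays `op` only on a retryable status
// (Aborted here), up to MaxRetries extra attempts. A non-retryable status
// (SchemeError here) is returned after the first attempt.
inline EFakeStatus FakeRetrySync(const std::function<EFakeStatus()>& op,
                                 const TFakeRetrySettings& settings) {
    EFakeStatus status = op();
    for (int retry = 0;
         retry < settings.MaxRetries && status == EFakeStatus::Aborted;
         ++retry) {
        status = op();
    }
    return status;
}

// Anti-pattern: a status-driven outer loop around the retrier.
// Returns how many times the operation actually ran.
inline int CountCallsWithOuterLoop(EFakeStatus alwaysReturns, int outerAttempts) {
    int calls = 0;
    auto op = [&] { ++calls; return alwaysReturns; };
    for (int attempt = 0; attempt < outerAttempts; ++attempt) {
        if (FakeRetrySync(op, TFakeRetrySettings{}) == EFakeStatus::Success) {
            break;
        }
    }
    return calls;
}

// Preferred shape: a single retrier call, budget expressed through settings.
inline int CountCallsWithSettingsOnly(EFakeStatus alwaysReturns, int maxRetries) {
    int calls = 0;
    auto op = [&] { ++calls; return alwaysReturns; };
    FakeRetrySync(op, TFakeRetrySettings{maxRetries});
    return calls;
}
```

In this model, an outer loop of 3 around a retrier with `MaxRetries = 3` runs a persistently aborted operation 12 times instead of 4, and re-runs a non-retryable failure 3 times instead of once. The real SDK adds backoff between attempts, so the wall-clock cost would likely grow in the same multiplicative way.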
+ +### RULE-CPP-05: Custom retrier with `Sleep` wrapping YDB calls + +**Severity**: High + +**What to look for**: `for` loop with explicit `Sleep` / `std::this_thread::sleep_for` / manual backoff between attempts, calling any YDB-facing method inside — `session.ExecuteQuery`, `session.ExecuteDataQuery`, `client.GetSession`, `BulkUpsert`, or arbitrary helpers that call into the SDK. + +**Problem**: a hand-rolled retrier replays every non-success `TStatus` indiscriminately. Non-retryable failures (`PRECONDITION_FAILED`, schema mismatch) burn the retry budget on errors that will never recover, and conditionally-retryable failures (`UNDETERMINED`, `TRANSPORT_UNAVAILABLE`) get retried with no idempotency gate — which can double-apply a non-idempotent write. The SDK retrier classifies via `GetNextStep` and only retries the conditional bucket when `.Idempotent(true)` is set; backoff with jitter comes from `FastBackoffSettings` / `SlowBackoffSettings`. + +**Fix**: delete the custom loop and use `RetryQuerySync` / `RetryOperationSync`; express tuning through `TRetryOperationSettings` rather than caller-side `for`/`Sleep` code. + +**Source**: `ydb-platform/ydb-cpp-sdk` — ; . + +### RULE-CPP-06: `NYdb::TDriver` constructed per request instead of once per process + +**Severity**: High + +**What to look for**: `NYdb::TDriver` (or a `TDriverConfig` that immediately feeds one) constructed **inside** a request handler, controller method, RPC stub body, per-iteration of a worker loop, or any per-call helper that talks to YDB — then used to build a `TQueryClient` / `TTableClient` for a single piece of work. The grep signal is `TDriver driver(...)` or `NYdb::TDriver(...)` inside a function that is called more than once during the process lifetime. A driver constructed once at program startup (e.g. in `main`) and threaded to handlers is *not* the target. + +**Problem**: `TDriver` owns the gRPC channel pool, endpoint-discovery state, and background worker threads. 
Constructing one per request pays full endpoint discovery, gRPC channel setup, and TLS handshake before every YDB call, then tears the state back down — a latency cliff under any non-trivial RPS and a connection-churn signal at the cluster. Failing to call `driver.Stop(true)` on the short-lived driver also leaks the background threads. + +**Fix**: hold a single `NYdb::TDriver` for the process lifetime (build it at startup, stop it at shutdown via `driver.Stop(true)`); pass it to surface clients (`TQueryClient`, `TTableClient`) which are cheap to construct on demand. The upstream `basic_example` does exactly this in `main.cpp`. + +**Source**: `ydb-platform/ydb-cpp-sdk` — `TDriver` lifecycle in ; one-driver-per-process pattern in (driver constructed at `main` and `Stop(true)` on exit). + +### RULE-CPP-07: Non-parametrized YQL — `std::format` / string concat into query text + +**Severity**: Critical + +**What to look for**: `std::format` / `fmt::format` building a query string, `"SELECT ... " + variable` concatenation, rendering caller values into the YQL literal rather than binding them through `TParamsBuilder`. + +**Problem**: two failure modes the SDK's parameter API closes at once. Injection — caller values become YQL syntax when concatenated. Per-call query-plan miss — YDB's query-compilation cache works best with stable query text; every distinct rendered string defeats reuse and forces re-compilation work on the server. + +**Fix**: bind values through `TParamsBuilder().AddParam("$name").(value).Build()` and pass the resulting `TParams` to `ExecuteQuery` / `ExecuteDataQuery`. A `DECLARE` block in the query body is optional for scalars — types are inferred from bound values — and is justified for compound shapes (`List>`). + +**Source**: `ydb-platform/ydb-cpp-sdk` — `TParamsBuilder` in . YQL parameters: . 
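The two failure modes in RULE-CPP-07 can be shown without the SDK. This is a minimal sketch under stated assumptions: `TFakeQuery`, `BuildConcatenated`, and `BuildParameterized` are hypothetical helpers, and the `users` table is invented for the example. In real code the parameter map would be a `TParams` built with `TParamsBuilder`.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical container for illustration; in real code the values would
// travel in a TParams object, not in a std::map.
struct TFakeQuery {
    std::string Text;                           // query text sent for compilation
    std::map<std::string, std::string> Params;  // values bound separately
};

// Anti-pattern: the caller's value is rendered into the query text.
// Each distinct value yields a distinct text, and a crafted value
// becomes part of the query syntax.
inline std::string BuildConcatenated(const std::string& name) {
    return "SELECT id FROM users WHERE name = '" + name + "'";
}

// Preferred shape: the text stays constant; the value is carried as data.
inline TFakeQuery BuildParameterized(const std::string& name) {
    return TFakeQuery{"SELECT id FROM users WHERE name = $name",
                      {{"$name", name}}};
}
```

With concatenation, `BuildConcatenated("a")` and `BuildConcatenated("b")` produce different texts, and a value such as `x' OR '1'='1` changes the predicate. The parameterized form keeps one text for every caller value, which is the property a compilation cache can reuse.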
+ +### RULE-CPP-08: Explicit `BeginTransaction` + `Commit` when fused `TTxControl` suffices + +**Severity**: Medium + +**What to look for**: `session.BeginTransaction(...)` followed by one `ExecuteQuery` / `ExecuteDataQuery` and `tx.Commit()` for **single-statement** work where `TTxControl::BeginTx(...).CommitTx()` on the query call would fuse begin, execute, and commit into fewer round trips. Multi-step flows that genuinely need client logic between statements (as in `MultiStep` in the basic example) are not the target. + +**Problem**: separate begin and commit RPCs add latency and session churn. The upstream basic example documents that inline `TTxControl` on `ExecuteQuery` is preferable in most cases because it avoids additional hops to the cluster. + +**Fix**: for single-statement transactions, pass `TTxControl::BeginTx(TTxSettings::SerializableRW()).CommitTx()` as the second argument to `ExecuteQuery` / `ExecuteDataQuery` instead of explicit `BeginTransaction` + `Commit()`. + +**Source**: `ydb-platform/ydb-cpp-sdk` — `ExplicitTcl` comment in (lines 354–357). + +### RULE-CPP-09: `StreamExecuteQuery` consumer assumes exactly-once rows + +**Severity**: High + +**What to look for**: `StreamExecuteQuery` / `TExecuteQueryIterator::ReadNext` loop that processes rows with no deduplication strategy, no idempotent sink, and no comment acknowledging retry-induced duplicates — especially when the stream call sits inside or under `RetryQuerySync`. + +**Problem**: duplicate rows can appear in the output stream when an external retrier replays it. A consumer that counts rows, bills per row, or appends to an external queue without dedupe will double-count on replay. + +**Fix**: design the sink to be idempotent (keyed UPSERT, dedup by primary key), or track the last-seen cursor and skip duplicates. Do not assume one physical row per logical row in a retried stream. + +**Source**: `ydb-platform/ydb-cpp-sdk` — comment in `StreamQuerySelect` in (line 443). 
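The "idempotent sink" fix in RULE-CPP-09 can be sketched independently of the SDK. The types below are hypothetical: `TRow` stands in for a parsed result row with an assumed `Id` primary key, and a stream is modelled as a list of parts, where one part is delivered twice to imitate a replay.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical row shape for illustration; not a ydb-cpp-sdk type.
struct TRow {
    std::uint64_t Id;     // assumed primary key
    std::string Payload;
};

// Sink keyed by primary key: consuming the same row twice leaves one entry.
class TDedupSink {
public:
    void Consume(const std::vector<TRow>& part) {
        for (const TRow& row : part) {
            Rows_.insert_or_assign(row.Id, row.Payload);
        }
    }
    std::size_t Size() const { return Rows_.size(); }

private:
    std::unordered_map<std::uint64_t, std::string> Rows_;
};

// Anti-pattern: counts every delivered row, including replayed ones.
inline std::size_t CountNaively(const std::vector<std::vector<TRow>>& parts) {
    std::size_t total = 0;
    for (const auto& part : parts) {
        total += part.size();
    }
    return total;
}

// Counts distinct rows by routing every part through the keyed sink.
inline std::size_t CountWithDedup(const std::vector<std::vector<TRow>>& parts) {
    TDedupSink sink;
    for (const auto& part : parts) {
        sink.Consume(part);
    }
    return sink.Size();
}

// A stream in which the first part arrives twice, as it might after a retry.
inline std::vector<std::vector<TRow>> MakeReplayedStream() {
    std::vector<TRow> partOne{TRow{1, "a"}, TRow{2, "b"}};
    std::vector<TRow> partTwo{TRow{3, "c"}};
    return {partOne, partOne, partTwo};
}
```

On the replayed stream the naive count is 5 while the keyed sink holds 3 rows. An in-memory map is only one possible sink; a keyed `UPSERT` into a table, or a last-seen-cursor check, would serve the same purpose when the result does not fit in memory.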
+ +### RULE-CPP-10: `INSERT INTO` inside a retry lambda with `.Idempotent(true)` + +**Severity**: High + +**What to look for**: a YQL query string starting with `INSERT INTO` executed by `session.ExecuteQuery` / `session.ExecuteDataQuery` inside the lambda body of `RetryQuerySync` / `RetryOperationSync` **whose settings carry `.Idempotent(true)`**. The grep pair is `INSERT INTO` co-located with `TRetryOperationSettings().Idempotent(true)`. A `RetryQuerySync` without the flag is not the target — that's RULE-CPP-03's other half. + +**Problem**: `INSERT INTO` in YDB fails with `PRECONDITION_FAILED` (`Operation aborted due to constraint violation: insert_pk`) when the primary key already exists. `.Idempotent(true)` enables the SDK retrier to replay on `UNDETERMINED` / `TRANSPORT_UNAVAILABLE` — situations where the **first attempt may have already committed**. The replay then hits `PRECONDITION_FAILED`, the SDK surfaces it as a non-retryable terminal status, and the caller sees a hard failure for a write that did in fact land. Net effect: a write that succeeded looks failed; downstream compensations / retries propagate the wrong outcome. + +**Fix**: pick one — (a) switch the statement to `UPSERT INTO` (replay-safe by construction; converges to the same final state), (b) keep `INSERT INTO` and drop `.Idempotent(true)` so the SDK propagates `UNDETERMINED` instead of replaying, or (c) wrap the INSERT in a server-side idempotency guard (existence check + INSERT in one transaction, or a dedup table keyed on a client-generated request id) before opting back into `.Idempotent(true)`. + +**Source**: `ydb-platform/ydb-cpp-sdk` — `PRECONDITION_FAILED` in ; idempotent-gated retry of `UNDETERMINED` / `TRANSPORT_UNAVAILABLE` in (`GetNextStep`). YQL `INSERT` semantics: (the page documents the `PRECONDITION_FAILED` / `insert_pk` failure on duplicate primary key). 
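The replay hazard in RULE-CPP-10 can be reproduced with a toy table. This is a sketch under loud assumptions: `TFakeTable`, `EFakeStatus`, and the replay helper are invented for illustration and are not `ydb-cpp-sdk` types. The model assumes the first attempt commits but its reply is lost, so an idempotent retrier runs the write a second time.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>

// Stand-ins for illustration only; NOT ydb-cpp-sdk types.
enum class EFakeStatus { Success, PreconditionFailed };

struct TFakeTable {
    std::map<std::uint64_t, std::string> Rows;

    // INSERT-like semantics: fails when the primary key already exists.
    EFakeStatus Insert(std::uint64_t id, const std::string& value) {
        if (Rows.count(id) != 0) {
            return EFakeStatus::PreconditionFailed;
        }
        Rows[id] = value;
        return EFakeStatus::Success;
    }

    // UPSERT-like semantics: insert or overwrite by primary key.
    EFakeStatus Upsert(std::uint64_t id, const std::string& value) {
        Rows[id] = value;
        return EFakeStatus::Success;
    }
};

struct TOutcome {
    EFakeStatus Status;    // what the caller finally observes
    std::size_t RowCount;  // what the table actually contains
};

// Models a lost reply: attempt 1 commits but the client never sees its
// status, so the retrier replays the write and reports attempt 2.
template <typename TWrite>
EFakeStatus WriteWithLostReplyThenReplay(TWrite write) {
    write();         // attempt 1: lands on the server, reply is lost
    return write();  // attempt 2: replay issued by the retrier
}

inline TOutcome ReplayInsertOnEmptyTable() {
    TFakeTable table;
    EFakeStatus status =
        WriteWithLostReplyThenReplay([&] { return table.Insert(1, "x"); });
    return TOutcome{status, table.Rows.size()};
}

inline TOutcome ReplayUpsertOnEmptyTable() {
    TFakeTable table;
    EFakeStatus status =
        WriteWithLostReplyThenReplay([&] { return table.Upsert(1, "x"); });
    return TOutcome{status, table.Rows.size()};
}
```

Both variants leave exactly one row in the table. The difference is what the caller observes: the replayed insert reports `PreconditionFailed` for a write that landed, while the replayed upsert reports success. That mismatch is the reason option (a) in the Fix is usually the simplest route.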
+ +### RULE-CPP-11: DDL (`ExecuteSchemeQuery` / `CreateTable`) executed inside a `RetryQuerySync` / `RetryOperationSync` lambda + +**Severity**: Medium + +**What to look for**: `session.ExecuteSchemeQuery(...)` or `tableClient.CreateTable(...)` (or a YQL string beginning with `CREATE TABLE` / `ALTER TABLE` / `DROP TABLE` passed to a query-execute call) appearing inside the lambda body of `RetryQuerySync` / `RetryOperationSync` — typically alongside DML on the same retry path. DDL run *outside* a retry loop, in dedicated migration / setup code, is not the target. + +**Problem**: DDL operations are not transactional and not modeled by the retry classifier the same way DML errors are. `RetryQuerySync`'s `GetNextStep` reacts to `ABORTED`, `UNAVAILABLE`, `BAD_SESSION` etc. with assumptions about transactional rollback / session-reset semantics that don't hold for schema changes. A retried `CREATE TABLE` on `BAD_SESSION` can land twice; a retried `ALTER TABLE` mid-failure leaves the schema in a partially-applied state; co-locating DDL with DML in the same lambda binds the retry behaviour of both to the worst case of either. + +**Fix**: run schema-creation / migration steps in dedicated, idempotent setup code outside the SDK retrier — typically a startup-time bootstrap that uses `ExecuteSchemeQuery` directly and reasons about its own failure mode. Keep `RetryQuerySync` / `RetryOperationSync` lambdas DML-only. If runtime-issued DDL is unavoidable, give it its own bounded retry strategy rather than reusing the query-retry classifier. + +**Source**: `ydb-platform/ydb-cpp-sdk` — `ExecuteSchemeQuery` in ; `CreateTable` in the same header; query-classifier assumptions in (`GetNextStep`). 
diff --git a/tests/routing/09-cpp-audit-table.yaml b/tests/routing/09-cpp-audit-table.yaml new file mode 100644 index 0000000..4db502d --- /dev/null +++ b/tests/routing/09-cpp-audit-table.yaml @@ -0,0 +1,31 @@ +description: Routing · C++ RetryQuerySync audit routes to ydb-table + +vars: + user_prompt: | + Please audit this C++ code that uses `ydb-cpp-sdk` and + `RetryQuerySync`. Are there YDB anti-patterns? + + ```cpp + client.RetryQuerySync([](NYdb::NQuery::TSession session) { + return session.ExecuteQuery("SELECT 1", + NYdb::NQuery::TTxControl::BeginTx( + NYdb::NQuery::TTxSettings::OnlineRO()).CommitTx() + ).GetValueSync(); + }); + ``` + +assert: + - type: llm-rubric + value: | + The response should: + - Treat this as ydb-table scope (C++ application code against + YDB tables), not ydb-core-only onboarding. + - Load or reference C++ SDK guidance (`RULE-CPP-*` or + `references/embed/cpp.md` patterns) rather than saying C++ + SDK guidance is unavailable. + - Mention `RetryQuerySync` / idempotency or parameterization + where relevant to the snippet. + - Not invent YQL or SDK APIs absent from the loaded skills. + + Full fail if the response says C++ SDK-specific guidance is not + shipped in this skill. diff --git a/tests/routing/descriptions.md b/tests/routing/descriptions.md index e178e2a..4177fba 100644 --- a/tests/routing/descriptions.md +++ b/tests/routing/descriptions.md @@ -5,4 +5,4 @@ slug: ydb-core description: Entry point and router for YDB-related work. Orients an LLM about YDB — what it is, what surfaces it exposes, where to read upstream docs, which specialist skill to load for surface-specific questions. Covers SDK packages, connection strings and auth, local Docker, schema fundamentals, common integrations (ORMs, migration tools, Terraform), client-side balancing, and session lifecycle / resilience under rolling restart. 
Use when the user asks a general YDB question, mentions YDB without naming a specific surface (queries, topics, coordination), needs setup help, asks about balancing policies or `BAD_SESSION` / `shutdownHint` / rolling restart, or when another YDB skill needs foundational context. Also triggers on `grpcs://` / `grpc://`, `ydb profile`, `ydb scheme`, `balancers.RandomChoice`, `balancers.PreferNearestDC`, `ydb.WithBalancer`, `session-balancer`, and "getting started with YDB" prompts. slug: ydb-table -description: Writing and auditing code that runs YQL against YDB tables. Use when the user writes a query, designs a table or primary key, reads an `EXPLAIN`, or asks to review Java (ydb-java-sdk, ydb-jdbc-driver, Hibernate, Spring Data JPA) or Go (`ydb-go-sdk/v3`) application code that talks to YDB. Triggers on YQL keywords (`UPSERT`, `SELECT`, `DECLARE`, `AS_TABLE`, `VIEW `, `CREATE TABLE`, `ALTER TABLE`, `EXPLAIN`), on the `BulkUpsert` SDK API, on JDBC / Hibernate / Spring symbols (`JpaRepository`, `findAllById`, `saveAll`, `deleteAllByIdInBatch`, `hibernate.jdbc.batch_size`, `@Version`, `@Retryable`, `SQLRecoverableException`, `SQLTransientException`), on `ydb-go-sdk/v3` symbols (`ydb.Open`, `db.Query().Do`, `db.Query().DoTx`, `db.Table().Do`, `query.WithIdempotent`, `query.WithCommit`, `query.WithStatsMode`, `query.Stats`, `query.StatsModeBasic`, `result.Close`, `ydb.WithLazyTx`, `ydb.ParamsBuilder`, `s.BeginTransaction`, `table.TxControl`, `table.BeginTx`, `BulkUpsertDataRows`, `balancers.PreferLocalDC`, `balancers.PreferNearestDC`), on flaky empty/zero query stats right after `Query` returns, on YDB transaction-mode names (`SerializableRW`, `SnapshotRO`), and on PostgreSQL / MySQL → YDB conversion prompts. For other SDKs (Python, C++, C#) this skill covers only the YQL / schema / transaction-mode side; SDK-specific guidance for those languages is not in this skill yet — say so and point at upstream docs. 
+description: Writing and auditing code that runs YQL against YDB tables. Use when the user writes a query, designs a table or primary key, reads an `EXPLAIN`, or asks to review Java (ydb-java-sdk, ydb-jdbc-driver, Hibernate, Spring Data JPA), Go (`ydb-go-sdk/v3`), or C++ (`ydb-cpp-sdk`) application code that talks to YDB. Triggers on YQL keywords (`UPSERT`, `SELECT`, `DECLARE`, `AS_TABLE`, `VIEW `, `CREATE TABLE`, `ALTER TABLE`, `EXPLAIN`), on the `BulkUpsert` SDK API, on JDBC / Hibernate / Spring symbols (`JpaRepository`, `findAllById`, `saveAll`, `deleteAllByIdInBatch`, `hibernate.jdbc.batch_size`, `@Version`, `@Retryable`, `SQLRecoverableException`, `SQLTransientException`), on `ydb-go-sdk/v3` symbols (`ydb.Open`, `db.Query().Do`, `db.Query().DoTx`, `db.Table().Do`, `query.WithIdempotent`, `query.WithCommit`, `query.WithStatsMode`, `query.Stats`, `query.StatsModeBasic`, `result.Close`, `ydb.WithLazyTx`, `ydb.ParamsBuilder`, `s.BeginTransaction`, `table.TxControl`, `table.BeginTx`, `BulkUpsertDataRows`, `balancers.PreferLocalDC`, `balancers.PreferNearestDC`), on `ydb-cpp-sdk` symbols (`#include loadUsers(NYdb::NQuery::TQueryClient& client) { + std::vector out; + ThrowOnError(client.RetryQuerySync([&out](NYdb::NQuery::TSession session) { + auto result = session.ExecuteQuery( + "SELECT id, name FROM users", + TTxControl::BeginTx(TTxSettings::SnapshotRO()).CommitTx() + ).GetValueSync(); + if (!result.IsSuccess()) { + return result; + } + TResultSetParser parser(result.GetResultSet(0)); + while (parser.TryNextRow()) { + out.push_back({ + *parser.ColumnParser("id").GetOptionalUint64(), + *parser.ColumnParser("name").GetOptionalUtf8(), + }); + } + return result; + })); + return out; + } + ``` + +assert: + - type: llm-rubric + value: | + The response should: + - Cite `RULE-CPP-02` by ID. + - Explain that `push_back` on an outer vector inside + `RetryQuerySync` survives across retry attempts and duplicates + rows on replay. 
+ - Recommend building a local vector inside the lambda and + assigning to `out` only on the success path. + - Not invent a different RULE-CPP-XX ID. + + Full fail if the response says duplicates are expected YDB + behavior without flagging the closure mutation pattern. diff --git a/tests/ydb-table/cpp-rule-cpp-03-missing-idempotent.yaml b/tests/ydb-table/cpp-rule-cpp-03-missing-idempotent.yaml new file mode 100644 index 0000000..3d0f2ef --- /dev/null +++ b/tests/ydb-table/cpp-rule-cpp-03-missing-idempotent.yaml @@ -0,0 +1,41 @@ +description: C++ audit · Missing Idempotent on RetryQuerySync (RULE-CPP-03) + +vars: + user_prompt: | + Review this C++ function. It UPSERTs an event keyed by client-side + `eventId`. During cluster upgrades we see sporadic errors, but the + row is already in the table when we check. Is the SDK call correct? + + ```cpp + TStatus recordEvent(NYdb::NQuery::TQueryClient& client, uint64_t eventId, + const std::string& payload) { + return client.RetryQuerySync([eventId, payload](TSession session) { + auto params = TParamsBuilder() + .AddParam("$id").Uint64(eventId).Build() + .AddParam("$payload").Utf8(payload).Build() + .Build(); + return session.ExecuteQuery( + R"(UPSERT INTO events (id, payload) VALUES ($id, $payload))", + TTxControl::BeginTx(TTxSettings::SerializableRW()).CommitTx(), + params + ).GetValueSync(); + }); + } + ``` + +assert: + - type: llm-rubric + value: | + The response should: + - Cite `RULE-CPP-03` by ID. + - Identify missing `TRetryOperationSettings().Idempotent(true)` + on `RetryQuerySync`. + - Explain that `UNDETERMINED` / `TRANSPORT_UNAVAILABLE` are only + retried when idempotent is set, matching the user's symptom of + committed writes after client-side failure. + - Recommend adding `.Idempotent(true)` because the UPSERT is + keyed on a client-generated id. + - Not invent a different RULE-CPP-XX ID. 
+ + Full fail if the response says the code is fine or recommends an + outer manual retry loop instead of the idempotent flag. diff --git a/tests/ydb-table/cpp-rule-cpp-04-outer-retry-loop.yaml b/tests/ydb-table/cpp-rule-cpp-04-outer-retry-loop.yaml new file mode 100644 index 0000000..091350b --- /dev/null +++ b/tests/ydb-table/cpp-rule-cpp-04-outer-retry-loop.yaml @@ -0,0 +1,46 @@ +description: C++ audit · Outer for loop around RetryQuerySync (RULE-CPP-04) + +vars: + user_prompt: | + Review this C++ retry wrapper around a YDB update. We added it + after seeing `ABORTED` under contention. Is this the right pattern? + + ```cpp + TStatus updateBalance(NYdb::NQuery::TQueryClient& client, uint64_t userId, + int64_t delta) { + TStatus status(EStatus::SUCCESS, NYdb::NIssue::TIssues()); + for (int attempt = 0; attempt < 5; ++attempt) { + status = client.RetryQuerySync([userId, delta](TSession session) { + auto params = TParamsBuilder() + .AddParam("$id").Uint64(userId).Build() + .AddParam("$delta").Int64(delta).Build() + .Build(); + return session.ExecuteQuery( + "UPDATE accounts SET balance = balance + $delta WHERE id = $id", + TTxControl::BeginTx(TTxSettings::SerializableRW()).CommitTx(), + params + ).GetValueSync(); + }); + if (status.IsSuccess()) { + break; + } + } + return status; + } + ``` + +assert: + - type: llm-rubric + value: | + The response should: + - Cite `RULE-CPP-04` by ID. + - Explain that `RetryQuerySync` already retries internally and + an outer `for` multiplies backoff / retry budget. + - Recommend removing the outer loop and tuning + `TRetryOperationSettings` instead. + - Note the inner write is non-idempotent (`balance + $delta`) so + `.Idempotent(true)` must not be set blindly — separate from + but compatible with RULE-CPP-04. + - Not invent a different RULE-CPP-XX ID. + + Full fail if the response endorses the outer loop as best practice. 
diff --git a/tests/ydb-table/cpp-rule-cpp-05-custom-sleep-retrier.yaml b/tests/ydb-table/cpp-rule-cpp-05-custom-sleep-retrier.yaml new file mode 100644 index 0000000..094a5c7 --- /dev/null +++ b/tests/ydb-table/cpp-rule-cpp-05-custom-sleep-retrier.yaml @@ -0,0 +1,39 @@ +description: C++ audit · Custom Sleep retrier around YDB calls (RULE-CPP-05) + +vars: + user_prompt: | + Review this C++ helper. We retry YDB ourselves with exponential + backoff when the cluster is flaky. Any issues? + + ```cpp + TStatus runQueryWithBackoff(NYdb::NQuery::TQueryClient& client, + const std::string& sql) { + auto session = client.GetSession().GetValueSync().GetSession(); + TStatus status(EStatus::GENERIC_ERROR, NYdb::NIssue::TIssues()); + for (int i = 0; i < 10; ++i) { + status = session.ExecuteQuery( + sql, + TTxControl::BeginTx(TTxSettings::SnapshotRO()).CommitTx() + ).GetValueSync(); + if (status.IsSuccess()) { + return status; + } + std::this_thread::sleep_for(std::chrono::milliseconds(50 * (1 << i))); + } + return status; + } + ``` + +assert: + - type: llm-rubric + value: | + The response should: + - Cite `RULE-CPP-05` by ID. + - Flag the hand-rolled `sleep_for` loop around bare + `ExecuteQuery` instead of `RetryQuerySync`. + - Explain that custom retriers miss SDK error classification + and idempotency gating. + - Recommend `RetryQuerySync` with `TRetryOperationSettings`. + - Not invent a different RULE-CPP-XX ID. + + Full fail if the response says the custom backoff is fine. diff --git a/tests/ydb-table/cpp-rule-cpp-06-driver-per-call.yaml b/tests/ydb-table/cpp-rule-cpp-06-driver-per-call.yaml new file mode 100644 index 0000000..5c0857c --- /dev/null +++ b/tests/ydb-table/cpp-rule-cpp-06-driver-per-call.yaml @@ -0,0 +1,55 @@ +description: C++ audit · TDriver constructed per request instead of once per process (RULE-CPP-06) + +vars: + user_prompt: | + Review this C++ HTTP handler. Latency feels high under load even + for tiny queries. Is the YDB setup right? 
+ + ```cpp + void handleGetUser(const HttpRequest& req, HttpResponse& res) { + auto cfg = NYdb::TDriverConfig() + .SetEndpoint("grpcs://ydb.example.net:2135") + .SetDatabase("/prod/users") + .SetAuthToken(std::getenv("YDB_TOKEN")); + NYdb::TDriver driver(cfg); + NYdb::NQuery::TQueryClient client(driver); + + ThrowOnError(client.RetryQuerySync([&](NYdb::NQuery::TSession session) { + auto params = NYdb::TParamsBuilder() + .AddParam("$id").Uint64(req.userId).Build() + .Build(); + auto result = session.ExecuteQuery( + "SELECT name FROM users WHERE id = $id", + NYdb::NQuery::TTxControl::BeginTx( + NYdb::NQuery::TTxSettings::SnapshotRO()).CommitTx(), + params + ).GetValueSync(); + if (!result.IsSuccess()) { + return result; + } + writeUserToResponse(result, res); + return result; + })); + + driver.Stop(true); + } + ``` + +assert: + - type: llm-rubric + value: | + The response should: + - Cite `RULE-CPP-06` by ID. + - Identify that `NYdb::TDriver` is constructed inside the + per-request handler rather than once at process startup. + - Explain the cost: endpoint discovery, gRPC channel setup, and + TLS handshake on every request, with corresponding latency + and connection churn at the cluster. + - Recommend holding one long-lived `TDriver` for the process + and passing it to per-request `TQueryClient` construction, + with `driver.Stop(true)` at shutdown (not per request). + - Not invent a different RULE-CPP-XX ID. + + Full fail if the response says per-request driver construction + is fine or recommends only tuning timeouts without addressing + the driver lifecycle. diff --git a/tests/ydb-table/cpp-rule-cpp-07-string-built-yql.yaml b/tests/ydb-table/cpp-rule-cpp-07-string-built-yql.yaml new file mode 100644 index 0000000..62c7156 --- /dev/null +++ b/tests/ydb-table/cpp-rule-cpp-07-string-built-yql.yaml @@ -0,0 +1,33 @@ +description: C++ audit · Non-parametrized YQL via std::format (RULE-CPP-07) + +vars: + user_prompt: | + Review this C++ lookup. 
User ids come from an HTTP request. + + ```cpp + TStatus fetchUser(NYdb::NQuery::TQueryClient& client, uint64_t userId) { + auto sql = std::format("SELECT name FROM users WHERE id = {}", userId); + return client.RetryQuerySync([&sql](TSession session) { + return session.ExecuteQuery( + sql, + TTxControl::BeginTx(TTxSettings::SnapshotRO()).CommitTx() + ).GetValueSync(); + }, NYdb::NRetry::TRetryOperationSettings().Idempotent(true)); + } + ``` + +assert: + - type: llm-rubric + value: | + The response should: + - Cite `RULE-CPP-07` by ID. + - Identify `std::format` embedding `userId` into query text + instead of `TParamsBuilder` binding. + - Explain injection risk and plan-cache churn from distinct + query strings. + - Show or describe binding via `TParamsBuilder` and a + parameterized query with `$id`. + - Not invent a different RULE-CPP-XX ID. + + Full fail if the response says formatting a numeric id is safe + enough to skip parameterization. diff --git a/tests/ydb-table/cpp-rule-cpp-08-explicit-begin-commit.yaml b/tests/ydb-table/cpp-rule-cpp-08-explicit-begin-commit.yaml new file mode 100644 index 0000000..1c0f334 --- /dev/null +++ b/tests/ydb-table/cpp-rule-cpp-08-explicit-begin-commit.yaml @@ -0,0 +1,48 @@ +description: C++ audit · Explicit BeginTransaction when fused TTxControl suffices (RULE-CPP-08) + +vars: + user_prompt: | + Review this single-statement C++ update. Latency seems high — extra + round trips? 
+ + ```cpp + TStatus markProcessed(NYdb::NQuery::TQueryClient& client, uint64_t id) { + return client.RetryQuerySync([id](TQueryClient queryClient) -> TStatus { + auto session = queryClient.GetSession().GetValueSync().GetSession(); + auto begin = session.BeginTransaction(TTxSettings::SerializableRW()) + .GetValueSync(); + if (!begin.IsSuccess()) { + return begin; + } + auto tx = begin.GetTransaction(); + auto params = TParamsBuilder() + .AddParam("$id").Uint64(id).Build().Build(); + auto result = session.ExecuteQuery( + "UPDATE logs SET processed = true WHERE id = $id", + TTxControl::Tx(tx), + params + ).GetValueSync(); + if (!result.IsSuccess()) { + return result; + } + return tx.Commit().GetValueSync(); + }); + } + ``` + +assert: + - type: llm-rubric + value: | + The response should: + - Cite `RULE-CPP-08` by ID. + - Identify separate `BeginTransaction` + `Commit` for a single + `ExecuteQuery` where `TTxControl::BeginTx(...).CommitTx()` + would fuse round trips. + - Reference the upstream basic-example guidance that inline + tx control is preferable for most single-statement work. + - Not flag multi-step `MultiStep`-style flows (this snippet is + single-statement). + - Not invent a different RULE-CPP-XX ID. + + Full fail if the response says explicit begin/commit is always + required. diff --git a/tests/ydb-table/cpp-rule-cpp-09-stream-duplicates.yaml b/tests/ydb-table/cpp-rule-cpp-09-stream-duplicates.yaml new file mode 100644 index 0000000..703a0d6 --- /dev/null +++ b/tests/ydb-table/cpp-rule-cpp-09-stream-duplicates.yaml @@ -0,0 +1,59 @@ +description: C++ audit · StreamExecuteQuery without duplicate handling (RULE-CPP-09) + +vars: + user_prompt: | + Review this C++ export job. It streams seasons for billing. We + occasionally double-charge after network blips. Is the stream + handling correct? 
+ + ```cpp + void billSeasons(NYdb::NQuery::TQueryClient& client) { + ThrowOnError(client.RetryQuerySync([&](TQueryClient qc) -> TStatus { + auto stream = qc.StreamExecuteQuery( + "SELECT series_id, season_id FROM seasons", + TTxControl::NoTx() + ).GetValueSync(); + if (!stream.IsSuccess()) { + return stream; + } + bool eos = false; + while (!eos) { + auto part = stream.ReadNext().ExtractValueSync(); + if (!part.IsSuccess()) { + eos = true; + if (!part.EOS()) { + return part; + } + continue; + } + if (part.HasResultSet()) { + TResultSetParser parser(part.ExtractResultSet()); + while (parser.TryNextRow()) { + auto seriesId = *parser.ColumnParser("series_id") + .GetOptionalUint64(); + auto seasonId = *parser.ColumnParser("season_id") + .GetOptionalUint64(); + chargeCustomer(seriesId, seasonId); + } + } + } + return TStatus(EStatus::SUCCESS, NYdb::NIssue::TIssues()); + })); + } + ``` + +assert: + - type: llm-rubric + value: | + The response should: + - Cite `RULE-CPP-09` by ID. + - Explain that `StreamExecuteQuery` inside a retrier can emit + duplicate rows on replay (per upstream basic example). + - Connect this to double-charging via `chargeCustomer` with no + dedupe / idempotent sink. + - Recommend idempotent billing (keyed UPSERT, dedup table) or + cursor-based skip of already-seen keys. + - Not invent a different RULE-CPP-XX ID. + + Full fail if the response blames only application logic outside + the stream without mentioning retry-induced duplicates. diff --git a/tests/ydb-table/cpp-rule-cpp-10-insert-with-idempotent.yaml b/tests/ydb-table/cpp-rule-cpp-10-insert-with-idempotent.yaml new file mode 100644 index 0000000..66394eb --- /dev/null +++ b/tests/ydb-table/cpp-rule-cpp-10-insert-with-idempotent.yaml @@ -0,0 +1,51 @@ +description: C++ audit · INSERT INTO inside .Idempotent(true) retry (RULE-CPP-10) + +vars: + user_prompt: | + Review this C++ event-recording function. 
During cluster upgrades + we get sporadic errors here even when the event row turns out to + already be in the table. Is the SDK call correct? + + ```cpp + TStatus recordEvent(NYdb::NQuery::TQueryClient& client, uint64_t eventId, + const std::string& payload) { + return client.RetryQuerySync( + [eventId, payload](NYdb::NQuery::TSession session) { + auto params = NYdb::TParamsBuilder() + .AddParam("$id").Uint64(eventId).Build() + .AddParam("$payload").Utf8(payload).Build() + .Build(); + return session.ExecuteQuery( + R"(INSERT INTO events (id, payload) VALUES ($id, $payload))", + NYdb::NQuery::TTxControl::BeginTx( + NYdb::NQuery::TTxSettings::SerializableRW()).CommitTx(), + params + ).GetValueSync(); + }, + NYdb::NRetry::TRetryOperationSettings().Idempotent(true)); + } + ``` + +assert: + - type: llm-rubric + value: | + The response should: + - Cite `RULE-CPP-10` by ID. + - Identify the combination of `INSERT INTO` plus + `.Idempotent(true)` on `RetryQuerySync` as the antipattern. + - Explain that `INSERT` is not idempotent under retry: a + replay after `UNDETERMINED` / `TRANSPORT_UNAVAILABLE` (which + the idempotent flag unlocks) hits `PRECONDITION_FAILED` + (`insert_pk` constraint violation) when the first attempt + actually committed, surfacing a hard failure for a write + that did land. + - Recommend one of: switch to `UPSERT INTO` (replay-safe by + construction), drop `.Idempotent(true)` and accept the + propagated `UNDETERMINED`, or add a server-side idempotency + guard (dedup table / existence check) before opting back in. + - Not invent a different RULE-CPP-XX ID. May also mention + RULE-CPP-03 if discussing the wider idempotency story, but + the primary citation should be CPP-10. + + Full fail if the response endorses `INSERT INTO` with + `.Idempotent(true)` as a safe pattern. 
diff --git a/tests/ydb-table/cpp-rule-cpp-11-ddl-in-retry.yaml b/tests/ydb-table/cpp-rule-cpp-11-ddl-in-retry.yaml new file mode 100644 index 0000000..183d0e2 --- /dev/null +++ b/tests/ydb-table/cpp-rule-cpp-11-ddl-in-retry.yaml @@ -0,0 +1,61 @@ +description: C++ audit · DDL (ExecuteSchemeQuery) inside RetryOperationSync lambda (RULE-CPP-11) + +vars: + user_prompt: | + Review this C++ helper. We use it to make sure a per-tenant + events table exists before writing the first row. It runs on the + hot write path. Is the YDB usage correct? + + ```cpp + TStatus ensureTenantTableAndInsert(NYdb::NTable::TTableClient& client, + const std::string& tablePath, + uint64_t eventId, + const std::string& payload) { + return client.RetryOperationSync( + [&](NYdb::NTable::TSession session) -> NYdb::TStatus { + auto createStatus = session.ExecuteSchemeQuery( + "CREATE TABLE IF NOT EXISTS `" + tablePath + "` (" + " id Uint64," + " payload Utf8," + " PRIMARY KEY (id))" + ).GetValueSync(); + if (!createStatus.IsSuccess()) { + return createStatus; + } + + auto params = NYdb::TParamsBuilder() + .AddParam("$id").Uint64(eventId).Build() + .AddParam("$payload").Utf8(payload).Build() + .Build(); + return session.ExecuteDataQuery( + "UPSERT INTO `" + tablePath + "` (id, payload) " + "VALUES ($id, $payload)", + NYdb::NTable::TTxControl::BeginTx( + NYdb::NTable::TTxSettings::SerializableRW()).CommitTx(), + params + ).GetValueSync(); + }, + NYdb::NTable::TRetryOperationSettings().Idempotent(true)); + } + ``` + +assert: + - type: llm-rubric + value: | + The response should: + - Cite `RULE-CPP-11` by ID. + - Identify that DDL (`session.ExecuteSchemeQuery` with a + `CREATE TABLE` payload) sits inside the `RetryOperationSync` + lambda alongside DML, so the SDK's query retry classifier + governs schema operations it was not designed for. 
+ - Explain the failure mode: a retried DDL after `BAD_SESSION` + / `UNAVAILABLE` can land twice, an `ALTER`/`CREATE` can be + partially applied, and DML retry semantics do not translate + to schema mutations. + - Recommend moving table creation to dedicated startup / + migration code (run once, outside the SDK retrier) and + keeping the `RetryOperationSync` lambda DML-only. + - Not invent a different RULE-CPP-XX ID. + + Full fail if the response says co-locating DDL with DML inside + `RetryOperationSync` is a fine on-demand pattern. From 59d8839a9b7b5ae411a82bd9f29b41f32d31cf57 Mon Sep 17 00:00:00 2001 From: Artem Ermoshkin Date: Wed, 10 Jun 2026 13:40:22 +0300 Subject: [PATCH 2/2] fix doc references --- skills/ydb-table/references/embed/cpp.md | 22 +++++++++----- skills/ydb-table/rules/embed/cpp.md | 38 ++++++++++++------------ 2 files changed, 33 insertions(+), 27 deletions(-) diff --git a/skills/ydb-table/references/embed/cpp.md b/skills/ydb-table/references/embed/cpp.md index c818990..497d029 100644 --- a/skills/ydb-table/references/embed/cpp.md +++ b/skills/ydb-table/references/embed/cpp.md @@ -20,7 +20,7 @@ target_link_libraries(myapp PRIVATE YDB-CPP-SDK::Driver YDB-CPP-SDK::Query) Debian packages: `libydb-cpp-dev` (core), optional `libydb-cpp-iam-dev`. After install, pass `-DCMAKE_PREFIX_PATH=/usr/share/yandex`. -Worked examples: . +Primary documentation: . Runnable demos for orientation: — illustrative, not normative. ## Connection @@ -74,7 +74,7 @@ Three load-bearing pieces: - **`TParamsBuilder`** binds values — do not concatenate them into the query text. A leading `DECLARE` block is optional for scalars; use it for `List>` and other compound shapes. - **Build results inside the lambda.** Assign to outer variables only on the success path (the returned `TStatus` is success). Mutations to outer state mid-lambda survive across retry attempts. -Source: `examples/basic_example/basic_example.cpp`. 
+Source: YDB docs — retry recipe at and parameterized queries at . ## Transactions @@ -93,22 +93,26 @@ auto tx = *result.GetTransaction(); auto result2 = session.ExecuteQuery(query2, TTxControl::Tx(tx).CommitTx(), params2).GetValueSync(); ``` -Canonical multi-step shape: `examples/basic_example/basic_example.cpp` `MultiStep()`. +Per the YDB transactions guide: "if the transaction body is fully formed before accessing the database, it will be processed more efficiently" — fuse with `CommitTx()` whenever client logic doesn't sit between statements. For transaction modes and optimistic-locking consequences, see [`../working-with-data.md`](../working-with-data.md). +Source: YDB docs — . + ## Retries YDB uses optimistic concurrency — application code that talks to YDB must run inside SDK retriers, not as bare one-shot RPCs. -`RetryQuerySync` / `RetryOperationSync` classify errors internally (`src/client/impl/internal/retry/retry.h`): +`RetryQuerySync` / `RetryOperationSync` classify errors internally per the YDB status-code table: - **Always retried**: `ABORTED`, `OVERLOADED`, `CLIENT_RESOURCE_EXHAUSTED`, `UNAVAILABLE`, `BAD_SESSION`, `SESSION_BUSY` (session reset). - **Retried only when `.Idempotent(true)`**: `UNDETERMINED`, `TRANSPORT_UNAVAILABLE`. -- **Non-retryable**: schema/semantic failures — propagated to caller. +- **Non-retryable**: schema / semantic failures (`SCHEME_ERROR`, `BAD_REQUEST`, `PRECONDITION_FAILED`) — propagated to caller. No outer `for` loop or hand-rolled `Sleep` backoff around SDK calls. Tune via `TRetryOperationSettings` (`MaxRetries`, `FastBackoffSettings`, `SlowBackoffSettings`). +Source: YDB docs — status-code retry table at , retry recipe at , error-handling guidance at . + ## Result parsing ```cpp @@ -146,14 +150,16 @@ client.RetryOperationSync( When bulk is appropriate vs `AS_TABLE` in a transaction — see [`../working-with-data.md`](../working-with-data.md). -Source: `examples/bulk_upsert_simple/main.cpp`. 
+Source: YDB docs — batch upload guide at (non-transactional ingest path, incompatible with synchronous secondary indexes). ## Large reads Pick one structural path: -- **Query Service streaming** — `client.StreamExecuteQuery(...)` returns `TExecuteQueryIterator`; iterate with `ReadNext()`. The SDK may replay the stream on retry — consumers must tolerate duplicate rows or dedupe (see `StreamQuerySelect` comment in `basic_example.cpp`). +- **Query Service streaming** — `client.StreamExecuteQuery(...)` returns `TExecuteQueryIterator`; iterate with `ReadNext()`. The SDK may replay the lambda on retry, so an in-progress stream can re-emit already-seen rows — consumers must tolerate duplicates or dedupe. - **Table Service scan** — `session.StreamExecuteScanQuery(...)` for unbounded scans without the Table `ExecuteDataQuery` result cap. -- **Keyset pagination** — outer loop with cursor predicate over the primary key; each page is its own `RetryQuerySync` call. See [`../working-with-data.md`](../working-with-data.md) and `examples/pagination/pagination.cpp`. +- **Keyset pagination** — outer loop with cursor predicate over the primary key; each page is its own `RetryQuerySync` call. See [`../working-with-data.md`](../working-with-data.md). If using `ExecuteDataQuery`, check `TResultSet::Truncated()` — a `true` value means the result was cut off and the read must be continued (pagination or streaming). + +Source: YDB docs — paging guide at (keyset pagination over the primary key as the canonical strategy). diff --git a/skills/ydb-table/rules/embed/cpp.md b/skills/ydb-table/rules/embed/cpp.md index f8d6fdf..2578fc4 100644 --- a/skills/ydb-table/rules/embed/cpp.md +++ b/skills/ydb-table/rules/embed/cpp.md @@ -16,7 +16,7 @@ Audit rules for application code talking to YDB through the C++ SDK. Each rule i - **Keyset-paginate**: wrap the call in an outer loop with a cursor predicate and `ORDER BY` over the table's primary key; terminate when a page returns zero rows. 
The loop continuation is driven by rows / cursor, not by retry status — see RULE-CPP-04. - **Table scan stream**: use `StreamExecuteScanQuery` when staying on the Table client. -**Source**: `ydb-platform/ydb-cpp-sdk` — `TResultSet::Truncated()` in (backed by `Ydb::ResultSet::truncated()` in ). +**Source**: YDB docs — paging guide at (keyset pagination over the primary key, the canonical strategy). The `TResultSet::Truncated()` flag the rule keys on is part of the public C++ API in `include/ydb-cpp-sdk/client/result/result.h`. ### RULE-CPP-02: External state mutation from inside the retry lambda @@ -28,7 +28,7 @@ Audit rules for application code talking to YDB through the C++ SDK. Each rule i **Fix**: build the result inside the lambda as a per-attempt local; assign to the outer variable only on the path that returns a successful `TStatus`. The lambda owns all data processing; only the success decision crosses the boundary. -**Source**: `ydb-platform/ydb-cpp-sdk` — retry loop in (`RetryQuerySync`); session-pool retry in . +**Source**: YDB docs — retry recipe at (the SDK retries the user-supplied lambda as the unit of work); error-handling guidance at . The C++ retrier that drives the replay is `RetryQuerySync` in `include/ydb-cpp-sdk/client/query/client.h`. ### RULE-CPP-03: Missing `.Idempotent(true)` on `RetryQuerySync` / `RetryOperationSync` @@ -39,11 +39,11 @@ Audit rules for application code talking to YDB through the C++ SDK. Each rule i - **Missing flag on safe-to-replay work**: `RetryQuerySync` / `RetryOperationSync` whose lambda body is replay-safe (a read; an `UPSERT` keyed on a value the caller already has; a write guarded by an idempotency key) but no `NYdb::NRetry::TRetryOperationSettings().Idempotent(true)` passed as the settings argument. Fix: add `.Idempotent(true)`. 
- **Flag set on non-idempotent work**: retry call carrying `.Idempotent(true)` while the lambda performs a non-idempotent write (counter increment, money transfer, raw `INSERT` of a generated row). Fix: remove the flag *and* rework the write to be idempotent before opting back in. -**Problem**: `GetNextStep` in the SDK retry context classifies `UNDETERMINED` and `TRANSPORT_UNAVAILABLE` as retryable only when `Settings_.Idempotent_` is true. These are transport-class failures where the server may have already committed the write before the client saw the failure. The SDK cannot infer idempotency from the API surface — only the developer knows. Setting the flag on a non-idempotent write causes double effect; omitting it on an idempotent write makes the program propagate errors it could have absorbed. +**Problem**: per the YDB docs, `UNDETERMINED` is conditionally retryable — "only idempotent operations can be fixed with a retry." These are failures where the server may have already committed the write before the client saw the failure. The SDK cannot infer idempotency from the API surface — only the developer knows. Setting the flag on a non-idempotent write causes double effect on retry; omitting it on an idempotent write makes the program propagate errors it could have absorbed. **Fix**: pass `NYdb::NRetry::TRetryOperationSettings().Idempotent(true)` (or `NYdb::NTable::TRetryOperationSettings().Idempotent(true)`) when the inner work is idempotent. For non-idempotent writes, do not set the flag; make the write idempotent first (client-generated request id, dedup guard) before opting in. -**Source**: `ydb-platform/ydb-cpp-sdk` — `GetNextStep` in (`UNDETERMINED`, `TRANSPORT_UNAVAILABLE` branches). 
+**Source**: YDB docs — status-code retry table at (UNDETERMINED is conditionally retryable for idempotent operations only; PRECONDITION_FAILED non-retryable); retry recipe at ("Idempotent operations are retried for a broader range of errors"); error-handling guidance at ("Only idempotent operations can be fixed with a retry").

 ### RULE-CPP-04: `for` loop wrapping `RetryQuerySync` / `RetryOperationSync` for retry purposes

@@ -53,11 +53,11 @@ Audit rules for application code talking to YDB through the C++ SDK. Each rule i

 **Not a target**: a keyset-pagination outer loop whose continuation depends on rows returned, a cursor advancing, or an `EOS` flag — even though it also wraps `RetryQuerySync` (see RULE-CPP-01 fix). The signal is what drives the next iteration, not the loop syntax.

-**Problem**: `RetryQuerySync` / `RetryOperationSync` already retry the lambda internally with classified backoff (`retry.h` `GetNextStep`). A status-driven outer loop multiplies the backoff schedule, re-runs work on non-retryable errors the SDK has correctly decided not to retry, and silently inflates the retry budget the caller thinks they configured.
+**Problem**: `RetryQuerySync` / `RetryOperationSync` already retry the lambda internally with status-classified backoff. A status-driven outer loop multiplies the backoff schedule, re-runs work on non-retryable errors the SDK has correctly decided not to retry, and silently inflates the retry budget the caller thinks they configured. The YDB error-handling guide is explicit: "Do not use endless retries" and "do not repeat instant retries more than once."

-**Fix**: remove the outer status-driven loop. Express tuning through `TRetryOperationSettings` (`MaxRetries`, backoff settings), not by wrapping the SDK retrier.
+**Fix**: remove the outer status-driven loop. Express tuning through `TRetryOperationSettings` (`MaxRetries`, `MaxTimeout`, `FastBackoffSettings`, `SlowBackoffSettings`), not by wrapping the SDK retrier.

-**Source**: `ydb-platform/ydb-cpp-sdk` — (`RetryQuerySync` implementation); classification in (`GetNextStep`).
+**Source**: YDB docs — retry recipe at (the SDK provides the retrier; C++ `TRetryOperationSettings` knobs are listed there: `MaxRetries`, `MaxTimeout`, `FastBackoffSettings`, `SlowBackoffSettings`, `RetryNotFound`); excess-retry guidance at .

 ### RULE-CPP-05: Custom retrier with `Sleep` wrapping YDB calls

@@ -65,11 +65,11 @@ Audit rules for application code talking to YDB through the C++ SDK. Each rule i

 **What to look for**: `for` loop with explicit `Sleep` / `std::this_thread::sleep_for` / manual backoff between attempts, calling any YDB-facing method inside — `session.ExecuteQuery`, `session.ExecuteDataQuery`, `client.GetSession`, `BulkUpsert`, or arbitrary helpers that call into the SDK.

-**Problem**: a hand-rolled retrier replays every non-success `TStatus` indiscriminately. Non-retryable failures (`PRECONDITION_FAILED`, schema mismatch) burn the retry budget on errors that will never recover, and conditionally-retryable failures (`UNDETERMINED`, `TRANSPORT_UNAVAILABLE`) get retried with no idempotency gate — which can double-apply a non-idempotent write. The SDK retrier classifies via `GetNextStep` and only retries the conditional bucket when `.Idempotent(true)` is set; backoff with jitter comes from `FastBackoffSettings` / `SlowBackoffSettings`.
+**Problem**: a hand-rolled retrier replays every non-success `TStatus` indiscriminately. Non-retryable failures (`PRECONDITION_FAILED`, `SCHEME_ERROR`, `BAD_REQUEST`) burn the retry budget on errors that will never recover; conditionally-retryable failures (`UNDETERMINED`) get retried with no idempotency gate, which can double-apply a non-idempotent write. The YDB docs are explicit that the SDK ships a built-in retry mechanism for exactly this reason and that idempotency must be opted into per call.

-**Fix**: delete the custom loop and use `RetryQuerySync` / `RetryOperationSync`; express tuning through `TRetryOperationSettings` rather than caller-side `for`/`Sleep` code.
+**Fix**: delete the custom loop and use `RetryQuerySync` / `RetryOperationSync`; express tuning through `TRetryOperationSettings` rather than caller-side `for` / `Sleep` code.

-**Source**: `ydb-platform/ydb-cpp-sdk` — ; .
+**Source**: YDB docs — retry recipe at ("YDB SDKs provide built-in tools for retries"); status-code classification at ; error-handling guidance at .

 ### RULE-CPP-06: `NYdb::TDriver` constructed per request instead of once per process

@@ -79,9 +79,9 @@ Audit rules for application code talking to YDB through the C++ SDK. Each rule i

 **Problem**: `TDriver` owns the gRPC channel pool, endpoint-discovery state, and background worker threads. Constructing one per request pays full endpoint discovery, gRPC channel setup, and TLS handshake before every YDB call, then tears the state back down — a latency cliff under any non-trivial RPS and a connection-churn signal at the cluster. Failing to call `driver.Stop(true)` on the short-lived driver also leaks the background threads.

-**Fix**: hold a single `NYdb::TDriver` for the process lifetime (build it at startup, stop it at shutdown via `driver.Stop(true)`); pass it to surface clients (`TQueryClient`, `TTableClient`) which are cheap to construct on demand. The upstream `basic_example` does exactly this in `main.cpp`.
+**Fix**: hold a single `NYdb::TDriver` for the process lifetime (build it at startup, stop it at shutdown via `driver.Stop(true)`); pass it to surface clients (`TQueryClient`, `TTableClient`) which are cheap to construct on demand.

-**Source**: `ydb-platform/ydb-cpp-sdk` — `TDriver` lifecycle in ; one-driver-per-process pattern in (driver constructed at `main` and `Stop(true)` on exit).
+**Source**: YDB docs — SDK initialization recipe at (the canonical shape: one driver constructed at startup, deferred close at shutdown, clients built on top). `TDriver` lifecycle surface in `include/ydb-cpp-sdk/client/driver/driver.h` of `ydb-platform/ydb-cpp-sdk`.

 ### RULE-CPP-07: Non-parametrized YQL — `std::format` / string concat into query text

@@ -93,7 +93,7 @@ Audit rules for application code talking to YDB through the C++ SDK. Each rule i

 **Fix**: bind values through `TParamsBuilder().AddParam("$name").(value).Build()` and pass the resulting `TParams` to `ExecuteQuery` / `ExecuteDataQuery`. A `DECLARE` block in the query body is optional for scalars — types are inferred from bound values — and is justified for compound shapes (`List>`).

-**Source**: `ydb-platform/ydb-cpp-sdk` — `TParamsBuilder` in . YQL parameters: .
+**Source**: YDB docs — parameterized queries guide at ("saves from vulnerabilities like SQL Injection" and "cache the query plan for parameterized requests"); YQL `DECLARE` syntax at . C++ binding surface is `TParamsBuilder` in `include/ydb-cpp-sdk/client/params/params.h`.

 ### RULE-CPP-08: Explicit `BeginTransaction` + `Commit` when fused `TTxControl` suffices

@@ -101,11 +101,11 @@ Audit rules for application code talking to YDB through the C++ SDK. Each rule i

 **What to look for**: `session.BeginTransaction(...)` followed by one `ExecuteQuery` / `ExecuteDataQuery` and `tx.Commit()` for **single-statement** work where `TTxControl::BeginTx(...).CommitTx()` on the query call would fuse begin, execute, and commit into fewer round trips. Multi-step flows that genuinely need client logic between statements (as in `MultiStep` in the basic example) are not the target.

-**Problem**: separate begin and commit RPCs add latency and session churn. The upstream basic example documents that inline `TTxControl` on `ExecuteQuery` is preferable in most cases because it avoids additional hops to the cluster.
+**Problem**: separate begin and commit RPCs add latency and session churn. Per the YDB transactions guide, "if the transaction body is fully formed before accessing the database, it will be processed more efficiently" — fused `TTxControl::BeginTx(...).CommitTx()` lets the server execute the statement and commit in a single round trip; the explicit `BeginTransaction` / `Commit()` split forces two extra hops with no semantic gain for a single statement.

 **Fix**: for single-statement transactions, pass `TTxControl::BeginTx(TTxSettings::SerializableRW()).CommitTx()` as the second argument to `ExecuteQuery` / `ExecuteDataQuery` instead of explicit `BeginTransaction` + `Commit()`.

-**Source**: `ydb-platform/ydb-cpp-sdk` — `ExplicitTcl` comment in (lines 354–357).
+**Source**: YDB docs — transactions guide at ("if the transaction body is fully formed before accessing the database, it will be processed more efficiently"). C++ `TTxControl` API in `include/ydb-cpp-sdk/client/query/tx.h` of `ydb-platform/ydb-cpp-sdk`.

 ### RULE-CPP-09: `StreamExecuteQuery` consumer assumes exactly-once rows

@@ -113,11 +113,11 @@ Audit rules for application code talking to YDB through the C++ SDK. Each rule i

 **What to look for**: `StreamExecuteQuery` / `TExecuteQueryIterator::ReadNext` loop that processes rows with no deduplication strategy, no idempotent sink, and no comment acknowledging retry-induced duplicates — especially when the stream call sits inside or under `RetryQuerySync`.

-**Problem**: duplicate lines in the output stream are possible due to the external retryer. A consumer that counts rows, bills per row, or appends to an external queue without dedupe will double-count on replay.
+**Problem**: the SDK retrier replays the user-supplied lambda as a whole on retryable errors; a stream that already emitted N rows before the failure will, on the next attempt, emit those rows again before reaching new ones. A consumer that counts rows, bills per row, or appends to an external queue without dedupe will double-count on replay.

 **Fix**: design the sink to be idempotent (keyed UPSERT, dedup by primary key), or track the last-seen cursor and skip duplicates. Do not assume one physical row per logical row in a retried stream.

-**Source**: `ydb-platform/ydb-cpp-sdk` — comment in `StreamQuerySelect` in (line 443).
+**Source**: YDB docs — retry recipe at (the SDK retries the lambda as the unit of work, so partial side effects from a failed attempt are observable again on the next); error-handling guidance at .

 ### RULE-CPP-10: `INSERT INTO` inside a retry lambda with `.Idempotent(true)`

@@ -129,7 +129,7 @@ Audit rules for application code talking to YDB through the C++ SDK. Each rule i

 **Fix**: pick one — (a) switch the statement to `UPSERT INTO` (replay-safe by construction; converges to the same final state), (b) keep `INSERT INTO` and drop `.Idempotent(true)` so the SDK propagates `UNDETERMINED` instead of replaying, or (c) wrap the INSERT in a server-side idempotency guard (existence check + INSERT in one transaction, or a dedup table keyed on a client-generated request id) before opting back into `.Idempotent(true)`.

-**Source**: `ydb-platform/ydb-cpp-sdk` — `PRECONDITION_FAILED` in ; idempotent-gated retry of `UNDETERMINED` / `TRANSPORT_UNAVAILABLE` in (`GetNextStep`). YQL `INSERT` semantics: (the page documents the `PRECONDITION_FAILED` / `insert_pk` failure on duplicate primary key).
+**Source**: YDB docs — status-code retry table at (UNDETERMINED conditionally retryable for idempotent operations; PRECONDITION_FAILED non-retryable); YQL `INSERT` semantics at (duplicate primary key surfaces as `PRECONDITION_FAILED` / `insert_pk`); error-handling guidance at ("Only idempotent operations can be fixed with a retry").

 ### RULE-CPP-11: DDL (`ExecuteSchemeQuery` / `CreateTable`) executed inside a `RetryQuerySync` / `RetryOperationSync` lambda

@@ -141,4 +141,4 @@ Audit rules for application code talking to YDB through the C++ SDK. Each rule i

 **Fix**: run schema-creation / migration steps in dedicated, idempotent setup code outside the SDK retrier — typically a startup-time bootstrap that uses `ExecuteSchemeQuery` directly and reasons about its own failure mode. Keep `RetryQuerySync` / `RetryOperationSync` lambdas DML-only. If runtime-issued DDL is unavoidable, give it its own bounded retry strategy rather than reusing the query-retry classifier.

-**Source**: `ydb-platform/ydb-cpp-sdk` — `ExecuteSchemeQuery` in ; `CreateTable` in the same header; query-classifier assumptions in (`GetNextStep`).
+**Source**: YDB docs — error-handling guidance at and retry recipe at (both frame the SDK retrier around DML status codes and transaction semantics, not schema-change semantics); status-code classification at . C++ DDL surface (`ExecuteSchemeQuery`, `CreateTable`) is declared in `include/ydb-cpp-sdk/client/table/table.h` of `ydb-platform/ydb-cpp-sdk`.