feat(query-enhancements): PPL lint backend — feature flag + explain/calcite proxy routes#12255
feat(query-enhancements): PPL lint backend — feature flag + explain/calcite proxy routes#12255Hanyu-W wants to merge 4 commits into
Conversation
…xy routes Add the disabled-by-default queryEnhancements.pplLint capability (DynamicConfigService switcher, mirrors agent_traces) and two read-only OpenSearch proxy routes (_ppl/_explain, _cluster/settings) that the PPL linter will consume. Feature is OFF and inert; no client wiring yet. Signed-off-by: Hanyu Wei <weihanyu@amazon.com>
|
|
||
| return res.ok({ | ||
| body: { | ||
| calciteEnabled: resolveValue('plugins.calcite.enabled') !== 'false', |
There was a problem hiding this comment.
p0: backward-compat issue with calciteEnabled default
resolveValue(...) !== 'false' returns true when the key is absent (undefined !== 'false'). On an older opensearch-sql cluster that predates plugins.calcite.enabled, the key is genuinely absent and the settings read succeeds (so we are on this success path, not the catch block), yet we report calciteEnabled: true for a cluster that has no Calcite engine at all.
The request already sends include_defaults=true, so any cluster that knows the setting surfaces it in the defaults bucket even when unset. Absence therefore reliably indicates that the cluster does not have Calcite, and the safe interpretation is the opposite of the current default. Since calciteEnabled gates the explain-based lint rules, the current behavior makes the editor fire _explain calls on every keystroke against clusters that cannot support them.
Suggested fix is to invert the success-path default:
calciteEnabled: resolveValue('plugins.calcite.enabled') === 'true',New clusters still resolve correctly (the default is surfaced); old clusters correctly resolve to false. The catch-block fail-open (true) is a separate decision and can stay as-is. Worth confirming the intended old-cluster behavior with the SQL team either way.
There was a problem hiding this comment.
Good catch. Confirmed that plugins.calcite.enabled is registered (with default true) only when the SQL plugin loads. I inverted the check to === 'true', left the catch-block fail-open (true) intact and added a comment: a 200 with the key absent is definitive "no Calcite," whereas an error can't distinguish "no plugin" from "transient failure." Side effect: calciteEnabled now matches the === 'true' idiom already used by allJoinTypesAllowed.
| method: 'GET', | ||
| path: EXPECTED_PATH, | ||
| }); | ||
| // Absent setting => calcite treated as enabled (not 'false'), join types not allowed. |
There was a problem hiding this comment.
p0: test locks in the absent-setting default
This test currently encodes the behavior flagged on ppl_calcite_settings.ts (// Absent setting => calcite treated as enabled, asserting calciteEnabled: true). If the success-path default is inverted, this should flip to expect false, and it would be worth renaming or adding a case explicitly framed as an older cluster without plugins.calcite.enabled resolving to calcite disabled, so the backward-compat intent is captured in the test name.
There was a problem hiding this comment.
Done — flipped 'uses the core client …' to expect calciteEnabled: false and added an explicitly-named case ('reports calciteEnabled:false for a cluster missing plugins.calcite.enabled') so the backward-compat intent is self-documenting. The catch-path tests ('swallows transport errors', 'logs auth failures at warn') keep asserting the fail-open true, which documents the success-vs-error asymmetry side by side.
| * otherwise, or `null` when a dataSourceId is requested but the data source | ||
| * plugin is unavailable (the caller should respond 400 in that case). | ||
| */ | ||
| export async function resolveOpenSearchClient( |
There was a problem hiding this comment.
p1: add a direct multi-data-source test for resolveOpenSearchClient
This helper is the seam for the multi-data-source requirement in the parent RFC (sql#5405 section 2.9 plumbs dataSourceId end-to-end), but it is only tested indirectly through the two routes, and no test exercises two different dataSourceId values resolving to distinct clients. getClient is a single mock, so we currently prove that the id is forwarded, not that the right client comes back per id.
Suggested coverage in a dedicated unit test for this helper: ds-1 resolves to getClient('ds-1') and ds-2 resolves to getClient('ds-2') returning distinct clients, no id resolves to asCurrentUser, and an absent context.dataSource resolves to null. Cheap insurance given the rest of the lint feature builds on this.
There was a problem hiding this comment.
Added a describe('resolveOpenSearchClient') to index.test.ts covering: two distinct dataSourceIds resolve to distinct clients (not just id-forwarding), no id resolves to asCurrentUser, and a dataSourceId with context.dataSource absent resolves to null.
| { | ||
| path: API.PPL_EXPLAIN, | ||
| validate: { | ||
| body: schema.object({ query: schema.string({ minLength: 1 }) }), |
There was a problem hiding this comment.
p1: confirm request guarding for a per-keystroke route
This proxies arbitrary PPL to _explain with only minLength: 1 on query. It is read-only and auth is enforced by the downstream client, so the risk is low, but per the RFC this route is hit on every keystroke (debounced). Worth confirming the debounce and abort logic lives client-side, and whether a minimal server-side guard (max query length) is wanted. Non-blocking, flagging for the follow-up that wires the editor.
There was a problem hiding this comment.
Confirmed — the keystroke throttling lives client-side and lands with the editor-wiring follow-up PR, not this backend slice. Specifically: a 500 ms trailing-edge debounce per model, plus an ExplainCache that dedups in-flight requests and LRU-caches results per (dataSourceId, query), so repeated lint passes over the same text issue at most one _explain call. There's no explicit AbortController on the explain request — the debounce + dedup make a superseded response a harmless cache write, so cancellation wasn't needed (easy to add later if we want it). On the server side, I added the maxLength: 65536 guard on the query schema here in 4ac79ae as the minimal request bound you flagged, independent of the global server.maxPayload.
✅ All unit and integration tests passing
|
| }), | ||
| // PPL linter feature flag, read at runtime via DynamicConfigService and | ||
| // surfaced as the queryEnhancements.pplLint capability. Disabled by default. | ||
| pplLint: schema.object({ |
There was a problem hiding this comment.
can this be ppl.lint.enabled, or lintEnabled: ["ppl"]? i'm thinking if we add more languages in the future, can the config path be more structured?
There was a problem hiding this comment.
Good call — restructured to queryEnhancements.ppl.lint.enabled in 031498a (config schema now nests ppl: { lint: { enabled } }) so that future languages/features extend the same shape.
|
|
||
| return res.ok({ | ||
| body: { | ||
| calciteEnabled: resolveValue('plugins.calcite.enabled') !== 'false', |
Respond to ps48's review on opensearch-project#12255: - calciteEnabled: invert `!== 'false'` to `=== 'true'` so a successful cluster-settings read with `plugins.calcite.enabled` absent reports the cluster as having no Calcite engine (disabled), instead of defaulting it on. include_defaults=true surfaces the key on any Calcite-capable cluster, so its absence is definitive. Document the deliberate asymmetry with the catch block, which keeps failing open (an error can't tell "no plugin" from a transient failure). - Flip the matching test assertion and add an explicitly-named case for a cluster missing the key (no/old SQL plugin) to pin the backward-compat contract. - Add a direct unit test block for resolveOpenSearchClient: distinct dataSourceIds resolve to distinct clients, no id resolves to asCurrentUser, and a dataSourceId with the data source plugin unavailable resolves to null. - explain route: add `maxLength: 65536` to the query schema. The body was already bounded by server.maxPayload (1 MiB); this makes the cap explicit and independent of global config. Signed-off-by: Hanyu Wei <weihanyu@amazon.com>
Address joshuali925's review on opensearch-project#12255: restructure the config path from queryEnhancements.pplLint.enabled to queryEnhancements.ppl.lint.enabled so future languages/features (ppl.autocomplete, sql.lint) extend the same shape. The wire capability stays a flat boolean queryEnhancements.pplLint, so capability consumers are unaffected; only the DynamicConfigService read in the switcher changes (config.ppl?.lint?.enabled === true). Signed-off-by: Hanyu Wei <weihanyu@amazon.com>
PR Code Analyzer ❗AI-powered 'Code-Diff-Analyzer' found issues on commit 10285e6.
The table above displays the top 10 most important findings. Pull Requests Author(s): Please update your Pull Request according to the report above. Repository Maintainer(s): You can Thanks. |
PR Reviewer Guide 🔍Here are some key observations to aid the review process:
|
PR Code Suggestions ✨Explore these optional code suggestions:
|
Description
Backend plumbing for the PPL linter, adding the feature flag and two read-only proxy routes.
queryEnhancements.pplLintcapability, defaultfalse, resolved at runtime via aDynamicConfigServicecapability switcher (same pattern asexplore).POST /api/enhancements/ppl/explain— proxies a PPL query to/_plugins/_ppl/_explainand returns the Calcite execution plan. Validates a non-emptyquery; supports optionaldataSourceId.GET /api/enhancements/ppl/calcite_settings— reads/_cluster/settings(scoped viafilter_path) and returns{ calciteEnabled, allJoinTypesAllowed }. Fails open on errors so a settings-read failure never blocks the editor; logs 401/403 atwarn.Issues Resolved
Backend plumbing for opensearch-project/sql#5405
Screenshot
N/A — no UI changes.
Testing the changes
opensearch-sqlcluster and hitGET /api/enhancements/ppl/calcite_settings— verify the scopedfilter_pathresponse matches the full unfiltered/_cluster/settingsoutput forcalciteEnabledandallJoinTypesAllowed.Check List
yarn test:jestyarn test:jest_integration--signoff