### Step 5: Mark intermediate assets as traversal-only

When users specify discovery targets (e.g., `--discover pods`), intermediate assets that don't match the targets must still be traversed (to discover children) but should NOT appear in scan results. Set `OptionTraversalOnly` on their connection config.

The provider already knows the discovery targets from `invConfig.Discover.Targets`. When emitting intermediate scope assets in Stage 1, check whether that scope level is targeted:

- `OptionTraversalOnly` is set per-asset on the connection config, not globally
- Leaf assets (the bottom of the hierarchy) are never traversal-only; they're always scannable if they match targets
- Mixed targets (e.g., `--discover pods,namespaces`): if the intermediate level IS a target, don't set `OptionTraversalOnly`. It gets scanned AND traversed.
- `DiscoveryAuto` and `DiscoveryAll` targets mean everything is scannable, so never set `OptionTraversalOnly`

Callers use `explorer.ScannableAssets()` instead of `explorer.Connected()` to get only assets that should be scanned. The depth-first traversal still connects everything.
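The rules above can be sketched as a small decision function. This is a minimal sketch, not the provider's actual code: `isTraversalOnly`, the plain-string targets, the `"auto"` and `"all"` literals standing in for `DiscoveryAuto` and `DiscoveryAll`, and the treatment of an empty target list are all assumptions made for illustration. Only `OptionTraversalOnly` and `invConfig.Discover.Targets` come from the text.

```go
package main

import "fmt"

// Hypothetical stand-ins for DiscoveryAuto and DiscoveryAll; the real
// constants live in the provider code and their values may differ.
const (
	discoveryAuto = "auto"
	discoveryAll  = "all"
)

// isTraversalOnly reports whether an asset at the given hierarchy level
// should carry OptionTraversalOnly. targets plays the role of
// invConfig.Discover.Targets.
func isTraversalOnly(targets []string, level string, isLeaf bool) bool {
	if isLeaf {
		return false // leaf assets are never traversal-only
	}
	if len(targets) == 0 {
		return false // assumption: no explicit targets behaves like auto
	}
	for _, t := range targets {
		switch t {
		case discoveryAuto, discoveryAll:
			return false // everything is scannable
		case level:
			return false // this level is itself a target: scanned AND traversed
		}
	}
	return true // intermediate level that was not requested
}

func main() {
	fmt.Println(isTraversalOnly([]string{"pods"}, "namespaces", false))
	fmt.Println(isTraversalOnly([]string{"pods", "namespaces"}, "namespaces", false))
	fmt.Println(isTraversalOnly([]string{discoveryAll}, "namespaces", false))
}
```

The decision is made once per emitted asset, which matches the rule that the option is set per-asset rather than globally.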
### Step 6: Gate resource methods at higher scopes

When the root scope is scanned, resource methods that load lower-scope data should return empty results to avoid loading everything into the root's cache. This is optional but important for large providers.
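One way to picture the gate, as a sketch under stated assumptions: the `scope` type, `listPods`, and the `fetch` callback below are hypothetical names, not the provider's API. The point is only that a method asked for lower-scope data while running at the root scope returns an empty result instead of loading everything.

```go
package main

import "fmt"

// scope is a hypothetical description of where a connection sits in the
// discovery hierarchy.
type scope struct {
	name   string
	isRoot bool
}

// listPods gates a lower-scope resource method. fetch stands in for the real
// API call and is only invoked below the root scope.
func listPods(s scope, fetch func(scopeName string) []string) []string {
	if s.isRoot {
		// Root scope: return an empty result so lower-scope data never
		// lands in the root's cache.
		return []string{}
	}
	return fetch(s.name)
}

func main() {
	fetch := func(ns string) []string {
		return []string{ns + "/pod-a", ns + "/pod-b"}
	}
	fmt.Println(len(listPods(scope{name: "cluster", isRoot: true}, fetch)))
	fmt.Println(listPods(scope{name: "default"}, fetch))
}
```

Because the gate sits in front of the API call, the root scope neither issues the request nor caches the response.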
### Step 7: Verify both paths and discovery targets
Both the legacy and staged paths must discover the same final set of assets (same platform IDs, same names). They differ only in how discovery is chunked. Also verify that discovery targets correctly filter scannable assets.
```bash
# Build and install
@@ -192,11 +231,15 @@ mql shell <provider-args>
# Verify the same assets appear
mql shell <provider-args>

# Test discovery target filtering (e.g., only pods, only instances)
# Verify that intermediate assets are traversed but not scanned
```
Staged discovery introduces a second concern: **not every intermediate asset should be scanned**. When a user specifies discovery targets like `--discover pods`, they want only pods as scannable assets. Namespaces are still needed for traversal (connecting to a namespace triggers Stage 2, which discovers pods), but namespaces themselves should not appear in the scan results.

This is solved with `OptionTraversalOnly`, an inventory connection option that providers set on intermediate assets when those assets don't match the requested discovery targets. `AssetExplorer` treats traversal-only assets normally for connection and child discovery, but excludes them from scan results via `ScannableAssets()`.

**Provider side:** the provider already knows the discovery targets from `invConfig.Discover.Targets`. When emitting intermediate assets, check whether that level is a target:

```go
// In discoverClusterStage, when emitting namespace assets:
```

**AssetExplorer side:** `TrackedAsset` exposes a `TraversalOnly` field, populated from the connection option when the asset is connected. Callers use `ScannableAssets()` to get only assets that should be scanned:

**Caller side:** scan loops use `ScannableAssets()` instead of `Connected()`. The depth-first traversal still connects everything (traversal-only and scannable), but only scannable assets are sent to `SynchronizeAssets` / query execution / scan jobs.
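
The AssetExplorer and caller sides fit together roughly as follows. This is a toy model, not the real `AssetExplorer`: the `explorer` type, the `Name` field, and the exact method signatures are assumptions. Only the `TraversalOnly` field and the `Connected()` / `ScannableAssets()` names come from the text.

```go
package main

import "fmt"

// TrackedAsset mirrors only the one field the text mentions. The real type
// has more fields; Name is a hypothetical addition for this sketch.
type TrackedAsset struct {
	Name          string
	TraversalOnly bool
}

// explorer is a toy stand-in for AssetExplorer.
type explorer struct {
	connected []TrackedAsset
}

// Connected returns every asset that was connected during traversal.
func (e *explorer) Connected() []TrackedAsset { return e.connected }

// ScannableAssets returns the connected assets minus the traversal-only ones.
func (e *explorer) ScannableAssets() []TrackedAsset {
	var out []TrackedAsset
	for _, a := range e.connected {
		if !a.TraversalOnly {
			out = append(out, a)
		}
	}
	return out
}

func main() {
	// State after a hypothetical `--discover pods` traversal.
	e := &explorer{connected: []TrackedAsset{
		{Name: "cluster", TraversalOnly: true},
		{Name: "namespace/default", TraversalOnly: true},
		{Name: "pod/default/web"},
	}}
	fmt.Println("connected:", len(e.Connected()))
	for _, a := range e.ScannableAssets() {
		fmt.Println("scan:", a.Name) // only these would go to the scan loop
	}
}
```

Filtering at the end keeps the traversal itself unaware of targets: everything is connected, and the scan loop simply iterates a smaller list.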
| `gcp --discover compute-instances` | org, projects, service groups | compute instances |
| `aws --discover ec2-instances` | accounts, regions | EC2 instances |

**Mixed targets** (`--discover pods,namespaces`): namespaces are both scannable AND traversal nodes. The provider simply doesn't set `OptionTraversalOnly`. They get scanned and their children get discovered. No special handling needed.
### Provider Implementation Guide
To add staged discovery to a provider:
@@ -172,6 +228,7 @@ Have `AssetExplorer` automatically infer hierarchy from platform IDs or asset me
- **Bounded memory per branch:** Each scope boundary creates a separate runtime with its own MQL resource cache. When a scope is closed (`CloseAsset`), its entire cache (all MQL resource objects, API responses, and connection state) is released. Only one branch of the hierarchy is in memory at a time. A 1000-namespace cluster uses the same peak memory as a 5-namespace cluster.
- **No root cache accumulation:** In single-pass discovery, all resources attach to the root runtime's cache and are never released until the scan completes. Staged discovery breaks this by giving each scope its own cache: pods in namespace A are cached in namespace A's runtime, not the cluster root's. When namespace A is closed, those pods are gone from memory.
- **Reduced API pressure:** Each stage only queries the APIs needed for its scope. No cluster-wide enumeration of every resource type.
- **Discovery target filtering without hierarchy knowledge:** Callers specify what to scan (e.g., `--discover pods`), and providers mark intermediate levels as traversal-only. `AssetExplorer.ScannableAssets()` returns only the targeted assets. The caller doesn't need to know which levels are intermediate; it just connects everything and filters at the end.
- **Composable with AssetExplorer:** Callers don't need to understand stages; they just connect discovered children as usual. The staging is entirely provider-internal.
- **Backward compatible:** The `OptionStagedDiscovery` flag is opt-in. Providers without staged discovery and callers that don't set the flag continue working unchanged.
- **Cache sharing within scope:** `WithParentConnectionId` lets leaf assets within a scope (e.g., pods within a namespace) share that scope's API client cache, avoiding redundant API calls, while keeping the cache isolated from other scopes.
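
The bounded-memory claim can be illustrated with a toy model. This is a sketch under assumptions, not the real traversal: `node`, `tracker`, `walk`, and `cluster` are hypothetical names, and the number of simultaneously open runtimes is used as a rough proxy for peak memory. Each node is connected, its children are visited depth-first, and it is closed before the next sibling is connected.

```go
package main

import "fmt"

// node is a hypothetical asset in the discovery hierarchy.
type node struct {
	name     string
	children []*node
}

// tracker counts how many runtimes are open at once and records the peak.
type tracker struct {
	open, peak int
}

func (t *tracker) connect() {
	t.open++
	if t.open > t.peak {
		t.peak = t.open
	}
}

func (t *tracker) release() { t.open-- }

// walk connects a node, visits its children depth-first, then releases it so
// its cache is gone before the next sibling is connected.
func walk(n *node, t *tracker) {
	t.connect()
	defer t.release()
	for _, c := range n.children {
		walk(c, t)
	}
}

// cluster builds a root with the given number of namespaces, one pod each.
func cluster(namespaces int) *node {
	root := &node{name: "cluster"}
	for i := 0; i < namespaces; i++ {
		ns := &node{name: fmt.Sprintf("ns-%d", i)}
		ns.children = append(ns.children, &node{name: fmt.Sprintf("ns-%d/pod", i)})
		root.children = append(root.children, ns)
	}
	return root
}

func main() {
	for _, n := range []int{5, 1000} {
		t := &tracker{}
		walk(cluster(n), t)
		fmt.Printf("namespaces=%d peak open runtimes=%d\n", n, t.peak)
	}
}
```

In this model the peak depends on the depth of the hierarchy, not on the number of siblings, which is the property the first bullet describes.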