feat(sdk): Enhance dynamic provider loading and compliance framework#10700
feat(sdk): Enhance dynamic provider loading and compliance framework#10700StylusFrost wants to merge 33 commits intomasterfrom
Conversation
…very - Implemented dynamic loading of external providers via entry points, allowing for greater flexibility in provider integration. - Added functionality to discover compliance directories from entry points, enabling external compliance frameworks to be loaded seamlessly. - Refactored check module resolution to prioritize built-in checks while falling back to entry points if necessary. - Improved compliance framework loading to include both built-in and external sources, ensuring comprehensive compliance coverage. - Enhanced CLI argument parsing to support external providers, improving user experience and configurability. - Introduced extensive unit tests to validate dynamic loading, compliance discovery, and overall integration of external providers.
|
✅ All necessary |
|
✅ Conflict Markers Resolved All conflict markers have been successfully resolved in this pull request. |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #10700 +/- ##
===========================================
+ Coverage 59.14% 71.58% +12.43%
===========================================
Files 8 116 +108
Lines 399 8566 +8167
===========================================
+ Hits 236 6132 +5896
- Misses 163 2434 +2271
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
🔒 Container Security ScanImage: 📊 Vulnerability Summary
4 package(s) affected
|
…iders and add tests
…ific compliance rendering
There was a problem hiding this comment.
Request changes — see inline comments for the specifics. One cross-cutting issue that doesn't anchor to any single line in the diff:
YAML namespace unwrap only knows about 5 providers
prowler/config/config.py:244-253 only sniffs aws, gcp, azure, kubernetes and m365. Prowler ships 16+ built-ins, so twelve are outside the list (nhn, github, googleworkspace, cloudflare, iac, llm, image, mongodbatlas, oraclecloud, openstack, alibabacloud, vercel).
A github user writing a namespaced config:
github:
token: abcThe sniff finds none of the five keys, falls into the else at line 250, github isn't in the safeguard at line 252, and the function returns {'github': {'token': 'abc'}} whole. github_provider receives the wrapped dict instead of the inner block.
This is a latent bug in master for 12 built-ins; the PR inherits it and additionally blocks any external plug-in from using the namespaced format. One change covers both:
if provider in config_file:
config = config_file.get(provider, {})
else:
config = config_file if config_file else {}…iders load_and_validate_config_file only detected the namespaced format for 5 hardcoded providers (aws, gcp, azure, kubernetes, m365). For every other built-in (github, nhn, vercel, cloudflare, iac, llm, image, mongodbatlas, oraclecloud, openstack, alibabacloud, googleworkspace) and for any external plug-in, the full YAML was returned wrapped instead of the provider's own block. Replace the hardcoded list with a dynamic check: if the file has a top-level key matching the provider and its value is a dict, unwrap it. Keep the legacy flat format for AWS only (historical, pre-multicloud) and identify it by the absence of nested-dict top-level values, which prevents cross-provider config leakage when a namespaced file has no section for the requested provider.
…etadata validators
…r clearer import errors
…covery Conflict in prowler/config/config.py resolved by combining both branches: - HEAD: external compliance discovery via entry points (PROWLER-1391) - master: multi-provider framework JSONs scanned at top-level compliance/ (#10300) Order: built-in per-provider -> built-in multi-provider -> external entry points. Built-ins first so they win on name collisions against external registrations. Supporting external plug-ins to register multi-provider frameworks is tracked in PROWLER-1444.
…r names with hyphens are not misparsed
HugoPBrito
left a comment
There was a problem hiding this comment.
Three more issues — see inline comments.
| # Built-in path | ||
| builtin_path = f"prowler.providers.{provider_type}.services.{service}.{check_name}.{check_name}" | ||
| try: | ||
| return import_check(builtin_path) |
There was a problem hiding this comment.
When a plugin re-registers an existing built-in CheckID (an edit over Prowler's own check), the executed module and the loaded metadata come from different sources:
_resolve_check_moduletries the built-in path first and returns it. Entry points are only consulted if the built-in import fails, so the executed module is the built-in one.CheckMetadata.get_bulk()(models.py:419-432) walks the loader output (built-ins first, entry points appended last) and doesbulk[check_id] = metadata. Last write wins, so the plugin's metadata replaces the built-in's for the sameCheckID.
Net effect: built-in code runs with the plugin's Severity, Description, Remediation, etc. A finding generated by code that expected LOW is published with the plugin's HIGH. Audit assumptions silently break.
Pick precedence once and apply it to both module and metadata — never split the source. Suggested: plugin wins (registering a duplicate is most likely an intentional edit, not a coincidence), with a logger.warning naming the overridden built-in.
Test: register an entry-point check sharing a built-in CheckID, run the check, and assert both lib and check.metadata come from the same source.
There was a problem hiding this comment.
Fixed in c7aa5368.
Picked precedence and applied it consistently across the three lookups so the executed module and the loaded metadata always come from the same source. The chosen direction is built-in wins on collision, with a logger.warning naming the shadowed plug-in:
_resolve_check_module(check.py): built-in first, entry point only when the built-in is absent.CheckMetadata.get_bulk(models.py): first-write-wins onCheckID, plug-in duplicate is ignored and a warning is emitted naming the metadata file.Provider.init_global_provider(provider.py): same pattern extended to the provider name itself, since the same divorce is possible there (built-in provider class running plug-in checks). When a plug-in registers a name colliding with a built-in, the built-in is loaded and the plug-in is announced as ignored.
Reasoning for built-in wins instead of plug-in wins: a security tool benefits more from fail-loud predictability than from permissive overrides. A plug-in lowering a check's severity, or shadowing a provider's session setup, would silently change the user's security posture. With built-in wins, plug-ins remain first-class extenders (new providers, new services, new checks under new CheckIDs all keep working) but cannot replace existing built-ins. Override is still possible by renaming, and the warning tells the user how.
Tests:
test_resolve_check_module_builtin_wins_over_entry_point: module resolution with both sources present, asserts built-in wins and the plug-in is not loaded.test_get_bulk_builtin_wins_on_check_id_collision: metadata loader with a duplicate CheckID, asserts built-in metadata stays and a warning is emitted naming the plug-in path.test_init_global_provider_warns_when_plugin_shadowed_by_builtin: provider name collision, asserts the warning fires before dispatch.
| builtin_path = f"prowler.providers.{provider_type}.services.{service}.{check_name}.{check_name}" | ||
| try: | ||
| return import_check(builtin_path) | ||
| except ModuleNotFoundError: |
There was a problem hiding this comment.
Same masking pattern as recover_checks_from_provider in utils.py, in a different file:
try:
return import_check(builtin_path)
except ModuleNotFoundError:
pass # falls through to entry pointsIf a built-in check exists but its module fails to import for an unrelated reason (broken transitive dep, misconfigured client, etc.), the ModuleNotFoundError is caught silently and we resolve via entry points. If a plugin happens to register the same name, it silently runs in place of the built-in. This is worse than just hiding errors: a plugin can hijack a built-in check by causing its import to fail.
Apply the same fix:
if importlib.util.find_spec(builtin_path) is not None:
return import_check(builtin_path) # any error here is real, surface it
# Only fall through to entry points if the built-in module truly doesn't existTest: patch a built-in check so its module raises ImportError("missing: foo") and assert _resolve_check_module re-raises that error instead of falling back to the entry-point loader.
There was a problem hiding this comment.
Fixed in 82132a93.
_resolve_check_module now uses importlib.util.find_spec to check if the built-in module exists before importing - same pattern already applied in init_global_provider via Provider.is_builtin(). If the built-in is absent, we fall through to entry points cleanly. If the built-in exists but its import fails (broken transitive dep, etc.), the error propagates instead of being silently swallowed and replaced by an entry-point plug-in that happens to share the check name.
Added a regression guard test_resolve_check_module_surfaces_error_when_builtin_import_fails that registers a same-named entry-point check and asserts it does NOT take over when the built-in raises ImportError.
|
|
||
|
|
||
| def stdout_report(finding, color, verbose, status, fix): | ||
| def stdout_report(finding, color, verbose, status, fix, provider=None): |
There was a problem hiding this comment.
stdout_report() was extended to accept provider=None, but the call site in report() (line 88) doesn't pass it:
stdout_report(finding, color, verbose, status, fixer)So for any external/custom provider, the else branch inside stdout_report falls back to Provider.get_global_provider(). That defeats the purpose of the new parameter and re-introduces the dependency on the global singleton this PR set out to remove.
Pass the local provider through:
stdout_report(finding, color, verbose, status, fixer, provider=provider)Test: invoke report() after setting Provider._global_provider = None (or to a different provider) and assert the output uses the provider argument.
There was a problem hiding this comment.
Fixed in e7f23bb1.
report() now passes its local provider through to stdout_report, so the dynamic else inside stdout_report no longer needs to fall back to the global singleton for external providers. Wires the new parameter the way the change was intended.
Added a regression guard test_report_propagates_provider_to_stdout_report that clears Provider._global and asserts report() still renders the finding using the local provider's get_stdout_detail(), proving the argument is being used instead of the global.
| check_name = check_module_name.split(".")[-1] | ||
| check_info = (check_name, check_path) | ||
| checks.append(check_info) | ||
| except ModuleNotFoundError: |
There was a problem hiding this comment.
list_modules calls walk_packages, which imports each package to walk it. Catching ModuleNotFoundError broadly here masks two distinct conditions:
try:
modules = list_modules(provider, service)
...
except ModuleNotFoundError:
pass # falls through to entry pointsIf a built-in service exists but one of its modules fails because of a broken transitive dependency, walk_packages raises ModuleNotFoundError with a different name than the service path. We swallow it and silently fall through to entry points. The user either never sees the real error or ends up running an entry-point check that happens to share the same name (see the duplicate-CheckID comment).
Distinguish "service doesn't exist" from "service exists but failed to import":
service_path = (
f"prowler.providers.{provider}.services.{service}"
if service else f"prowler.providers.{provider}.services"
)
if importlib.util.find_spec(service_path) is not None:
modules = list_modules(provider, service) # any error here is real, surface it
...
# Only fall through to entry points if the built-in service truly doesn't existTest: patch a built-in service so one of its modules raises ImportError("missing: foo") and assert recover_checks_from_provider re-raises instead of silently falling back.
There was a problem hiding this comment.
Fixed in 82132a93.
recover_checks_from_provider now distinguishes "service doesn't exist" from "service exists but failed to import" via importlib.util.find_spec(service_path), exactly as you suggested. Removed the except ModuleNotFoundError: pass that was masking real import errors. Built-in services that fail to import now surface the error through the function's global exception handler (logs the original ImportError class and message at CRITICAL, then sys.exit(1)) instead of silently routing to entry points.
Added a regression guard test_recover_checks_surfaces_error_when_builtin_service_import_fails that mocks a broken built-in service and asserts the import error wins over a same-named entry-point plug-in.
|
|
||
| # Skip compliance frameworks for external-tool providers | ||
| if provider not in EXTERNAL_TOOL_PROVIDERS: | ||
| if not Provider.is_tool_wrapper_provider(provider): |
There was a problem hiding this comment.
Nice work introducing Provider.is_tool_wrapper_provider() as the single source of truth here (and at :303, :426). But line 202 still uses the hardcoded list:
elif default_execution and provider not in ["iac", "llm"]:
args.output_formats.extend(get_available_compliance_frameworks(provider))Any new tool-wrapper provider (built-in or external) silently misses this branch unless someone also remembers to edit this list — exactly the bug the helper was meant to prevent.
elif default_execution and not Provider.is_tool_wrapper_provider(provider):
args.output_formats.extend(get_available_compliance_frameworks(provider))A grep for ["iac", "llm"] / "iac" / "llm" would also help confirm no other call site duplicates the same decision.
There was a problem hiding this comment.
Fixed in 5e87657.
The line now uses Provider.is_tool_wrapper_provider(provider) so the compliance-framework branch is skipped for any tool wrapper - built-in (iac, llm, image) or external plug-in declaring is_external_tool_provider = True. No behavior change for the built-ins (image already returned an empty list of frameworks, so extend([]) was a no-op).
Per your suggestion I grepped for ["iac", "llm"] / "iac" / "llm" / "image" across prowler/. The remaining hits fall into two buckets:
-
scan.py:97 and scan.py:164 use provider.type in ("iac", "image") - note the asymmetry: llm is intentionally excluded because it has no synthetic llm_scan check. Consolidating these to is_tool_wrapper_provider would route llm into a branch that tries to execute a non-existent check. I'm leaving these as-is; happy to revisit if you'd prefer a separate refactor that introduces a per-tool-wrapper "synthetic check name" registry.
-
main.py:392-396, 428 and the dispatch chain in provider.py are per-provider branches with different logic per tool wrapper (each one needs distinct kwargs / setup), so the binary helper doesn't apply there.
If you'd rather see scan.py consolidated too, let me know and I'll open a separate PR with the synthetic-check-registry approach
…er-contract-dynamic-discovery # Conflicts: # prowler/config/config.py
…very
Calling importlib.util.find_spec on prowler.providers.{provider}.services
for an external provider propagates ModuleNotFoundError when the parent
package prowler.providers.{provider} does not exist, instead of returning
None. This caused recover_checks_from_provider, _resolve_check_module and
Scan.scan to fail with "No module named 'prowler.providers.{external}'"
even though the plug-in registered its checks via entry points correctly.
Gate the built-in branch on Provider.is_builtin (which already wraps the
find_spec in try/except) and reuse _resolve_check_module from Scan.scan
so external providers fall through to the entry-point lookup.
Follow-up fix on
|
…port cycle CodeQL flagged a cyclic import after `prowler/lib/check/utils.py` and `prowler/lib/check/check.py` started importing `Provider` from `prowler.providers.common.provider`. That module transitively imports `prowler.config.config`, which imports back into `prowler.lib.check.*` (`compliance_models`, `external_tool_providers`) — closing the cycle. Apply the same pattern already used for `is_tool_wrapper_provider`: extract the predicate to a leaf module, `prowler.providers.common.builtin`, that depends only on `importlib.util`. `Provider.is_builtin` delegates to the leaf, and call sites in `prowler.lib.check.*` now import directly from the leaf — no more cycle. Also underscore-prefix unused parameters on the abstract stubs in `Provider` (get_finding_output_data, generate_compliance_output, display_compliance_table) so vulture stops flagging them now that the file is in the diff.
Two unrelated lint blockers that CI surfaced once the file was in scope of `black --check .` and `vulture` on the changed-file path: - Reformat one if/and conditional that black wanted on a single line. - Underscore-prefix unused parameter names on FakeProvider stub methods (from_cli_args, get_output_options, display_compliance_table, generate_compliance_output, FakePureContractProvider.from_cli_args) and the throwaway **kw in a MagicMock side_effect lambda. These are test fixtures whose method bodies don't reference those args; the signatures must match the abstract contract on Provider, so we keep positions/types and just rename to indicate intentional non-use. No logic change.
| @validator("CheckTitle", pre=True, always=True) | ||
| def validate_check_title(cls, check_title, values): # noqa: F841 | ||
| if values.get("Provider") not in EXTERNAL_TOOL_PROVIDERS: | ||
| def validate_check_title(cls, check_title, values): |
| @validator("RelatedUrl", pre=True, always=True) | ||
| def validate_related_url(cls, related_url, values): # noqa: F841 | ||
| if related_url and values.get("Provider") not in EXTERNAL_TOOL_PROVIDERS: | ||
| def validate_related_url(cls, related_url, values): |
| @validator("Remediation") | ||
| def validate_recommendation_url(cls, remediation, values): # noqa: F841 | ||
| if values.get("Provider") not in EXTERNAL_TOOL_PROVIDERS: | ||
| def validate_recommendation_url(cls, remediation, values): |
| @validator("Description", pre=True, always=True) | ||
| def validate_description(cls, description, values): # noqa: F841 | ||
| if values.get("Provider") not in EXTERNAL_TOOL_PROVIDERS: | ||
| def validate_description(cls, description, values): |
| @validator("Risk", pre=True, always=True) | ||
| def validate_risk(cls, risk, values): # noqa: F841 | ||
| if values.get("Provider") not in EXTERNAL_TOOL_PROVIDERS: | ||
| def validate_risk(cls, risk, values): |
Summary
Providerbase class (12 methods:from_cli_args,get_output_options,get_stdout_detail,get_finding_sort_key,get_summary_entity,get_finding_output_data,get_html_assessment_summary,generate_compliance_output,get_mutelist_finding_args,display_compliance_table,init_parser,is_external_tool_provider)prowler.providers,prowler.checks.{provider}, andprowler.compliancegroups with caching, CLI auto-discovery, and fallback across output, compliance, mutelist, compliance table display, and check resolutionChanges
prowler/providers/common/provider.py— Dynamic provider contract methods (12), entry point discovery (_load_ep_provider,get_available_providers,get_providers_help_text),init_global_providerfallback to entry points when built-in import fails, newis_tool_wrapper_provider(provider)helper that combines theEXTERNAL_TOOL_PROVIDERSfrozenset with the class attribute, and newis_builtin(provider)helper that usesimportlib.util.find_specto discriminate built-in vs external providers upfront (surfaces missing transitive dependencies with clear error messages)prowler/__main__.py— Dynamicelsefallback for output options and compliance CSV generation for unknown providers; consultsProvider.is_tool_wrapper_provider()instead of the frozenset directlyprowler/lib/cli/parser.py— Auto-discovers external providers and appends them to--helpusage/epilog dynamicallyprowler/lib/outputs/outputs.py—ifchain converted toif/elif/elsewith dynamic fallback viaget_stdout_detailandget_finding_sort_keyprowler/lib/outputs/summary_table.py— Dynamic fallback viaget_summary_entityfor unknown providersprowler/lib/outputs/finding.py— Support for external provider finding output dataprowler/lib/outputs/html/html.py— Dynamic fallback for HTML assessment summaryprowler/lib/outputs/compliance/compliance.py— Dynamic fallback viadisplay_compliance_table()for custom compliance table rendering, with automatic fallback to generic tableprowler/lib/check/check.py— Check module resolution falls back to entry points; mutelist dispatchelseclause viaget_mutelist_finding_args()prowler/lib/check/utils.py— Check and compliance discovery merges built-in and entry point sources;_recover_ep_checks(provider, service=None)filters by.services.{service}.in the entry-point dotted path;recover_checks_from_providerno longer exits early onModuleNotFoundErrorand always consults entry pointsprowler/lib/check/compliance_models.py— Compliance directory discovery from entry pointsprowler/lib/check/models.py— Support for external provider check metadata; all 9CheckMetadatavalidators consultProvider.is_tool_wrapper_provider()instead of the frozensetprowler/config/config.py— Configuration support for external providers;load_and_validate_config_filenow unwraps the namespaced format for every built-in and external provider (previously only 5 hardcoded providers)prowler/providers/common/arguments.py— CLI argument parsing for external providers; discriminates built-in vs external upfront viaProvider.is_builtin()so missing transitive dependencies in built-ins fail loudly with a clear message instead of silently falling through to entry pointstests/providers/external/test_dynamic_provider_loading.py— 77 tests covering provider discovery, entry point loading/caching, CLI parser integration, check resolution fallback, dispatch fallbacks, mutelist dispatch, compliance table dispatch, base contract defaults, compliance framework discovery,is_tool_wrapper_providerhelper, service-filtered entry-point check discovery,is_builtinhelper, and built-in-with-missing-dependency clean failureReview Fixes
Commits addressing @HugoPBrito's review:
e8487d068—load_and_validate_config_fileunwraps namespaced config for every provider, not just the 5 hardcoded built-ins60e76570—is_external_tool_providerwired into the execution flow (3 call sites in__main__.py) and the 9CheckMetadatavalidators via the newProvider.is_tool_wrapper_provider()helpercf70d1f9f—from_cli_argsreturn value honored in the dynamic fallback call site, with tolerance for providers that set the global from__init__0883baad7— External providers work with--service(removed prematuresys.exit(1)and theif not service:gate around entry points)907166d88— Discriminate built-in vs external providers upfront viaimportlib.util.find_spec, so missing transitive dependencies in built-ins (e.g.boto3) fail loudly with a clear error message instead of silently re-routing to entry pointsTest Plan
pytest tests/providers/external/test_dynamic_provider_loading.py— 77 tests passpytest tests/lib/— all tests pass (no regressions)--excluded-servicetracked in PROWLER-1419)elseclauses after existingif/elifchains