Upgrade ElasticSearch/OpenSearch Client Libraries#999
Conversation
|
Important Review skippedDraft detected. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Plus Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
ebattat
left a comment
There was a problem hiding this comment.
@pragya811,
are all the methods in elasticsearch_operations.py working the same for OpenSearch and ElasticSearch ?
|
@pragya811, |
I've tested with both opensearch and elasticsearch |
|
/approved |
|
@pragya811, why it is still a draft PR ? |
|
@pragya811, why coderabbitai is set to false ? |
There was a problem hiding this comment.
Actionable comments posted: 10
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
cloud_governance/common/elasticsearch/elasticsearch_operations.py (1)
360-364:⚠️ Potential issue | 🟠 Major | ⚡ Quick winFix bulk upload success/failure accounting to use
helpers.bulkreturn semantics (or catchBulkIndexError).
- At
cloud_governance/common/elasticsearch/elasticsearch_operations.pylines 360-364,self.__bulk_fn(self.__es, bulk_items)iselasticsearch.helpers.bulk(...)/opensearchpy.helpers.bulk(...), which return a tuple like(success_count, errors)—not a boolean. Theif response:/else:check therefore never correctly derives per-item success vs failure counts.- With default
raise_on_error=True, item-level failures causeBulkIndexErrorto be raised instead of returning that tuple, so partial failures won’t reach this accounting/logging. To count successes vs failures, either useraise_on_error=Falseand accumulatesuccess_countpluslen(errors)(or the error count whenstats_only=True), or catchBulkIndexErrorand use its.errors.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cloud_governance/common/elasticsearch/elasticsearch_operations.py` around lines 360 - 364, The current success/failure accounting uses the boolean truthiness of response from self.__bulk_fn(self.__es, bulk_items), but helpers.bulk returns (success_count, errors) or raises BulkIndexError on partial failures; change the logic in the method that calls self.__bulk_fn (using the bulk_items variable) to either call helpers.bulk with raise_on_error=False and unpack (success_count, errors) to add success_count to total_uploaded and len(errors) to failed_items (or use stats_only to get counts), or wrap the call in a try/except catching BulkIndexError and extract error details from the exception to increment failed_items and add the successfully indexed count to total_uploaded; ensure you reference self.__bulk_fn(self.__es, bulk_items) and handle both the tuple return and BulkIndexError paths so per-item accounting is correct.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@cloud_governance/common/elasticsearch/elasticsearch_operations.py`:
- Around line 139-140: The upload_to_elasticsearch function signature and body
mismatch: either accept lists or not; update the function to handle a
list-of-dicts up front (before any dict-specific calls) or change the type
annotation/remove `@typechecked` so the function only accepts dicts; specifically
alter upload_to_elasticsearch to detect if data is a list and branch before
calling data.get or data.keys (references: upload_to_elasticsearch). Also fix
bulk failure accounting in upload_data_in_bulk by calling helpers.bulk with
raise_on_error=False and unpacking the returned tuple (success_count, errors) or
by catching BulkIndexError from helpers.bulk (reference: upload_data_in_bulk and
usage of helpers.bulk), then compute failed_items from errors (or length of
errors) and surface/log the error details instead of relying on a boolean check
of response.
In `@cloud_governance/main/environment_variables.py`:
- Line 213: Normalize the ES_SERVER_TYPE value when reading it: update the call
that sets self._environment_variables_dict['ES_SERVER_TYPE'] (which currently
uses EnvironmentVariables.get_env('ES_SERVER_TYPE', 'opensearch')) to strip
surrounding whitespace and normalize case (e.g., .strip().lower()) so values
like " OpenSearch " or "OPENSEARCH" resolve consistently; ensure the normalized
value is stored back into self._environment_variables_dict['ES_SERVER_TYPE'] so
downstream logic that checks for 'opensearch' vs other types behaves
deterministically.
In `@cloud_governance/main/es_uploader.py`:
- Around line 97-98: The parameter es_add_items in
upload_last_policy_to_elasticsearch is declared optional but the implementation
calls .update() / .items() and will crash if None; fix by guarding against None
at function start (e.g., if es_add_items is None: es_add_items = {}) or by using
a local empty dict before any .update()/.items() usage so the code never calls
those methods on None; apply the same defensive fix to the other occurrence(s)
that use es_add_items around lines 210-212.
In `@jenkins/cloud_resource_orchestration/run_cloud_resource_orchestration.py`:
- Around line 35-36: The code currently stringifies secrets from
common_env_vars/common_input_vars (including es_user and es_password) via
combine_vars into a shell-interpolated podman command executed with os.system;
replace those os.system invocations that run podman (the places that call
os.system with the combined -e "KEY=VALUE" string) with subprocess.run using a
list/sequence argument (no shell=True) and supply sensitive values via an env=
dictionary (e.g., build a copy of os.environ, update it with es_user/es_password
and other non-secret envs from combine_vars) so nothing is injected into the
shell command string; keep any non-secret CLI args as list elements and use
check=True to surface failures.
In `@jenkins/clouds/aws/daily/cost_explorer/run_upload_es.py`:
- Around line 40-52: Several os.system(...) calls in run_upload_es.py
interpolate sensitive ES_PASSWORD (and other creds) into shell strings; replace
each os.system(f"...") that uses QUAY_CLOUD_GOVERNANCE_REPOSITORY and es_index_*
with subprocess.run([...], env=env, check=True) where you construct the argv
list
(["podman","run","--rm","--net=host","--name","cloud-governance","-e","AWS_DEFAULT_REGION=us-east-1",
"-e", "account=perf-dept", ... , QUAY_CLOUD_GOVERNANCE_REPOSITORY]) and build
env by copying os.environ and adding ES_PASSWORD, AWS_ACCESS_KEY_ID_*,
AWS_SECRET_ACCESS_KEY_* and ES_HOST/ES_PORT/ES_USER/ES_INDEX/cost_explorer vars;
do not embed ES_PASSWORD or any secret into the command string—pass them via the
env dict (or as separate "-e","key=value" argv entries that reference literal
safe values), and use check=True to surface failures; update all occurrences
(the three per-index runs and the three global runs, and any use of
common_input_vars) accordingly.
In `@jenkins/clouds/aws/daily/org_cost_explorer/run_org_upload_es.py`:
- Around line 38-39: The code currently interpolates ES_USER/ES_PASSWORD into
shell strings via common_input_vars (used by combine_vars) and through
common_env_vars → generate_shell_cmd() → run_shell_cmd(), and also in the
os.system(f"""...""") Spot Analysis/yearly savings calls; replace those
os.system shell-insertions with subprocess.run([...], env=env_dict, check=True)
using an argv list (no shell=True) and pass credentials via a sanitized env dict
(e.g., env = os.environ.copy(); env.update({'ES_USER': ES_USER, 'ES_PASSWORD':
ES_PASSWORD, ...})) or as separate CLI args in the list, and update
generate_shell_cmd() / run_shell_cmd() (and any callers of combine_vars) to
accept an env parameter instead of embedding secrets into the command string so
no secrets are present in the executed shell command or logs.
In `@jenkins/clouds/aws/daily/policies/run_policies.py`:
- Around line 93-94: The code currently places ES_USER/ES_PASSWORD into
container_env_dict and interpolates them into podman run shell strings executed
via os.system (see container_env_dict and the podman invocations around the
podman run calls), which risks command injection and leaks secrets; instead,
stop composing shell command strings and replace those os.system calls with
subprocess.run using a list of arguments (e.g., ["podman","run",...]) without
shell=True and pass sensitive values via the env parameter (build an env dict
that includes es_user/es_password merged with os.environ) so the credentials are
not part of the command text and ensure subprocess.run(..., check=True) is used
to surface errors.
In `@jenkins/clouds/aws/monthly/run_monthly_policies.py`:
- Line 16: Replace the os.system call that builds a shell-interpolated f-string
command with subprocess.run using a list argv (no shell) and check=True;
construct the command as a list like
["podman","run","--rm","--name","cloud-governance","--net=host","-e",f"policy=monthly_report","-e",f"es_host={ES_HOST}",...
, QUAY_CLOUD_GOVERNANCE_REPOSITORY"] and call subprocess.run(cmd, check=True);
reference the variables ES_HOST, ES_PORT, ES_USER, ES_PASSWORD, TO_MAIL, CC_MAIL
and QUAY_CLOUD_GOVERNANCE_REPOSITORY when building the argv list and avoid
shell=True so values aren’t shell-interpolated (consider passing secrets via
environment dict or files instead of command-line args if possible).
In `@jenkins/clouds/aws/weekly/cost_over_usage/run_upload_es.py`:
- Around line 26-30: The three os.system(...) invocations that build long
f-string shell commands (using variables like ES_PASSWORD,
AWS_ACCESS_KEY_ID_DELETE_PERF, AWS_ACCESS_KEY_ID_DELETE_PSAP,
AWS_ACCESS_KEY_ID_DELETE_PERF_SCALE, QUAY_CLOUD_GOVERNANCE_REPOSITORY,
special_user_mails, IGNORE_MAILS, es_index_*) must be replaced with
subprocess.run([...], check=True) calls that pass each podman argument as a
separate list item (e.g., "podman", "run", "--rm", "--name", "cloud-governance",
"--net=host", "-e", f"es_password={ES_PASSWORD}", "-e", f"account=perf-dept",
..., QUAY_CLOUD_GOVERNANCE_REPOSITORY) so values are not shell-interpolated;
update each os.system call site to import subprocess and construct the arg list
for the corresponding account (perf, psap, perf-scale) and use
subprocess.run(args, check=True) instead of os.system(f"...").
In `@jenkins/clouds/azure/daily/cost_reports/run_policies.py`:
- Around line 20-21: The code currently puts ES_PASSWORD into common_input_vars
and builds a shell-interpolated podman command passed to os.system, which
serializes the secret into shell-parsed text; change the call site that
constructs the podman run command (the code that converts common_input_vars into
'-e key="value"' strings and invokes os.system) to instead build a list argv and
call subprocess.run(argv, check=True), passing each '-e', key and value as
separate list elements (so ES_PASSWORD is never shell-parsed); keep the same
keys from common_input_vars and ensure you pass values as separate argv entries
to subprocess.run to avoid shell interpolation and secret exposure.
---
Outside diff comments:
In `@cloud_governance/common/elasticsearch/elasticsearch_operations.py`:
- Around line 360-364: The current success/failure accounting uses the boolean
truthiness of response from self.__bulk_fn(self.__es, bulk_items), but
helpers.bulk returns (success_count, errors) or raises BulkIndexError on partial
failures; change the logic in the method that calls self.__bulk_fn (using the
bulk_items variable) to either call helpers.bulk with raise_on_error=False and
unpack (success_count, errors) to add success_count to total_uploaded and
len(errors) to failed_items (or use stats_only to get counts), or wrap the call
in a try/except catching BulkIndexError and extract error details from the
exception to increment failed_items and add the successfully indexed count to
total_uploaded; ensure you reference self.__bulk_fn(self.__es, bulk_items) and
handle both the tuple return and BulkIndexError paths so per-item accounting is
correct.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 7a7cc355-013c-428f-8819-6c6c95c208dc
📒 Files selected for processing (41)
.github/workflows/Build.yml.github/workflows/PR.ymlcloud_governance/common/elasticsearch/elasticsearch_operations.pycloud_governance/main/environment_variables.pycloud_governance/main/es_uploader.pycloud_governance/main/main.pyjenkins/cloud_resource_orchestration/Jenkinsfilejenkins/cloud_resource_orchestration/run_cloud_resource_orchestration.pyjenkins/clouds/aws/daily/cost_explorer/Jenkinsfilejenkins/clouds/aws/daily/cost_explorer/run_upload_es.pyjenkins/clouds/aws/daily/org_cost_explorer/Jenkinsfilejenkins/clouds/aws/daily/org_cost_explorer/run_org_upload_es.pyjenkins/clouds/aws/daily/policies/Jenkinsfilejenkins/clouds/aws/daily/policies/run_policies.pyjenkins/clouds/aws/monthly/Jenkinsfilejenkins/clouds/aws/monthly/run_monthly_policies.pyjenkins/clouds/aws/weekly/cost_over_usage/Jenkinsfilejenkins/clouds/aws/weekly/cost_over_usage/run_upload_es.pyjenkins/clouds/azure/daily/cost_reports/Jenkinsfilejenkins/clouds/azure/daily/cost_reports/run_policies.pyjenkins/clouds/azure/daily/policies/Jenkinsfilejenkins/clouds/azure/daily/policies/run_policies.pyjenkins/clouds/gcp/daily/cost_reports/Jenkinsfilejenkins/clouds/gcp/daily/cost_reports/run_reports.pyjenkins/clouds/ibm/daily/cost_billings/Jenkinsfilejenkins/clouds/ibm/daily/cost_billings/run_ibm_cost_reports.pyjenkins/clouds/ibm/monthly/cost_invoice/Jenkinsfilejenkins/clouds/ibm/monthly/cost_invoice/ibm_invoice_to_es.pyjenkins/tenant/aws/common/run_cost_policies.pyjenkins/tenant/aws/common/run_policies.pyjenkins/tenant/aws/ecoeng_01/PolicyJenkinsfileDailyjenkins/tenant/aws/ecoeng_02/PolicyJenkinsfileDailyjenkins/tenant/aws/ecoeng_03/PolicyJenkinsfileDailyjenkins/tenant/aws/ovn/PolicyJenkinsfileDailyjenkins/tenant/aws/ovn/run_policies.pyjenkins/tenant/aws/qe/qe_01/PolicyJenkinsfileDailyjenkins/tenant/aws/qe/qe_02/PolicyJenkinsfileDailyrequirements.txtsetup.pytests/unittest/mocks/elasticsearch/mock_elasticsearch.pytests_requirements.txt
💤 Files with no reviewable changes (1)
- cloud_governance/main/main.py
| def upload_to_elasticsearch(self, index: str, data: dict, es_add_items: dict = None, | ||
| **kwargs): |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify callsites that may pass list payloads into upload_to_elasticsearch
rg -nP --type=py -C2 '\bupload_to_elasticsearch\s*\('Repository: redhat-performance/cloud-governance
Length of output: 16509
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE="cloud_governance/common/elasticsearch/elasticsearch_operations.py"
echo "=== upload_to_elasticsearch decorator + signature ==="
nl -ba "$FILE" | sed -n '120,220p'
echo
echo "=== bulk/dispatch + result handling area ==="
nl -ba "$FILE" | sed -n '220,460p'
echo
echo "=== any direct list passed as data=... to upload_to_elasticsearch ==="
rg -nP --type=py '\bupload_to_elasticsearch\s*\(\s*[^)]*data\s*=\s*\[' -S . || true
echo
echo "=== any callsites passing variables that might be lists named data/records ==="
rg -nP --type=py '\bupload_to_elasticsearch\s*\(\s*[^)]*\bdata\s*=\s*(data|records|items|payload|records_data|record)\b' -S . || trueRepository: redhat-performance/cloud-governance
Length of output: 194
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE="cloud_governance/common/elasticsearch/elasticsearch_operations.py"
echo "=== upload_to_elasticsearch decorator + signature (cat -n) ==="
cat -n "$FILE" | sed -n '120,220p'
echo
echo "=== bulk/dispatch + result handling area (cat -n) ==="
cat -n "$FILE" | sed -n '220,460p'
echo
echo "=== any direct list passed as data=[...] to upload_to_elasticsearch ==="
rg -nP --type=py '\bupload_to_elasticsearch\s*\(\s*[^)]*data\s*=\s*\[' -S . || true
echo
echo "=== any callsites passing variables that might be lists named data/records/items/payload ==="
rg -nP --type=py '\bupload_to_elasticsearch\s*\(\s*[^)]*\bdata\s*=\s*(data|records|items|payload|records_data|record)\b' -S . || trueRepository: redhat-performance/cloud-governance
Length of output: 14638
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE="cloud_governance/common/elasticsearch/elasticsearch_operations.py"
echo "=== imports/head (typeguard) ==="
cat -n "$FILE" | sed -n '1,120p'
echo
echo "=== typechecked usage in repo (limit output) ==="
rg -n --type=py 'typeguard|typechecked\(' cloud_governance/common/elasticsearch/elasticsearch_operations.py cloud_governance | head -n 80Repository: redhat-performance/cloud-governance
Length of output: 13338
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Detect Python version requirements (pyproject/tox/setup)
rg -n --no-heading -S "python_requires|requires-python|target-version|tox" pyproject.toml setup.cfg setup.py tox.ini .github/workflows -g'*.yml' -g'*.yaml' || true
# Quick heuristic: do we use PEP604 unions (|) elsewhere in repo?
rg -n --no-heading -S "\b\w+\s*\|\s*\w+" cloud_governance -g'*.py' | head -n 40 || trueRepository: redhat-performance/cloud-governance
Length of output: 690
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Inspect key callsites around upload_to_elasticsearch(data=...)
sed -n '1,260p' cloud_governance/main/es_uploader.py | nl -ba | sed -n '1,260p'
echo "----"
sed -n '1,140p' cloud_governance/common/elasticsearch/elastic_upload.py | nl -ba | sed -n '1,160p'
echo "----"
sed -n '1,220p' cloud_governance/common/mails/postfix.py | nl -ba | sed -n '90,160p'Repository: redhat-performance/cloud-governance
Length of output: 122
🌐 Web query:
elasticsearch.helpers.bulk return value success errors tuple stats_only
💡 Result:
The elasticsearch.helpers.bulk function in the Python Elasticsearch client returns a tuple containing two elements: (number_of_successful_actions, error_information) [1][2][3]. The structure of the second element in the returned tuple depends on the value of the stats_only parameter: If stats_only=False (the default): The second element is a list of error responses encountered during the bulk operation [1][2][3]. If stats_only=True: The second element is an integer representing the total number of failed operations [1][2][3]. Important implementation notes: 1. Raise on Error: By default, the bulk helper raises a BulkIndexError if any errors are encountered [1][3]. The stats_only parameter only effectively controls the return value if you set raise_on_error=False [1][3]. 2. Memory Usage: Collecting detailed error information (when stats_only=False) can lead to high memory usage when processing large datasets [1][3]. If you need to handle high volumes of data while tracking errors, the streaming_bulk helper is recommended, as it allows you to process results without storing all errors in memory [1][3].
Citations:
- 1: https://elasticsearch-py.readthedocs.io/en/stable/api_helpers.html
- 2: https://elasticsearch-py.readthedocs.io/en/v8.6.2/helpers.html
- 3: https://elasticsearch-py.readthedocs.io/en/v8.15.0/helpers.html
🌐 Web query:
opensearchpy.helpers.bulk return value tuple success errors
💡 Result:
The opensearchpy.helpers.bulk function returns a tuple containing two elements: a count of successfully executed actions and a collection of errors [1][2]. The exact structure of the second element in the tuple depends on the stats_only parameter: 1. If stats_only is set to False (the default), the second element is a list of error dictionaries describing each failed operation [1][2]. 2. If stats_only is set to True, the second element is an integer representing the total count of failed operations [1][2]. Important Considerations: - By default, helpers.bulk raises a BulkIndexError if any operation in the bulk request fails [3]. - To retrieve errors without raising an exception, you must set raise_on_error=False in the function arguments [1][2]. - Because helpers.bulk buffers documents to reduce memory usage, if you need to process large datasets and handle errors, it is often more memory-efficient to use the streaming_bulk helper, which yields results for each individual operation rather than returning a single summary tuple at the end [1][2][4].
Citations:
- 1: https://github.com/elastic/elasticsearch-py/blob/main/elasticsearch/helpers/actions.py
- 2: https://github.com/opensearch-project/opensearch-py/blob/main/opensearchpy/helpers/actions.py
- 3: https://github.com/opensearch-project/opensearch-py/blob/main/guides/bulk.md
- 4: https://elasticsearch-py.readthedocs.io/en/v7.12.0/helpers.html
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Where upload_data_in_bulk is called
rg -n --type=py '\bupload_data_in_bulk\s*\(' cloud_governance || true
# Show exact call to bulk helper in upload_data_in_bulk
rg -n --type=py '__bulk_fn|upload_data_in_bulk|helpers\.bulk|response\s*=\s*self\.__bulk_fn' cloud_governance/common/elasticsearch/elasticsearch_operations.pyRepository: redhat-performance/cloud-governance
Length of output: 1035
Fix upload_to_elasticsearch type/runtime mismatch and incorrect bulk failure accounting
upload_to_elasticsearch(..., data: dict)is wrapped with@typechecked(), yet the function later includes a JSON-array branch (else: for record in data ...). The body also performs dict-only operations (data.get(...),data.keys(), etc.) before that branch, so passing a list will fail (TypeCheckError and/or AttributeError). Align the type/signature with actual behavior: either remove the list branch or refactor to handle list-of-dicts before any dict operations.upload_data_in_bulkchecksif response:afterresponse = helpers.bulk(...).helpers.bulkreturns a tuple(success_count, errors)and (by default) raisesBulkIndexErroron errors, so this boolean check won’t properly detect partial failures and the code currently ignores the error details. Compute failures from the returned tuple (raise_on_error=False) and/or catchBulkIndexErrorto updatefailed_items.
🧰 Tools
🪛 Ruff (0.15.14)
[warning] 139-139: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@cloud_governance/common/elasticsearch/elasticsearch_operations.py` around
lines 139 - 140, The upload_to_elasticsearch function signature and body
mismatch: either accept lists or not; update the function to handle a
list-of-dicts up front (before any dict-specific calls) or change the type
annotation/remove `@typechecked` so the function only accepts dicts; specifically
alter upload_to_elasticsearch to detect if data is a list and branch before
calling data.get or data.keys (references: upload_to_elasticsearch). Also fix
bulk failure accounting in upload_data_in_bulk by calling helpers.bulk with
raise_on_error=False and unpacking the returned tuple (success_count, errors) or
by catching BulkIndexError from helpers.bulk (reference: upload_data_in_bulk and
usage of helpers.bulk), then compute failed_items from errors (or length of
errors) and surface/log the error details instead of relying on a boolean check
of response.
| self._environment_variables_dict['es_index'] = EnvironmentVariables.get_env('es_index', es_index) | ||
| self._environment_variables_dict['es_doc_type'] = EnvironmentVariables.get_env('es_doc_type', '') | ||
| self._environment_variables_dict['ES_TIMEOUT'] = EnvironmentVariables.get_env('ES_TIMEOUT', 2000) | ||
| self._environment_variables_dict['ES_SERVER_TYPE'] = EnvironmentVariables.get_env('ES_SERVER_TYPE', 'opensearch') |
There was a problem hiding this comment.
Normalize ES_SERVER_TYPE on read to avoid config drift failures.
At Line 213, normalize case/whitespace so non-exact env values don’t accidentally select the OpenSearch branch.
Proposed fix
- self._environment_variables_dict['ES_SERVER_TYPE'] = EnvironmentVariables.get_env('ES_SERVER_TYPE', 'opensearch')
+ self._environment_variables_dict['ES_SERVER_TYPE'] = (
+ EnvironmentVariables.get_env('ES_SERVER_TYPE', 'opensearch').strip().lower()
+ )🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@cloud_governance/main/environment_variables.py` at line 213, Normalize the
ES_SERVER_TYPE value when reading it: update the call that sets
self._environment_variables_dict['ES_SERVER_TYPE'] (which currently uses
EnvironmentVariables.get_env('ES_SERVER_TYPE', 'opensearch')) to strip
surrounding whitespace and normalize case (e.g., .strip().lower()) so values
like " OpenSearch " or "OPENSEARCH" resolve consistently; ensure the normalized
value is stored back into self._environment_variables_dict['ES_SERVER_TYPE'] so
downstream logic that checks for 'opensearch' vs other types behaves
deterministically.
| def upload_last_policy_to_elasticsearch(self, policy: str, index: str, s3_json_file: str, | ||
| es_add_items: dict = None): |
There was a problem hiding this comment.
es_add_items is optional in signature but required in execution path.
With the current flow, None will crash on .update() / .items(). Guard it to keep the API contract consistent.
Proposed fix
- def upload_last_policy_to_elasticsearch(self, policy: str, index: str, s3_json_file: str,
- es_add_items: dict = None):
+ def upload_last_policy_to_elasticsearch(self, policy: str, index: str, s3_json_file: str,
+ es_add_items: dict | None = None):
...
- for key, value in es_add_items.items():
+ for key, value in (es_add_items or {}).items():
data[key] = value
...
def upload_to_es(self, account):
@@
- self.__es_add_items.update({'policy': self.__policy_name, 'region': self.__region_name})
+ self.__es_add_items = self.__es_add_items or {}
+ self.__es_add_items.update({'policy': self.__policy_name, 'region': self.__region_name})Also applies to: 210-212
🧰 Tools
🪛 Ruff (0.15.14)
[warning] 98-98: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@cloud_governance/main/es_uploader.py` around lines 97 - 98, The parameter
es_add_items in upload_last_policy_to_elasticsearch is declared optional but the
implementation calls .update() / .items() and will crash if None; fix by
guarding against None at function start (e.g., if es_add_items is None:
es_add_items = {}) or by using a local empty dict before any .update()/.items()
usage so the code never calls those methods on None; apply the same defensive
fix to the other occurrence(s) that use es_add_items around lines 210-212.
| 'es_host': ES_HOST, 'es_port': ES_PORT, 'es_user': ES_USER, 'es_password': ES_PASSWORD, | ||
| 'CRO_ES_INDEX': CRO_ES_INDEX, 'log_level': 'INFO', |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Show the relevant file with line numbers (only parts likely needed)
FILE="jenkins/cloud_resource_orchestration/run_cloud_resource_orchestration.py"
if [ ! -f "$FILE" ]; then
echo "Missing file: $FILE"
exit 1
fi
echo "=== File excerpt around lines 1-140 ==="
nl -ba "$FILE" | sed -n '1,160p'
echo "=== Grep for os.system usage and env dict creation ==="
rg -n "os\.system|subprocess\.run|env\s*=" "$FILE" || true
echo "=== Extract specific lines mentioned in review comment ==="
nl -ba "$FILE" | sed -n '20,120p' | sed -n '1,120p'
# Run flake8 as per instructions (error detection)
echo "=== Running flake8 error detection ==="
flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
# Run flake8 style validation (exit-zero allowed)
echo "=== Running flake8 style validation ==="
flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statisticsRepository: redhat-performance/cloud-governance
Length of output: 181
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE="jenkins/cloud_resource_orchestration/run_cloud_resource_orchestration.py"
echo "=== File existence ==="
ls -l "$FILE" || { echo "Missing $FILE"; exit 1; }
echo "=== Command availability (flake8) ==="
command -v flake8 || echo "flake8 not found"
echo "=== os.system / subprocess.run occurrences (file) ==="
rg -n "os\.system|subprocess\.run" "$FILE" || true
echo "=== Inspect around lines 1-140 (no nl; use sed) ==="
sed -n '1,160p' "$FILE"
echo "=== Inspect around env dict lines (35-60) ==="
sed -n '25,70p' "$FILE"
echo "=== Inspect around later command strings (70-110) ==="
sed -n '65,120p' "$FILE"
echo "=== Inspect around later command strings (90-120) ==="
sed -n '85,130p' "$FILE"
echo "=== Extract specific mentioned snippet lines (35-36, 54-55) with awk ==="
awk 'NR>=30 && NR<=60 {printf "%5d:%s\n", NR, $0}' "$FILE"
awk 'NR>=45 && NR<=65 {printf "%5d:%s\n", NR, $0}' "$FILE"
echo "=== Run flake8 commands only if available ==="
if command -v flake8 >/dev/null 2>&1; then
flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics || true
flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics || true
else
echo "Skipping flake8 (not installed in sandbox)."
fiRepository: redhat-performance/cloud-governance
Length of output: 22289
Avoid interpolating ES credentials into os.system podman command strings.
common_env_vars/common_input_vars (lines 35-36 and 54-55) include es_user/es_password, which are stringified into -e KEY="VALUE" args (combine_vars) and injected into podman run command strings executed via os.system (lines 77 and 98). This preserves command-injection and secret-exposure risk; replace the shell-built command strings with subprocess.run([...], check=True) and pass credentials via an env= dict instead.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@jenkins/cloud_resource_orchestration/run_cloud_resource_orchestration.py`
around lines 35 - 36, The code currently stringifies secrets from
common_env_vars/common_input_vars (including es_user and es_password) via
combine_vars into a shell-interpolated podman command executed with os.system;
replace those os.system invocations that run podman (the places that call
os.system with the combined -e "KEY=VALUE" string) with subprocess.run using a
list/sequence argument (no shell=True) and supply sensitive values via an env=
dictionary (e.g., build a copy of os.environ, update it with es_user/es_password
and other non-secret envs from combine_vars) so nothing is injected into the
shell command string; keep any non-secret CLI args as list elements and use
check=True to surface failures.
| f"""podman run --rm --net="host" --name cloud-governance -e AWS_DEFAULT_REGION="us-east-1" -e account="perf-dept" -e policy="cost_explorer" -e AWS_ACCESS_KEY_ID="{AWS_ACCESS_KEY_ID_DELETE_PERF}" -e AWS_SECRET_ACCESS_KEY="{AWS_SECRET_ACCESS_KEY_DELETE_PERF}" -e es_host="{ES_HOST}" -e es_port="{ES_PORT}" -e es_user="{ES_USER}" -e es_password="{ES_PASSWORD}" -e es_index="{es_index_perf}" -e cost_explorer_tags="{cost_tags}" -e granularity="{granularity}" -e cost_metric="{cost_metric}" -e log_level="INFO" {QUAY_CLOUD_GOVERNANCE_REPOSITORY}""") | ||
| os.system( | ||
| f"""podman run --rm --net="host" --name cloud-governance -e AWS_DEFAULT_REGION="us-east-1" -e account="psap" -e policy="cost_explorer" -e AWS_ACCESS_KEY_ID="{AWS_ACCESS_KEY_ID_DELETE_PSAP}" -e AWS_SECRET_ACCESS_KEY="{AWS_SECRET_ACCESS_KEY_DELETE_PSAP}" -e es_host="{ES_HOST}" -e es_port="{ES_PORT}" -e es_index="{es_index_psap}" -e cost_explorer_tags="{cost_tags}" -e granularity="{granularity}" -e cost_metric="{cost_metric}" -e log_level="INFO" {QUAY_CLOUD_GOVERNANCE_REPOSITORY}""") | ||
| f"""podman run --rm --net="host" --name cloud-governance -e AWS_DEFAULT_REGION="us-east-1" -e account="psap" -e policy="cost_explorer" -e AWS_ACCESS_KEY_ID="{AWS_ACCESS_KEY_ID_DELETE_PSAP}" -e AWS_SECRET_ACCESS_KEY="{AWS_SECRET_ACCESS_KEY_DELETE_PSAP}" -e es_host="{ES_HOST}" -e es_port="{ES_PORT}" -e es_user="{ES_USER}" -e es_password="{ES_PASSWORD}" -e es_index="{es_index_psap}" -e cost_explorer_tags="{cost_tags}" -e granularity="{granularity}" -e cost_metric="{cost_metric}" -e log_level="INFO" {QUAY_CLOUD_GOVERNANCE_REPOSITORY}""") | ||
| os.system( | ||
| f"""podman run --rm --net="host" --name cloud-governance -e AWS_DEFAULT_REGION="us-east-1" -e account="perf-scale" -e policy="cost_explorer" -e AWS_ACCESS_KEY_ID="{AWS_ACCESS_KEY_ID_DELETE_PERF_SCALE}" -e AWS_SECRET_ACCESS_KEY="{AWS_SECRET_ACCESS_KEY_DELETE_PERF_SCALE}" -e es_host="{ES_HOST}" -e es_port="{ES_PORT}" -e es_index={es_index_perf_scale} -e cost_explorer_tags="{cost_tags}" -e granularity={granularity} -e cost_metric="{cost_metric}" -e log_level="INFO" {QUAY_CLOUD_GOVERNANCE_REPOSITORY}""") | ||
| f"""podman run --rm --net="host" --name cloud-governance -e AWS_DEFAULT_REGION="us-east-1" -e account="perf-scale" -e policy="cost_explorer" -e AWS_ACCESS_KEY_ID="{AWS_ACCESS_KEY_ID_DELETE_PERF_SCALE}" -e AWS_SECRET_ACCESS_KEY="{AWS_SECRET_ACCESS_KEY_DELETE_PERF_SCALE}" -e es_host="{ES_HOST}" -e es_port="{ES_PORT}" -e es_user="{ES_USER}" -e es_password="{ES_PASSWORD}" -e es_index={es_index_perf_scale} -e cost_explorer_tags="{cost_tags}" -e granularity={granularity} -e cost_metric="{cost_metric}" -e log_level="INFO" {QUAY_CLOUD_GOVERNANCE_REPOSITORY}""") | ||
|
|
||
| es_index_global = 'cloud-governance-cost-explorer-perf-global-cost' | ||
| os.system( | ||
| f"""podman run --rm --net="host" --name cloud-governance -e AWS_DEFAULT_REGION="us-east-1" -e account="perf-dept" -e policy="cost_explorer" -e AWS_ACCESS_KEY_ID="{AWS_ACCESS_KEY_ID_DELETE_PERF}" -e AWS_SECRET_ACCESS_KEY="{AWS_SECRET_ACCESS_KEY_DELETE_PERF}" -e es_host="{ES_HOST}" -e es_port="{ES_PORT}" -e es_index="{es_index_global}" -e cost_explorer_tags="{cost_tags}" -e granularity="{granularity}" -e cost_metric="{cost_metric}" -e log_level="INFO" {QUAY_CLOUD_GOVERNANCE_REPOSITORY}""") | ||
| f"""podman run --rm --net="host" --name cloud-governance -e AWS_DEFAULT_REGION="us-east-1" -e account="perf-dept" -e policy="cost_explorer" -e AWS_ACCESS_KEY_ID="{AWS_ACCESS_KEY_ID_DELETE_PERF}" -e AWS_SECRET_ACCESS_KEY="{AWS_SECRET_ACCESS_KEY_DELETE_PERF}" -e es_host="{ES_HOST}" -e es_port="{ES_PORT}" -e es_user="{ES_USER}" -e es_password="{ES_PASSWORD}" -e es_index="{es_index_global}" -e cost_explorer_tags="{cost_tags}" -e granularity="{granularity}" -e cost_metric="{cost_metric}" -e log_level="INFO" {QUAY_CLOUD_GOVERNANCE_REPOSITORY}""") | ||
| os.system( | ||
| f"""podman run --rm --net="host" --name cloud-governance -e AWS_DEFAULT_REGION="us-east-1" -e account="psap" -e policy="cost_explorer" -e AWS_ACCESS_KEY_ID="{AWS_ACCESS_KEY_ID_DELETE_PSAP}" -e AWS_SECRET_ACCESS_KEY="{AWS_SECRET_ACCESS_KEY_DELETE_PSAP}" -e es_host="{ES_HOST}" -e es_port="{ES_PORT}" -e es_index="{es_index_global}" -e cost_explorer_tags="{cost_tags}" -e granularity="{granularity}" -e cost_metric="{cost_metric}" -e log_level="INFO" {QUAY_CLOUD_GOVERNANCE_REPOSITORY}""") | ||
| f"""podman run --rm --net="host" --name cloud-governance -e AWS_DEFAULT_REGION="us-east-1" -e account="psap" -e policy="cost_explorer" -e AWS_ACCESS_KEY_ID="{AWS_ACCESS_KEY_ID_DELETE_PSAP}" -e AWS_SECRET_ACCESS_KEY="{AWS_SECRET_ACCESS_KEY_DELETE_PSAP}" -e es_host="{ES_HOST}" -e es_port="{ES_PORT}" -e es_user="{ES_USER}" -e es_password="{ES_PASSWORD}" -e es_index="{es_index_global}" -e cost_explorer_tags="{cost_tags}" -e granularity="{granularity}" -e cost_metric="{cost_metric}" -e log_level="INFO" {QUAY_CLOUD_GOVERNANCE_REPOSITORY}""") | ||
| os.system( | ||
| f"""podman run --rm --net="host" --name cloud-governance -e AWS_DEFAULT_REGION="us-east-1" -e account="perf-scale" -e policy="cost_explorer" -e AWS_ACCESS_KEY_ID="{AWS_ACCESS_KEY_ID_DELETE_PERF_SCALE}" -e AWS_SECRET_ACCESS_KEY="{AWS_SECRET_ACCESS_KEY_DELETE_PERF_SCALE}" -e es_host="{ES_HOST}" -e es_port="{ES_PORT}" -e es_index="{es_index_global}" -e cost_explorer_tags="{cost_tags}" -e granularity="{granularity}" -e cost_metric="{cost_metric}" -e log_level="INFO" {QUAY_CLOUD_GOVERNANCE_REPOSITORY}""") | ||
| f"""podman run --rm --net="host" --name cloud-governance -e AWS_DEFAULT_REGION="us-east-1" -e account="perf-scale" -e policy="cost_explorer" -e AWS_ACCESS_KEY_ID="{AWS_ACCESS_KEY_ID_DELETE_PERF_SCALE}" -e AWS_SECRET_ACCESS_KEY="{AWS_SECRET_ACCESS_KEY_DELETE_PERF_SCALE}" -e es_host="{ES_HOST}" -e es_port="{ES_PORT}" -e es_user="{ES_USER}" -e es_password="{ES_PASSWORD}" -e es_index="{es_index_global}" -e cost_explorer_tags="{cost_tags}" -e granularity="{granularity}" -e cost_metric="{cost_metric}" -e log_level="INFO" {QUAY_CLOUD_GOVERNANCE_REPOSITORY}""") |
There was a problem hiding this comment.
Stop interpolating ES_PASSWORD into os.system podman run command strings
jenkins/clouds/aws/daily/cost_explorer/run_upload_es.py injects ES_PASSWORD via f-string interpolation into os.system(...) podman run calls (lines 40-52) and again through common_input_vars ('es_password': ES_PASSWORD) when building additional -e args (lines 61-62 / used in the subsequent command construction). Replace these os.system calls with subprocess.run([...], env=..., check=True) and pass values via env / argv elements rather than shell-constructed strings.
🧰 Tools
🪛 OpenGrep (1.22.0)
[ERROR] 41-42: Dynamic command passed to os.system/os.popen. Use subprocess with an argument list instead, or use shlex.quote() to sanitize input.
(coderabbit.command-injection.python-os-command)
[ERROR] 43-44: Dynamic command passed to os.system/os.popen. Use subprocess with an argument list instead, or use shlex.quote() to sanitize input.
(coderabbit.command-injection.python-os-command)
[ERROR] 47-48: Dynamic command passed to os.system/os.popen. Use subprocess with an argument list instead, or use shlex.quote() to sanitize input.
(coderabbit.command-injection.python-os-command)
[ERROR] 49-50: Dynamic command passed to os.system/os.popen. Use subprocess with an argument list instead, or use shlex.quote() to sanitize input.
(coderabbit.command-injection.python-os-command)
[ERROR] 51-52: Dynamic command passed to os.system/os.popen. Use subprocess with an argument list instead, or use shlex.quote() to sanitize input.
(coderabbit.command-injection.python-os-command)
🪛 Ruff (0.15.14)
[error] 41-41: Starting a process with a shell, possible injection detected
(S605)
[error] 43-43: Starting a process with a shell, possible injection detected
(S605)
[error] 47-47: Starting a process with a shell, possible injection detected
(S605)
[error] 49-49: Starting a process with a shell, possible injection detected
(S605)
[error] 51-51: Starting a process with a shell, possible injection detected
(S605)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@jenkins/clouds/aws/daily/cost_explorer/run_upload_es.py` around lines 40 -
52, Several os.system(...) calls in run_upload_es.py interpolate sensitive
ES_PASSWORD (and other creds) into shell strings; replace each os.system(f"...")
that uses QUAY_CLOUD_GOVERNANCE_REPOSITORY and es_index_* with
subprocess.run([...], env=env, check=True) where you construct the argv list
(["podman","run","--rm","--net=host","--name","cloud-governance","-e","AWS_DEFAULT_REGION=us-east-1",
"-e", "account=perf-dept", ... , QUAY_CLOUD_GOVERNANCE_REPOSITORY]) and build
env by copying os.environ and adding ES_PASSWORD, AWS_ACCESS_KEY_ID_*,
AWS_SECRET_ACCESS_KEY_* and ES_HOST/ES_PORT/ES_USER/ES_INDEX/cost_explorer vars;
do not embed ES_PASSWORD or any secret into the command string—pass them via the
env dict (or as separate "-e","key=value" argv entries that reference literal
safe values), and use check=True to surface failures; update all occurrences
(the three per-index runs and the three global runs, and any use of
common_input_vars) accordingly.
| common_input_vars = {'es_host': ES_HOST, 'es_port': ES_PORT, 'es_user': ES_USER, 'es_password': ES_PASSWORD, | ||
| 'es_index': 'cloud-governance-global-cost-billing-reports', |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE="jenkins/clouds/aws/daily/org_cost_explorer/run_org_upload_es.py"
echo "== File header =="
sed -n '1,140p' "$FILE" | nl -ba | sed -n '1,140p'
echo
echo "== Show referenced line ranges with context =="
for a in 30 38 39 54 55 71 72 110 112 113; do
start=$((a-5))
end=$((a+5))
echo "--- lines ${start}-${end} (centered at $a) ---"
sed -n "${start},${end}p" "$FILE" | nl -ba
done
echo
echo "== Search for os.system usage =="
rg -n "os\.system" "$FILE" || true
echo
echo "== Search for ES_USER/ES_PASSWORD usage and interpolation =="
rg -n "ES_USER|ES_PASSWORD" "$FILE" || true
rg -n "(es_user|es_password|ES_USER|ES_PASSWORD).*(" "$FILE" || true
rg -n "common_input_vars|env dict|podman|es_host|es_index" "$FILE" || true
echo
echo "== flake8 linting (error detection) =="
flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
echo
echo "== flake8 linting (style validation) =="
flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statisticsRepository: redhat-performance/cloud-governance
Length of output: 158
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE="jenkins/clouds/aws/daily/org_cost_explorer/run_org_upload_es.py"
echo "== File header (first 140 lines with line numbers) =="
sed -n '1,140p' "$FILE" | awk '{printf "%6d\t%s\n", NR, $0}'
echo
echo "== Show referenced line ranges with context =="
for a in 30 38 39 54 55 71 72 110 112 113; do
start=$((a-5))
end=$((a+5))
[ $start -lt 1 ] && start=1
echo "--- lines ${start}-${end} (centered at $a) ---"
sed -n "${start},${end}p" "$FILE" | awk '{printf "%6d\t%s\n", NR, $0}'
done
echo
echo "== Search for os.system usage in file =="
rg -n "os\.system" "$FILE" || true
echo
echo "== Search for ES_USER/ES_PASSWORD usage in file =="
rg -n "ES_USER|ES_PASSWORD" "$FILE" || true
echo
echo "== Search for possible interpolation into shell command strings =="
rg -n "(es_user|es_password|ES_USER|ES_PASSWORD).*(\$\{|\+|\.format|\%s|\\bin\\b|podman|ssh|curl|es_host|es_port|os\.system)" "$FILE" || true
rg -n "(os\.system).*" "$FILE" || true
echo
echo "== flake8 linting (error detection) =="
flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
echo
echo "== flake8 linting (style validation) =="
flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statisticsRepository: redhat-performance/cloud-governance
Length of output: 18299
Fix: prevent ES_USER/ES_PASSWORD from being embedded in os.system command strings
run_org_upload_es.py interpolates ES_USER/ES_PASSWORD into podman run ... command strings that are executed via os.system (via common_input_vars/combine_vars and the Spot Analysis + yearly savings os.system(f"""...""") calls, and via common_env_vars → generate_shell_cmd() → run_shell_cmd()). Replace this with subprocess.run([...], env={...}) (argv array + env=) instead of building/executing a shell command containing secrets to avoid secret leakage and injection risk.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@jenkins/clouds/aws/daily/org_cost_explorer/run_org_upload_es.py` around lines
38 - 39, The code currently interpolates ES_USER/ES_PASSWORD into shell strings
via common_input_vars (used by combine_vars) and through common_env_vars →
generate_shell_cmd() → run_shell_cmd(), and also in the os.system(f"""...""")
Spot Analysis/yearly savings calls; replace those os.system shell-insertions
with subprocess.run([...], env=env_dict, check=True) using an argv list (no
shell=True) and pass credentials via a sanitized env dict (e.g., env =
os.environ.copy(); env.update({'ES_USER': ES_USER, 'ES_PASSWORD': ES_PASSWORD,
...})) or as separate CLI args in the list, and update generate_shell_cmd() /
run_shell_cmd() (and any callers of combine_vars) to accept an env parameter
instead of embedding secrets into the command string so no secrets are present
in the executed shell command or logs.
| "es_host": ES_HOST, "es_port": ES_PORT, "es_user": ES_USER, "es_password": ES_PASSWORD, | ||
| "MANAGER_EMAIL_ALERT": "False", "EMAIL_ALERT": "False", "log_level": "INFO", |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE="jenkins/clouds/aws/daily/policies/run_policies.py"
if [ ! -f "$FILE" ]; then
echo "Missing file: $FILE" >&2
exit 1
fi
echo "== File header =="
head -n 60 "$FILE" | cat -n
echo
echo "== Lines 70-120 =="
sed -n '70,120p' "$FILE" | cat -n
echo
echo "== Lines 120-190 =="
sed -n '120,190p' "$FILE" | cat -n
echo
echo "== Find occurrences of ES_USER/ES_PASSWORD =="
rg -n "ES_USER|ES_PASSWORD" "$FILE" || true
echo
echo "== Find shell-building / execution calls =="
rg -n "os\.system|subprocess\.|Popen\(|shell=True|podman" "$FILE" || true
echo
echo "== flake8 (errors only) =="
flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
echo
echo "== flake8 (style) =="
flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statisticsRepository: redhat-performance/cloud-governance
Length of output: 14670
Stop building podman shell command strings with ES_USER/ES_PASSWORD and executing via os.system
container_env_dict sets "es_user": ES_USER / "es_password": ES_PASSWORD (lines 93-94), which are interpolated into podman run ... -e es_user="..." -e es_password="..." and then executed with os.system (line 66). This creates command-injection risk and exposes secrets in the constructed shell text (podman invocations at lines 138 and 150).
Use subprocess.run([...], env={...}, check=True) with an argument array and pass es_user/es_password via env (no shell string interpolation).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@jenkins/clouds/aws/daily/policies/run_policies.py` around lines 93 - 94, The
code currently places ES_USER/ES_PASSWORD into container_env_dict and
interpolates them into podman run shell strings executed via os.system (see
container_env_dict and the podman invocations around the podman run calls),
which risks command injection and leaks secrets; instead, stop composing shell
command strings and replace those os.system calls with subprocess.run using a
list of arguments (e.g., ["podman","run",...]) without shell=True and pass
sensitive values via the env parameter (build an env dict that includes
es_user/es_password merged with os.environ) so the credentials are not part of
the command text and ensure subprocess.run(..., check=True) is used to surface
errors.
| print("Run AWS Monthly Policies") | ||
| os.system( | ||
| f"""podman run --rm --name cloud-governance --net="host" -e policy="monthly_report" -e es_host="{ES_HOST}" -e es_port="{ES_PORT}" -e to_mail="{TO_MAIL}" -e cc_mail="{CC_MAIL}" -e log_level="INFO" {QUAY_CLOUD_GOVERNANCE_REPOSITORY}""") | ||
| f"""podman run --rm --name cloud-governance --net="host" -e policy="monthly_report" -e es_host="{ES_HOST}" -e es_port="{ES_PORT}" -e es_user="{ES_USER}" -e es_password="{ES_PASSWORD}" -e to_mail="{TO_MAIL}" -e cc_mail="{CC_MAIL}" -e log_level="INFO" {QUAY_CLOUD_GOVERNANCE_REPOSITORY}""") |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Inspect the target file around the referenced line.
sed -n '1,120p' jenkins/clouds/aws/monthly/run_monthly_policies.py | nl -ba | sed -n '1,120p'
# Show just the exact referenced line region if the file is larger.
nl -ba jenkins/clouds/aws/monthly/run_monthly_policies.py | sed -n '1,80p' | tail -n +1
# Run the flake8 error-detection pass (as per provided guidelines).
flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
# Run the style/limits pass (as per provided guidelines).
flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statisticsRepository: redhat-performance/cloud-governance
Length of output: 122
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE="jenkins/clouds/aws/monthly/run_monthly_policies.py"
echo "=== File excerpt around line 16 ==="
sed -n '1,120p' "$FILE" | cat -n | sed -n '1,80p'
echo "=== Exact line 16 ==="
sed -n '16,16p' "$FILE" | cat -n
echo "=== Flake8 error-detection (exit nonzero only on errors) ==="
flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics || true
echo "=== Flake8 style/limits pass ==="
flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics || trueRepository: redhat-performance/cloud-governance
Length of output: 6371
Replace the os.system f-string with subprocess.run to prevent shell injection (jenkins/clouds/aws/monthly/run_monthly_policies.py lines ~15-16).
The podman run command is currently executed via os.system and interpolates es_password="{ES_PASSWORD}" into a shell string; use subprocess.run([...], check=True) with a non-shell argv list. (This mitigates injection; avoiding secrets in the command line may require further adjustment.)
Suggested hardening
import os
+import subprocess
@@
-os.system(
- f"""podman run --rm --name cloud-governance --net="host" -e policy="monthly_report" -e es_host="{ES_HOST}" -e es_port="{ES_PORT}" -e es_user="{ES_USER}" -e es_password="{ES_PASSWORD}" -e to_mail="{TO_MAIL}" -e cc_mail="{CC_MAIL}" -e log_level="INFO" {QUAY_CLOUD_GOVERNANCE_REPOSITORY}""")
+subprocess.run(
+ [
+ "podman", "run", "--rm", "--name", "cloud-governance", "--net=host",
+ "-e", "policy=monthly_report",
+ "-e", f"es_host={ES_HOST}",
+ "-e", f"es_port={ES_PORT}",
+ "-e", f"es_user={ES_USER}",
+ "-e", f"es_password={ES_PASSWORD}",
+ "-e", f"to_mail={TO_MAIL}",
+ "-e", f"cc_mail={CC_MAIL}",
+ "-e", "log_level=INFO",
+ QUAY_CLOUD_GOVERNANCE_REPOSITORY,
+ ],
+ check=True,
+)🧰 Tools
🪛 OpenGrep (1.22.0)
[ERROR] 15-16: Dynamic command passed to os.system/os.popen. Use subprocess with an argument list instead, or use shlex.quote() to sanitize input.
(coderabbit.command-injection.python-os-command)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@jenkins/clouds/aws/monthly/run_monthly_policies.py` at line 16, Replace the
os.system call that builds a shell-interpolated f-string command with
subprocess.run using a list argv (no shell) and check=True; construct the
command as a list like
["podman","run","--rm","--name","cloud-governance","--net=host","-e",f"policy=monthly_report","-e",f"es_host={ES_HOST}",...
, QUAY_CLOUD_GOVERNANCE_REPOSITORY"] and call subprocess.run(cmd, check=True);
reference the variables ES_HOST, ES_PORT, ES_USER, ES_PASSWORD, TO_MAIL, CC_MAIL
and QUAY_CLOUD_GOVERNANCE_REPOSITORY when building the argv list and avoid
shell=True so values aren’t shell-interpolated (consider passing secrets via
environment dict or files instead of command-line args if possible).
| f"""podman run --rm --name cloud-governance --net="host" -e account="perf-dept" -e policy="cost_over_usage" -e AWS_ACCESS_KEY_ID="{AWS_ACCESS_KEY_ID_DELETE_PERF}" -e AWS_SECRET_ACCESS_KEY="{AWS_SECRET_ACCESS_KEY_DELETE_PERF}" -e es_host="{ES_HOST}" -e es_port="{ES_PORT}" -e es_user="{ES_USER}" -e es_password="{ES_PASSWORD}" -e es_index="{es_index_perf}" -e LDAP_HOST_NAME="{LDAP_HOST_NAME}" -e special_user_mails="{special_user_mails}" -e IGNORE_MAILS="{IGNORE_MAILS}" -e log_level="INFO" -e policy_output="{BUCKET_PERF}" {QUAY_CLOUD_GOVERNANCE_REPOSITORY}""") | ||
| os.system( | ||
| f"""podman run --rm --name cloud-governance --net="host" -e account="psap" -e policy="cost_over_usage" -e AWS_ACCESS_KEY_ID="{AWS_ACCESS_KEY_ID_DELETE_PSAP}" -e AWS_SECRET_ACCESS_KEY="{AWS_SECRET_ACCESS_KEY_DELETE_PSAP}" -e es_host="{ES_HOST}" -e es_port="{ES_PORT}" -e es_index="{es_index_psap}" -e LDAP_HOST_NAME="{LDAP_HOST_NAME}" -e special_user_mails="{special_user_mails}" -e IGNORE_MAILS="{IGNORE_MAILS}" -e log_level="INFO" -e policy_output="{BUCKET_PSAP}" {QUAY_CLOUD_GOVERNANCE_REPOSITORY}""") | ||
| f"""podman run --rm --name cloud-governance --net="host" -e account="psap" -e policy="cost_over_usage" -e AWS_ACCESS_KEY_ID="{AWS_ACCESS_KEY_ID_DELETE_PSAP}" -e AWS_SECRET_ACCESS_KEY="{AWS_SECRET_ACCESS_KEY_DELETE_PSAP}" -e es_host="{ES_HOST}" -e es_port="{ES_PORT}" -e es_user="{ES_USER}" -e es_password="{ES_PASSWORD}" -e es_index="{es_index_psap}" -e LDAP_HOST_NAME="{LDAP_HOST_NAME}" -e special_user_mails="{special_user_mails}" -e IGNORE_MAILS="{IGNORE_MAILS}" -e log_level="INFO" -e policy_output="{BUCKET_PSAP}" {QUAY_CLOUD_GOVERNANCE_REPOSITORY}""") | ||
| os.system( | ||
| f"""podman run --rm --name cloud-governance --net="host" -e account="perf-scale" -e policy="cost_over_usage" -e AWS_ACCESS_KEY_ID="{AWS_ACCESS_KEY_ID_DELETE_PERF_SCALE}" -e AWS_SECRET_ACCESS_KEY="{AWS_SECRET_ACCESS_KEY_DELETE_PERF_SCALE}" -e es_host="{ES_HOST}" -e es_port="{ES_PORT}" -e es_index="{es_index_perf_scale}" -e LDAP_HOST_NAME="{LDAP_HOST_NAME}" -e special_user_mails="{special_user_mails}" -e IGNORE_MAILS="{IGNORE_MAILS}" -e policy_output="{BUCKET_PERF_SCALE}" -e log_level="INFO" {QUAY_CLOUD_GOVERNANCE_REPOSITORY}""") | ||
| f"""podman run --rm --name cloud-governance --net="host" -e account="perf-scale" -e policy="cost_over_usage" -e AWS_ACCESS_KEY_ID="{AWS_ACCESS_KEY_ID_DELETE_PERF_SCALE}" -e AWS_SECRET_ACCESS_KEY="{AWS_SECRET_ACCESS_KEY_DELETE_PERF_SCALE}" -e es_host="{ES_HOST}" -e es_port="{ES_PORT}" -e es_user="{ES_USER}" -e es_password="{ES_PASSWORD}" -e es_index="{es_index_perf_scale}" -e LDAP_HOST_NAME="{LDAP_HOST_NAME}" -e special_user_mails="{special_user_mails}" -e IGNORE_MAILS="{IGNORE_MAILS}" -e policy_output="{BUCKET_PERF_SCALE}" -e log_level="INFO" {QUAY_CLOUD_GOVERNANCE_REPOSITORY}""") |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Locate and print the relevant portion of the file with line numbers
FILE="jenkins/clouds/aws/weekly/cost_over_usage/run_upload_es.py"
echo "=== File: $FILE (lines around 1-120) ==="
nl -ba "$FILE" | sed -n '1,160p'
# Run flake8 error detection as per guidelines (error codes only)
echo "=== flake8 error detection ==="
flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
# Run flake8 style validation as per guidelines
echo "=== flake8 style validation ==="
flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statisticsRepository: redhat-performance/cloud-governance
Length of output: 234
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE="jenkins/clouds/aws/weekly/cost_over_usage/run_upload_es.py"
echo "=== Sanity: file length ==="
wc -l "$FILE"
echo "=== File: $FILE (lines around 1-140) ==="
awk '{printf "%6d\t%s\n", NR, $0}' "$FILE" | sed -n '1,140p'
echo "=== File: $FILE (lines around 20-45) ==="
awk '{printf "%6d\t%s\n", NR, $0}' "$FILE" | sed -n '20,50p'
echo "=== flake8 error detection ==="
flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
echo "=== flake8 style validation ==="
flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statisticsRepository: redhat-performance/cloud-governance
Length of output: 8091
Harden podman invocations against shell injection
jenkins/clouds/aws/weekly/cost_over_usage/run_upload_es.py (lines 25-30) uses os.system(f"... podman run ... -e es_password=\"{ES_PASSWORD}\" ..."), interpolating env-derived values (including secrets) into a shell string. Replace with subprocess.run([...], check=True) and pass the podman args as a list (e.g., ["podman","run", ...,"-e", f"es_password={ES_PASSWORD}", ...]) so values are not parsed by a shell.
🧰 Tools
🪛 OpenGrep (1.22.0)
[ERROR] 27-28: Dynamic command passed to os.system/os.popen. Use subprocess with an argument list instead, or use shlex.quote() to sanitize input.
(coderabbit.command-injection.python-os-command)
[ERROR] 29-30: Dynamic command passed to os.system/os.popen. Use subprocess with an argument list instead, or use shlex.quote() to sanitize input.
(coderabbit.command-injection.python-os-command)
🪛 Ruff (0.15.14)
[error] 27-27: Starting a process with a shell, possible injection detected
(S605)
[error] 29-29: Starting a process with a shell, possible injection detected
(S605)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@jenkins/clouds/aws/weekly/cost_over_usage/run_upload_es.py` around lines 26 -
30, The three os.system(...) invocations that build long f-string shell commands
(using variables like ES_PASSWORD, AWS_ACCESS_KEY_ID_DELETE_PERF,
AWS_ACCESS_KEY_ID_DELETE_PSAP, AWS_ACCESS_KEY_ID_DELETE_PERF_SCALE,
QUAY_CLOUD_GOVERNANCE_REPOSITORY, special_user_mails, IGNORE_MAILS, es_index_*)
must be replaced with subprocess.run([...], check=True) calls that pass each
podman argument as a separate list item (e.g., "podman", "run", "--rm",
"--name", "cloud-governance", "--net=host", "-e", f"es_password={ES_PASSWORD}",
"-e", f"account=perf-dept", ..., QUAY_CLOUD_GOVERNANCE_REPOSITORY) so values are
not shell-interpolated; update each os.system call site to import subprocess and
construct the arg list for the corresponding account (perf, psap, perf-scale)
and use subprocess.run(args, check=True) instead of os.system(f"...").
| common_input_vars = {'es_host': ES_HOST, 'es_port': ES_PORT, 'es_user': ES_USER, 'es_password': ES_PASSWORD, | ||
| 'es_index': 'cloud-governance-clouds-billing-reports', |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Locate and show the relevant portion of the file with line numbers
FILE="jenkins/clouds/azure/daily/cost_reports/run_policies.py"
echo "=== File: $FILE (lines 1-160) ==="
nl -ba "$FILE" | sed -n '1,160p'
# Find usages of os.system / subprocess / common_input_vars / env flags
echo
echo "=== rg for common_input_vars and os.system ==="
rg -n --hidden --no-ignore -S "common_input_vars|os\.system|subprocess\.run|podman|env=|ES_PASSWORD" "$FILE" || true
# Run flake8 error detection for the whole repo (as instructed); capture only relevant output size
echo
echo "=== flake8: error detection command ==="
flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics || true
echo
echo "=== flake8: style command ==="
flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics || trueRepository: redhat-performance/cloud-governance
Length of output: 224
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE="jenkins/clouds/azure/daily/cost_reports/run_policies.py"
test -f "$FILE" || { echo "Missing $FILE"; exit 1; }
echo "=== $FILE: relevant region (lines 1-200) ==="
nl -ba "$FILE" | sed -n '1,200p'
echo
echo "=== Search within $FILE ==="
rg -n --hidden --no-ignore -S "common_input_vars|os\.system|subprocess\.run|podman|env=|ES_PASSWORD|ES_USER" "$FILE" || trueRepository: redhat-performance/cloud-governance
Length of output: 235
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE="jenkins/clouds/azure/daily/cost_reports/run_policies.py"
echo "=== flake8 (error detection) ==="
flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
echo
echo "=== flake8 (style validation) ==="
flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statisticsRepository: redhat-performance/cloud-governance
Length of output: 2628
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE="jenkins/clouds/azure/daily/cost_reports/run_policies.py"
test -f "$FILE" || { echo "Missing $FILE"; exit 1; }
echo "=== $FILE: relevant region (lines 1-220) ==="
nl -ba "$FILE" | sed -n '1,220p'
echo
echo "=== Search within $FILE ==="
rg -n --hidden --no-ignore -S "common_input_vars|os\.system|subprocess\.run|podman|env=|ES_PASSWORD|ES_USER" "$FILE" || true
echo
echo "=== flake8: error detection ==="
flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics || true
echo
echo "=== flake8: style validation ==="
flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics || trueRepository: redhat-performance/cloud-governance
Length of output: 235
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE="jenkins/clouds/azure/daily/cost_reports/run_policies.py"
test -f "$FILE" || { echo "Missing $FILE"; exit 1; }
echo "=== $FILE: relevant region (lines 1-220) ==="
nl -ba "$FILE" | sed -n '1,220p'
echo
echo "=== Search within $FILE ==="
rg -n --hidden --no-ignore -S "common_input_vars|os\.system|subprocess\.run|podman|env=|ES_PASSWORD|ES_USER" "$FILE" || true
echo
echo "=== flake8: error detection ==="
flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics || true
echo
echo "=== flake8: style validation ==="
flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics || trueRepository: redhat-performance/cloud-governance
Length of output: 235
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE="jenkins/clouds/azure/daily/cost_reports/run_policies.py"
test -f "$FILE" || { echo "Missing $FILE"; exit 1; }
echo "=== $FILE: relevant region (lines 1-220) ==="
nl -ba "$FILE" | sed -n '1,220p'
echo
echo "=== Search within $FILE ==="
rg -n --hidden --no-ignore -S "common_input_vars|os\.system|subprocess\.run|podman|env=|ES_PASSWORD|ES_USER" "$FILE" || true
echo
echo "=== flake8: error detection ==="
flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics || true
echo
echo "=== flake8: style validation ==="
flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics || trueRepository: redhat-performance/cloud-governance
Length of output: 235
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE="jenkins/clouds/azure/daily/cost_reports/run_policies.py"
test -f "$FILE" || { echo "Missing $FILE"; exit 1; }
echo "=== $FILE (lines 1-140) ==="
awk '{printf "%6d\t%s\n", NR, $0}' "$FILE" | sed -n '1,140p'
echo
echo "=== rg within $FILE ==="
rg -n --no-ignore --hidden -S "common_input_vars|os\.system|subprocess\.run|podman|env=|ES_PASSWORD|ES_USER" "$FILE" || true
echo
echo "=== flake8 (error detection) ==="
flake8 . --jobs=1 --count --select=E9,F63,F7,F82 --show-source --statistics || true
echo
echo "=== flake8 (style validation) ==="
flake8 . --jobs=1 --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics || trueRepository: redhat-performance/cloud-governance
Length of output: 50391
Stop passing ES_PASSWORD via os.system-constructed podman run -e flags
jenkins/clouds/azure/daily/cost_reports/run_policies.py builds common_input_vars with 'es_password': ES_PASSWORD (lines 20-22), converts it to -e key="value" strings, then interpolates those strings into an os.system(...) shell command (lines 23-30), serializing the secret into shell-parsed text. Use subprocess.run([...], check=True) with an argv list (and pass -e entries as separate args) instead of shell-joined command strings.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@jenkins/clouds/azure/daily/cost_reports/run_policies.py` around lines 20 -
21, The code currently puts ES_PASSWORD into common_input_vars and builds a
shell-interpolated podman command passed to os.system, which serializes the
secret into shell-parsed text; change the call site that constructs the podman
run command (the code that converts common_input_vars into '-e key="value"'
strings and invokes os.system) to instead build a list argv and call
subprocess.run(argv, check=True), passing each '-e', key and value as separate
list elements (so ES_PASSWORD is never shell-parsed); keep the same keys from
common_input_vars and ensure you pass values as separate argv entries to
subprocess.run to avoid shell interpolation and secret exposure.
Type of change
Note: Fill x in []
Description
Migrate the cloud-governance application from elasticsearch-py 7.x + elasticsearch-dsl 7.x (talking to OpenSearch 1.x via compatibility mode) to a dual-client architecture:
Key Changes:
Core library (cloud_governance/common/elasticsearch/elasticsearch_operations.py):
Jenkins pipelines (all Jenkinsfiles + Python runners):
CI/CD (.github/workflows/):
Dependencies:
Testing Performed
For security reasons, all pull requests need to be approved first before running any automated CI
Summary by CodeRabbit
New Features
Chores