MTA-7133-LLM-Proxy-Disconnection-Network-Isolation-Procedures #381

Open
anarnold97 wants to merge 2 commits into migtools:main from anarnold97:MTA-7133-LLM-Proxy-Disconnection-Network-Isolation-Procedures

@anarnold97 anarnold97 commented Jul 5, 2026

Collaborator

JIRA


PREVIEW


DESCRIPTION

Adds emergency LLM proxy shutdown and network isolation procedures to the Developer Lightspeed guide.

New file: proc_emergency-llm-proxy-shutdown.adoc

Centralized mode (Operator / OpenShift deployments)

  • Disable the LLM proxy by setting `kai_llm_proxy_enabled: false` in the Tackle CR
  • Covers both the OpenShift web console and `oc patch` CLI methods
  • Warns against directly scaling the Deployment to zero, because the Operator will re-create it
  • Documents expected developer impact: 503 or connection-refused errors in the IDE, with standard IDE functions unaffected
  • Includes re-enablement procedure once the incident is resolved
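
A minimal sketch of the CLI path, assuming the flag sits directly under `spec` in the Tackle CR and that the CR can be addressed as `tackle`. `<tackle_cr_name>` and `<mta_namespace>` are placeholders, and the helper name `proxy_patch` is hypothetical. The helper only builds the merge patch, so the payload can be inspected before anything touches the cluster:

```shell
# Hypothetical helper: print the JSON merge patch that toggles the proxy flag.
# Assumption: kai_llm_proxy_enabled lives directly under .spec of the Tackle CR.
proxy_patch() {
  case "$1" in
    true|false)
      printf '{"spec":{"kai_llm_proxy_enabled":%s}}\n' "$1"
      ;;
    *)
      echo "usage: proxy_patch true|false" >&2
      return 2
      ;;
  esac
}

# Example invocation (not executed here; names in <...> are placeholders):
#   oc patch tackle <tackle_cr_name> -n <mta_namespace> \
#     --type merge -p "$(proxy_patch false)"
```

Passing `true` instead of `false` yields the patch for the re-enablement step once the incident is resolved.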

Standalone mode (manual desktop setup)

  • No central proxy to disable — documents the firewall egress block approach instead
  • Reference table of common LLM provider hostnames to block (OpenAI, Azure OpenAI, IBM watsonx.ai, custom)
  • Notes DNS-based filtering as an alternative (Pi-hole, Cisco Umbrella, Zscaler)
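
As an illustration of the DNS-based alternative, the sketch below turns provider hostnames into sinkhole entries. The dnsmasq-style `address=/<domain>/0.0.0.0` syntax is an assumption chosen for the example; Pi-hole, Cisco Umbrella, and Zscaler each have their own block-list mechanisms. The function name `sinkhole_lines` is hypothetical, and the two hostnames come from the reference table in this PR:

```shell
# Hypothetical helper: emit one dnsmasq-style sinkhole line per hostname.
# A leading "*." is stripped because an address=/domain/ entry already
# covers the domain and all of its subdomains.
sinkhole_lines() {
  for host in "$@"; do
    domain=${host#\*.}
    printf 'address=/%s/0.0.0.0\n' "$domain"
  done
}

# Print entries for two providers listed in the procedure.
sinkhole_lines 'api.openai.com' '*.openai.azure.com'
```

The output is only text; applying it to a resolver, and reloading that resolver, is left to the DNS tooling in use.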

Validation

  • End-to-end procedure to confirm isolation is active from the IDE
  • Table of expected error responses per deployment mode
  • Optional curl-based network-level check

Assembly change

docs/developer-lightspeed-guide/master.adoc — new module included after con_developer-lightspeed-logs.adoc

…Draft)

Signed-off-by: A.Arnold <anarnold@redhat.com>

coderabbitai Bot commented Jul 5, 2026

Contributor

Walkthrough

This PR adds a new AsciiDoc procedure documenting emergency shutdown of the LLM proxy to isolate developer environments during security incidents, covering centralized (Tackle CR toggle) and standalone (firewall/network policy) modes, with verification and re-enable steps. It is wired into the master developer guide via an include directive.

Changes

Emergency LLM Proxy Shutdown Documentation

| Layer / File(s) | Summary |
|---|---|
| Guide inclusion and procedure introduction: `docs/developer-lightspeed-guide/master.adoc`, `docs/topics/developer-lightspeed/proc_emergency-llm-proxy-shutdown.adoc` | Master guide includes the new topic before the Developer Lightspeed attribute toggle; new document introduces module metadata, title, and an incident-response warning. |
| Centralized mode shutdown, verification, and re-enable: `docs/topics/developer-lightspeed/proc_emergency-llm-proxy-shutdown.adoc` | Documents disabling/re-enabling the proxy via web console and `oc patch` on `kai_llm_proxy_enabled`, verification of proxy pod state, and expected developer-facing error behavior. |
| Standalone mode blocking and validation: `docs/topics/developer-lightspeed/proc_emergency-llm-proxy-shutdown.adoc` | Documents blocking LLM provider endpoints at the firewall/network policy layer with example rules and DNS filtering, plus IDE and curl-based validation steps for confirming isolation. |

Estimated code review effort: 2 (Simple) | ~10 minutes

Possibly related PRs

  • migtools/mta-documentation#324: Adds emergency shutdown procedure using the same kai_llm_proxy_enabled Tackle CR field referenced in the related "enable LLM proxy" documentation.

Suggested reviewers: djzager, fabianvf

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately reflects the new LLM proxy disconnection and network isolation procedures added in this PR. |

Signed-off-by: A.Arnold <anarnold@redhat.com>
@anarnold97 anarnold97 changed the title MTA-7133-LLM-Proxy-Disconnection-Network-Isolation-Procedures (First … MTA-7133-LLM-Proxy-Disconnection-Network-Isolation-Procedures Jul 5, 2026

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/topics/developer-lightspeed/proc_emergency-llm-proxy-shutdown.adoc`:
- Around line 175-182: Correct the provider name typo in the provider list so it
reads “Azure OpenAI Service” instead of “zure OpenAI Service.” Update the
relevant entry in the developer-lightspeed emergency LLM proxy shutdown document
where the OpenAI/Azure OpenAI providers are listed, keeping the rest of the
table content unchanged.
- Around line 93-104: The verification step in
proc_emergency-llm-proxy-shutdown.adoc is misleading because `oc get pods ... |
grep proxy` will not produce the stated “No resources found...” output; update
the expected result to reflect the actual shell behavior for this command, using
the existing “Verify that no proxy pods are running” step and its “Expected
output” block as the place to adjust the wording.
- Around line 243-264: The curl validation is targeting the public
`api.openai.com` endpoint instead of the actual isolation/proxy path, so it does
not validate centralized deployments correctly. Update the `curl` example in the
emergency shutdown docs to use the proxy or workstation-specific endpoint used
by the isolation flow, and make the expected output reflect that path for both
isolated and centralized modes. Keep the guidance aligned with the surrounding
`curl` validation section so readers verify the real traffic route rather than a
single provider host.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 23625628-72fe-4a48-8b6d-a0bc320e3992

📥 Commits

Reviewing files that changed from the base of the PR and between 50da7d6 and 6a1684d.

📒 Files selected for processing (2)
  • docs/developer-lightspeed-guide/master.adoc
  • docs/topics/developer-lightspeed/proc_emergency-llm-proxy-shutdown.adoc

Comment on lines +93 to +104
. Verify that no proxy pods are running:
+
[source,terminal,subs="+quotes"]
----
$ *oc get pods -n __<mta_namespace>__ | grep proxy*
----
+
.Expected output
[source,terminal]
----
No resources found in mta namespace.
----

🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Fix the verification output.

`oc get pods ... | grep proxy` will not print `No resources found in mta namespace.`; it either emits matching lines or nothing at all. As written, this verification step is misleading.

Proposed fix
- $ *oc get pods -n __<mta_namespace>__ | grep proxy*
+ $ *oc get pods -n __<mta_namespace>__ --no-headers | grep proxy*

Then update the expected result to match the actual shell behavior.
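
One way to make the result explicit rather than relying on empty output is to branch on the exit status of `grep`, which is non-zero when nothing matches. This is a sketch only: the function name `report_proxy_pods` is hypothetical, and matching on the bare string `proxy` assumes no unrelated pod in the namespace has that string in its name.

```shell
# Hypothetical helper: read a pod listing on stdin and report whether any
# line mentions "proxy". Returns 0 when none is found, 1 otherwise.
report_proxy_pods() {
  if matches=$(grep proxy); then
    printf 'proxy pods still present:\n%s\n' "$matches"
    return 1
  fi
  echo "no proxy pods found"
}

# Example invocation (not executed here; <mta_namespace> is a placeholder):
#   oc get pods -n <mta_namespace> --no-headers | report_proxy_pods
```

With this shape, the documented expected output can be a single fixed line instead of a message that the pipeline never prints.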

Comment on lines +175 to +182
|OpenAI
|`api.openai.com`
|443

|zure OpenAI Service
|`*.openai.azure.com`
|443

📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Correct the provider name typo.

zure OpenAI Service should be Azure OpenAI Service.

Proposed fix
-|zure OpenAI Service
+|Azure OpenAI Service

Comment on lines +243 to +264
.Optional: Network-level validation by using the `curl` command

* On the affected workstation, run the following command to confirm that the endpoint is unreachable:
+
[source,terminal,subs="+quotes"]
----
$ *curl -v --max-time 10 https://api.openai.com/v1/models*
----
+
.Expected output when isolation is active
[source,terminal,subs="+quotes"]
----
* connect to __<ip>__ port 443 failed: Connection refused
curl: (7) Failed to connect to api.openai.com port 443 after __<X>__ ms: Connection refused
----
+
Alternatively, in centralized mode where DNS resolves but the service is down, the expected output is as follows:
+
[source,terminal]
----
curl: (22) The requested URL returned error: 503
----

🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Scope the curl validation to the actual isolation path.

This check hardcodes `api.openai.com`, so it only validates one provider and does not verify centralized mode at all. In centralized deployments, the workstation should be reaching the proxy path, not the public LLM endpoint.

Proposed fix
-Optional: Network-level validation by using the `curl` command
+Optional: Network-level validation for standalone firewall blocks

...
-$ *curl -v --max-time 10 https://api.openai.com/v1/models*
+$ *curl -v --max-time 10 https://<configured_llm_endpoint>/*
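
A sketch of how the probe result could be interpreted for either mode. Note that curl reports `(22) The requested URL returned error` only when `-f`/`--fail` is passed, and with that flag an unauthenticated 401 from a reachable provider would also exit 22. The sketch therefore reads the HTTP status through `--write-out` instead. The function names are hypothetical, and treating 503 as "isolated" follows the expected centralized-mode response described in the procedure:

```shell
# Hypothetical helper: classify a probe from curl's exit code ($1) and the
# HTTP status ($2, "000" when no response was received).
classify_probe() {
  case "$1" in
    6)  echo "isolated: DNS lookup failed" ;;
    7)  echo "isolated: connection failed" ;;
    28) echo "isolated: timed out" ;;
    0)
      if [ "$2" = "503" ]; then
        echo "isolated: endpoint answered 503"
      else
        echo "NOT isolated: endpoint answered HTTP $2"
      fi
      ;;
    *)  echo "inconclusive: curl exit $1" ;;
  esac
}

# Probe one URL and classify the outcome. The URL is supplied by the caller,
# for example the proxy route in centralized mode or the configured provider
# endpoint in standalone mode.
probe_endpoint() {
  rc=0
  status=$(curl --silent --output /dev/null --max-time 10 \
    --write-out '%{http_code}' "$1") || rc=$?
  classify_probe "$rc" "$status"
}
```

Calling `probe_endpoint` with the endpoint the IDE is actually configured to use checks the real traffic route rather than a single provider host.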
