
Helm charts deployment #572

Open
juanpablosalas wants to merge 9 commits into archi-physics:main from juanpablosalas:k8s-deployment


@juanpablosalas

Collaborator

📝 Summary

This PR introduces support for Kubernetes Helm-based deployments to archi, complementing the existing Docker Compose/Podman setups. It also adds a mechanism to dynamically discover and inject runtime resources (extra_agents and extra_tools) into the pipeline environments via local filesystem scanning.

🛠 Key Changes

⚙️ Core Logic & Dynamism

  • Dynamic Extension Discovery: Added scanning mechanisms that dynamically import and register runtime extensions from absolute base paths (/root/archi/...):
    • Agents: `src/archi/pipelines/__init__.py` scans extra_agents and populates the _PIPELINE_EXPORTS catalog, raising an error on registration name collisions.
    • Tools: `src/archi/pipelines/agents/tools/__init__.py` scans extra_tools and injects the functions/classes it finds into the package's global namespace.
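Below is a minimal sketch of the scan-and-register pattern described above. It assumes each extension is a flat `.py` file and that every public function or class the file defines should be registered under its own name. `discover_extensions` and the `archi_extra_` module-name prefix are hypothetical. The PR's loaders in the two `__init__.py` files may filter or register objects differently.

```python
import importlib.util
import inspect
import sys
from pathlib import Path
from typing import Any, Dict


def discover_extensions(base_dir: Path, registry: Dict[str, Any]) -> Dict[str, Any]:
    """Import each top-level .py file in base_dir and register the public
    functions/classes it defines in `registry`, raising on name collisions."""
    if not base_dir.is_dir():
        return registry  # a missing extensions directory is treated as empty
    for path in sorted(base_dir.glob("*.py")):
        if path.name.startswith("_"):
            continue  # skip __init__.py and private helper modules
        module_name = f"archi_extra_{path.stem}"  # hypothetical naming scheme
        spec = importlib.util.spec_from_file_location(module_name, path)
        if spec is None or spec.loader is None:
            continue
        module = importlib.util.module_from_spec(spec)
        sys.modules[module_name] = module  # register before executing, per the importlib recipe
        spec.loader.exec_module(module)
        for name, obj in vars(module).items():
            if name.startswith("_"):
                continue
            defined_here = (inspect.isfunction(obj) or inspect.isclass(obj)) and obj.__module__ == module_name
            if not defined_here:
                continue  # ignore names the extension merely imported (e.g. Path, os)
            if name in registry:
                raise ValueError(f"Extension name collision: {name!r} (from {path.name})")
            registry[name] = obj
    return registry
```

Failing loudly on a collision mirrors the behaviour the summary describes for agents. Silently overwriting would let an extension shadow a built-in pipeline without warning.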

🖥️ CLI & Infrastructure Deployment

  • New install Subcommand: Introduced an install command in src/cli/cli_main.py that lets operators build configurations and deploy them to Kubernetes with Helm.
  • Deployment Management: Updated DeploymentManager to support Helm operations. It adds create_deployment_templates, which renders the per-service Deployment manifests, and helm_install, which runs helm upgrade --install with an option to force a reinstall.
  • Secret Compiling: Added create_secret_template to SecretsManager. It reads secrets from the local .env file, base64-encodes them, and writes them into a Kubernetes Secret manifest (secrets.yaml).
  • Conditional Templating Overhaul: Adjusted the TemplateManager workflow so that, when context.helm is enabled, the Docker Compose stages are replaced by chart, values, config_seed, and tools rendering stages.
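The following is a minimal sketch of the Secret-compilation step: it turns `.env`-style key/value pairs into a Kubernetes `Secret` manifest. The parser handles only simple `KEY=VALUE` lines, and `parse_env` / `build_secret_manifest` are hypothetical names. The PR's `create_secret_template` may work differently. Base64 is an encoding, not encryption, so the generated `secrets.yaml` is exactly as sensitive as the `.env` file it came from.

```python
import base64
from typing import Dict


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")  # split on the first '=' only
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def build_secret_manifest(secret_name: str, secrets: Dict[str, str]) -> str:
    """Render a v1 Secret; values under `data` must be base64-encoded."""
    lines = [
        "apiVersion: v1",
        "kind: Secret",
        "metadata:",
        f"  name: {secret_name}",
        "type: Opaque",
        "data:",
    ]
    for key in sorted(secrets):  # stable ordering keeps diffs of secrets.yaml small
        encoded = base64.b64encode(secrets[key].encode("utf-8")).decode("ascii")
        lines.append(f"  {key}: {encoded}")
    return "\n".join(lines) + "\n"
```

For example, `build_secret_manifest("demo-secrets", parse_env("PG_PASSWORD=pw"))` emits a `data:` entry of `PG_PASSWORD: cHc=`.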

🌐 Docker images

  • New "universal" Docker images have been pushed to a temporary public Docker Hub repository (juanpablosalasg/archi-chat) for use as source images in the Helm deployment.

You can first test template generation with:

```bash
archi install --name <archi_name> --config <config_path> --env-file <env_path> --services chatbot --verbosity 4 --templates-dir <templates_dir> --dry-run
```

Here templates_dir is the directory where the Helm charts are created. Once you're confident in the templates, run the same command without the --dry-run flag, or run helm install manually.

```python
shutil.copyfile(prompt_file, dst_file)
logger.debug(f"Copied default prompt: {prompt_type}/{prompt_file.name}")

def _helm_render_default_prompts(self, context: TemplateContext) -> None:
```

Collaborator


I think the Helm path only handles default prompts. Do we have a way to handle prompts that the user sets to a custom path? e.g.

```yaml
services:
  prompts:
    chat_prompt: /path/to/my/custom.prompt
    condense_prompt: /path/to/my/condense.prompt
```
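One possible way to support custom prompt paths in the Helm path is to resolve each configured prompt to its file contents before rendering the prompts ConfigMap. When no override is set, the resolver falls back to the bundled default. This is a hedged sketch: the `resolve_prompts` helper, the flat `prompts` mapping, and the `<key>.prompt` default-file naming are all assumptions for illustration.

```python
from pathlib import Path
from typing import Dict, Mapping, Optional


def resolve_prompts(configured: Mapping[str, Optional[str]], defaults_dir: Path) -> Dict[str, str]:
    """Return {configmap_key: prompt_text} for every configured prompt.

    A user-supplied path wins; otherwise fall back to defaults_dir/<key>.prompt
    (hypothetical layout).
    """
    data: Dict[str, str] = {}
    for key, custom_path in configured.items():
        source = Path(custom_path).expanduser() if custom_path else defaults_dir / f"{key}.prompt"
        if not source.is_file():
            # Fail at render time rather than shipping a ConfigMap with a missing prompt.
            raise FileNotFoundError(f"Prompt {key!r} not found at {source}")
        # File-style keys ("<key>.prompt") make the mounted ConfigMap read as ordinary files.
        data[f"{key}.prompt"] = source.read_text(encoding="utf-8")
    return data
```

The resulting dict could then populate the `data:` section of the chat-prompts ConfigMap. Custom prompts would then reach the pod the same way the defaults do.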

@swinney

swinney commented Jun 5, 2026

Collaborator

Code review

Found 4 issues:

  1. get_required_volumes(helm=True) crashes with TypeError when --gpu-ids is set. In Helm mode the set is populated with (service, volume) tuples, but the GPU branch unconditionally adds the bare string "archi-models", so sorted() on a mixed set of tuples and strings raises TypeError. VolumeManager.create_volume_templates also unpacks each item as volume[0]/volume[1], which fails on the bare string. (bug — any archi install with GPUs hits this)

```python
def get_required_volumes(self, helm=False) -> List[str]:
    volumes: Set[str|dict] = set()
    for service_name, state in self.services.items():
        if state.enabled and state.volume_name:
            if helm:
                volumes.add((service_name,state.volume_name))
            else:
                volumes.add(state.volume_name)
    if self.gpu_ids:
        volumes.add("archi-models")
    return sorted(volumes)
```
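Here is a minimal sketch of one fix, written as a standalone function so it runs outside the CLI. The idea is to keep every element of the set the same type within each mode. Which service should own archi-models in Helm mode is an assumption here: `GPU_VOLUME_OWNER = "chatbot"` is hypothetical. The point is only that Helm mode must add a `(service, volume)` tuple rather than a bare string. That keeps both `sorted()` and the `volume[0]`/`volume[1]` unpacking in `create_volume_templates` working.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Set, Tuple, Union


@dataclass
class ServiceState:
    enabled: bool
    volume_name: Optional[str] = None


GPU_VOLUME_NAME = "archi-models"
GPU_VOLUME_OWNER = "chatbot"  # hypothetical: the service whose PVC holds the models

Volume = Union[str, Tuple[str, str]]


def get_required_volumes(
    services: Dict[str, ServiceState], gpu_ids: Optional[Sequence[int]], helm: bool = False
) -> List[Volume]:
    volumes: Set[Volume] = set()
    for service_name, state in services.items():
        if state.enabled and state.volume_name:
            volumes.add((service_name, state.volume_name) if helm else state.volume_name)
    if gpu_ids:
        # Same element type as the loop above, so sorted() never compares str with tuple.
        volumes.add((GPU_VOLUME_OWNER, GPU_VOLUME_NAME) if helm else GPU_VOLUME_NAME)
    return sorted(volumes)
```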

  2. data-manager and postgres deployments render resources from .Values.chat.resources instead of .Values.data_manager.resources / .Values.postgres.resources. values.yaml defines separate per-service resources blocks, so these services silently inherit the chatbot's limits and can't be tuned independently. Likely a copy-paste error. (bug)

```yaml
resources:
  {{ "{{- toYaml .Values.chat.resources | nindent 12 }}" }}
ports:
```

```yaml
resources:
  {{ "{{- toYaml .Values.chat.resources | nindent 12 }}" }}
livenessProbe:
```
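The fix is to point each template at its own block: .Values.data_manager.resources and .Values.postgres.resources, per the review above. A cheap guard against this class of copy-paste error is a test that scans each service's deployment template for the `.Values.<key>.resources` reference it uses. The directory-to-key mapping below is inferred from the review and is an assumption, as is the `wrong_resource_refs` helper name.

```python
import re
from typing import Dict, List

# Assumed mapping from Helm template directory to its values.yaml key.
EXPECTED_RESOURCES_KEY: Dict[str, str] = {
    "chatbot": "chat",
    "data-manager": "data_manager",
    "postgres": "postgres",
}

RESOURCES_REF = re.compile(r"\.Values\.([A-Za-z0-9_]+)\.resources\b")


def wrong_resource_refs(service_dir: str, template_text: str) -> List[str]:
    """Return every .Values.<key>.resources key in the template that does not
    belong to service_dir; an empty list means the template is consistent."""
    expected = EXPECTED_RESOURCES_KEY[service_dir]
    return [key for key in RESOURCES_REF.findall(template_text) if key != expected]
```

Running this over `helm/templates/<service>/deployment.yaml` in CI would have flagged both templates above with `["chat"]`.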
livenessProbe:

  3. Helm rendering crashes with TemplateNotFound for any enabled service without a Helm template. Both _stage_service_artifacts and create_deployment_templates iterate all enabled services and unconditionally load helm/templates/<service>/{service,deployment}.yaml, but templates only exist for chatbot, data-manager, and postgres. Enabling grafana or grader (both registered, real services) raises TemplateNotFound. (bug)

```python
def _stage_service_artifacts(self, context: TemplateContext) -> None:
    helm = context.helm
    if helm:
        enabled_services = context.plan.get_enabled_services()
        for service in enabled_services:
            chart_dir = context.base_dir / "chart" / "templates" / f"{service}-service.yaml"
            tmpl = self.env.get_template(str(HELM_PREFIX / service / "service.yaml"))
            helm_config = tmpl.render(name=context.plan.name)
```

```python
def create_deployment_templates(self, base_dir: Path|str, services, env, name):
    for service in services:
        chart_dir = Path(base_dir) / "chart" / "templates" / f"{service}-deployment.yaml"
        tmpl = env.get_template(str(HELM_PREFIX / service / "deployment.yaml"))
        helm_config = tmpl.render(name=name, selenium_scraper=True)
```
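One defensive option is to check which enabled services actually ship Helm templates before rendering anything. Services without templates are reported instead of crashing the run. The sketch below uses only `pathlib` and assumes the on-disk layout `helm/templates/<service>/{service,deployment}.yaml` described above. `split_renderable_services` is a hypothetical helper.

```python
import logging
from pathlib import Path
from typing import Iterable, List, Tuple

logger = logging.getLogger(__name__)

REQUIRED_FILES = ("service.yaml", "deployment.yaml")


def split_renderable_services(services: Iterable[str], helm_templates_root: Path) -> Tuple[List[str], List[str]]:
    """Partition services into (renderable, skipped) by checking that every
    required template file exists under helm_templates_root/<service>/."""
    renderable: List[str] = []
    skipped: List[str] = []
    for service in services:
        service_dir = helm_templates_root / service
        if all((service_dir / fname).is_file() for fname in REQUIRED_FILES):
            renderable.append(service)
        else:
            skipped.append(service)
            logger.warning("No Helm templates for service %r; skipping", service)
    return renderable, skipped
```

Skipping with a warning is only one choice. Raising a clear error that names the unsupported services may be preferable so users don't get a partial deployment. If the templates are loaded through a Jinja2 loader rather than a plain directory, catching `jinja2.TemplateNotFound` around `env.get_template(...)` achieves the same check.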

  4. --dry-run still writes files to disk, including base64-encoded secrets. Only helm_install is guarded by if not dry; create_secret_template, create_volume_templates, prepare_deployment_files, and create_deployment_templates all run unconditionally before it. The create command exits early in dry mode (the invariant added in commit 9256a9af), but install does not. (bug: secrets are written to disk on a dry run)

archi/src/cli/cli_main.py

Lines 685 to 705 in b62ed09

```python
secrets_manager.create_secret_template(base_dir, helm_name,env, all_secrets)
volume_manager = VolumeManager(helm=True)
volume_manager.create_volume_templates(base_dir, helm_config, env=env, name=helm_name)
helm_template_manager.prepare_deployment_files(
    helm_config,
    config_manager,
    secrets_manager,
    helm=True,
    allow_port_reuse=True, #checked later
    **other_flags,
)
# Host-side seeding removed; container config-seed handles schema + ingestion before services start.
deployment_manager = DeploymentManager(helm=True)
deployment_manager.create_deployment_templates(base_dir, services=service_only_resolved, env=env, name=helm_name)
if not dry:
    deployment_manager.helm_install(name, templates_dir, force_reinstall)
```
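One way to make --dry-run free of side effects is to render every artifact into memory first. All writes then go through a single gate, mirroring the early exit that create already has. The sketch below works under that assumption; `write_artifacts` and the in-memory `{path: text}` mapping are hypothetical and do not reflect the CLI's current structure.

```python
from pathlib import Path
from typing import Dict, List


def write_artifacts(artifacts: Dict[Path, str], dry: bool) -> List[Path]:
    """Write rendered files to disk unless dry is True.

    Returns the sorted paths that were (or, in dry mode, would be) written,
    so a dry run can still report what it would do.
    """
    planned = sorted(artifacts)
    if dry:
        return planned  # nothing touches the filesystem, secrets.yaml included
    for path in planned:
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(artifacts[path], encoding="utf-8")
    return planned
```

With this shape, helm_install would run only after `write_artifacts(..., dry=False)`. The PR description also uses --dry-run to inspect the generated charts. An alternative is therefore to keep writing non-secret templates in dry mode and skip only secrets.yaml. Which behaviour install should have is a design decision for the authors.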

🤖 Generated with Claude Code

- If this code review was useful, please react with 👍. Otherwise, react with 👎.

@swinney

swinney commented Jun 5, 2026

Copy link
Copy Markdown
Collaborator

By all means, 👎 the Claude reviewer if you feel the need. 😄

If you want me to have Claude fix the bugs, gimme a 👍. I have not had a chance to run it with minikube yet. I'm happy to do that, I just need a bit more time.

Copilot AI left a comment

Contributor


Pull request overview

Adds a Helm-based Kubernetes deployment path to archi, alongside existing Compose/Podman flows, including template generation for core services and runtime extension injection (extra agents/tools) via filesystem scanning.

Changes:

  • Introduces archi install CLI command to generate Helm charts (and optionally deploy via helm upgrade --install).
  • Adds Helm chart templates (Chart/values + Deployments/Services/PVCs/ConfigMaps/Secrets) and updates template/volume/secret managers to render them.
  • Adds dynamic discovery/loading of extra agents and tools at runtime from fixed container paths.

Reviewed changes

Copilot reviewed 32 out of 34 changed files in this pull request and generated 14 comments.

| File | Description |
| --- | --- |
| `src/cli/utils/service_builder.py` | Adds Helm-specific volume discovery and a Helm deployment plan builder. |
| `src/cli/utils/helpers.py` | Adds Helm template prefix constant. |
| `src/cli/templates/helm/values.yaml` | New Helm values template for core services and resource defaults. |
| `src/cli/templates/helm/templates/secrets.yaml` | New Secret manifest template generated from .env secrets. |
| `src/cli/templates/helm/templates/pvc.yaml` | Placeholder/Helm PVC template path (file added). |
| `src/cli/templates/helm/templates/postgres/service.yaml` | New Postgres Service Helm template. |
| `src/cli/templates/helm/templates/postgres/pvc.yaml` | New Postgres PVC Helm template. |
| `src/cli/templates/helm/templates/postgres/deployment.yaml` | New Postgres Deployment Helm template. |
| `src/cli/templates/helm/templates/postgres/configmap.yaml` | New ConfigMap template for Postgres init SQL. |
| `src/cli/templates/helm/templates/data-manager/service.yaml` | New data-manager Service Helm template. |
| `src/cli/templates/helm/templates/data-manager/pvc.yaml` | New data-manager PVC Helm template. |
| `src/cli/templates/helm/templates/data-manager/deployment.yaml` | New data-manager Deployment Helm template (optional selenium sidecar). |
| `src/cli/templates/helm/templates/data-manager/configmap.yaml` | New data-manager ConfigMap template for input lists/weblists. |
| `src/cli/templates/helm/templates/config-seed.yaml` | New Helm hook Job to seed config/data into Postgres. |
| `src/cli/templates/helm/templates/chatbot/service.yaml` | New chatbot Service Helm template. |
| `src/cli/templates/helm/templates/chatbot/pvc.yaml` | New chatbot PVC Helm template. |
| `src/cli/templates/helm/templates/chatbot/deployment.yaml` | New chatbot Deployment Helm template + mounts for prompts/agents/tools/skills. |
| `src/cli/templates/helm/templates/chatbot/configmap.yaml` | Generic ConfigMap template used to emit configs/agents/skills/prompts/tools. |
| `src/cli/templates/helm/templates/_helpers.tpl` | Adds standard Helm name/fullname helpers. |
| `src/cli/templates/helm/Chart.yaml` | New chart metadata template written by the CLI. |
| `src/cli/templates/dockerfiles/Dockerfile-postgres-universal` | Adds a universal Postgres image build (pgvector + pg_textsearch). |
| `src/cli/templates/dockerfiles/Dockerfile-grafana-universal` | Adds a universal Grafana image build. |
| `src/cli/templates/dockerfiles/Dockerfile-data-manager-universal` | Adds a universal data-manager image build (incl. Firefox/Geckodriver). |
| `src/cli/templates/dockerfiles/Dockerfile-chat-universal` | Adds a universal chatbot image build (incl. Firefox/Geckodriver). |
| `src/cli/templates/base-config.yaml` | Adds tools_dir to chat_app config template. |
| `src/cli/managers/volume_manager.py` | Adds Helm PVC-template generation based on required volumes. |
| `src/cli/managers/templates_manager.py` | Adds Helm rendering workflow (Chart/values/config-seed/tools + configmaps). |
| `src/cli/managers/secrets_manager.py` | Adds Kubernetes Secret template generation with base64 encoding. |
| `src/cli/managers/deployment_manager.py` | Adds Helm deployment template generation and helm upgrade --install execution. |
| `src/cli/cli_main.py` | Introduces install CLI command to drive Helm chart generation/deployment. |
| `src/archi/pipelines/agents/tools/__init__.py` | Adds dynamic loading/injection of extra tool functions/classes from filesystem. |
| `src/archi/pipelines/__init__.py` | Adds dynamic loading/registration of extra pipeline agents from filesystem. |
| `src/archi/archi.py` | Consolidates pipeline init debug logging. |
| `.gitignore` | Un-ignores the Helm secrets.yaml template file under templates. |


@swinney

swinney commented Jun 5, 2026

Collaborator

Follow-up: deploy-blocking bug found while testing on a live cluster

I deployed this branch end-to-end (minikube + archi install -> helm upgrade --install). postgres, data-manager, and the config-seed post-install Job all came up cleanly. The chatbot pod, however, gets stuck in ContainerCreating indefinitely:

```text
Warning  FailedMount  configmap "<name>-chat-skills" not found
Warning  FailedMount  configmap "<name>-chat-tools"  not found
```

Root cause: in the chatbot volumes, optional: true is indented at the Volume level, but optional is a field of the ConfigMapVolumeSource (i.e. it must sit under configMap:). As written, the API server silently drops it, so each mount defaults to optional: false. When a deployment has no skills/tools (so those ConfigMaps are never rendered), the missing ConfigMaps wedge the pod forever instead of being treated as optional.

```yaml
- name: {{ name }}-chat-prompts
  configMap:
    name: {{ name }}-chat-prompts
  optional: true
- configMap:
    name: {{ name }}-chat-agents
  name: {{ name }}-chat-agents
  optional: true
- configMap:
    name: {{ name }}-chat-skills
  name: {{ name }}-chat-skills
  optional: true
- configMap:
    name: {{ name }}-chat-tools
  name: {{ name }}-chat-tools
  optional: true
```

All four configMap volumes (prompts, agents, skills, tools) have the same misplacement; config/agents/prompts happen to render in this config so they mount, but skills/tools do not and block startup.

The fix is to nest optional under configMap:, e.g.:

```yaml
- configMap:
    name: {{ name }}-chat-skills
    optional: true
  name: {{ name }}-chat-skills
```

After applying that, the chatbot pod schedules and the app boots normally. Note this compounds with the earlier finding that empty list-derived ConfigMaps (weblists, and here skills/tools) aren't rendered at all -- either always emit an empty ConfigMap or rely on a correctly-placed optional: true.
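The sketch below makes the placement explicit by building the Pod volume entries as Python dicts that mirror the YAML structure, with `optional` nested inside `configMap`. It also includes a check that flags the misplaced form. `configmap_volume` and `misplaced_optional` are hypothetical helpers. Nesting `optional` under `configMap` matches the Kubernetes `ConfigMapVolumeSource` API, where the field is defined.

```python
from typing import Any, Dict, List


def configmap_volume(volume_name: str, configmap_name: str, optional: bool = True) -> Dict[str, Any]:
    """Pod volume backed by a ConfigMap; `optional` belongs to the configMap source."""
    return {
        "name": volume_name,
        "configMap": {"name": configmap_name, "optional": optional},
    }


def misplaced_optional(volumes: List[Dict[str, Any]]) -> List[str]:
    """Names of volumes that set `optional` at the Volume level, where the API
    does not recognize it (so the mount behaves as optional: false)."""
    return [v.get("name", "<unnamed>") for v in volumes if "optional" in v]
```

A unit test over the rendered chatbot Deployment could call `misplaced_optional` on `spec.template.spec.volumes` to catch this regression before it reaches a cluster.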

🤖 Generated with Claude Code

@juanpablosalas

Collaborator Author

Thanks for the comments, @Viphava280444 and @swinney! I have implemented them. For the grafana and grader services I added blank templates, since I am not that familiar with them; I'd like to understand how they are deployed from someone who uses them. While I figure that out, the main services and the structural changes can be reviewed.


4 participants