AI Infrastructure Engineer · AI Security Engineer · Platform Engineer
Hybrid engineer who plans, designs, and implements AI projects across both high-level and low-level architecture, with a network and cybersecurity background. Passionate about creating secure, high-performance AI systems, from GPU-backed inference stacks to retrieval pipelines and security controls around LLMs. I like practical labs, reproducible tooling, and performance debugging.
- 🧠 Focus: LLM serving & optimization, RAG / retrieval quality, agentic workflows, AI security & runtime controls
- 🧰 I work mostly with local models and self-hosted infra (GPU-first)
- 📍 Based in Europe (NL)
- GPU inference stacks (vLLM / Triton / TensorRT-LLM), profiling + performance tuning (TTFT, throughput, batching)
- Building agentic workflows on L40S, A10, Jetson Thor, DGX Spark, and RTX 4090 hardware
- RAG systems with hybrid retrieval (dense + sparse), grounding, and evaluation
- Secure AI workflows: guardrails, prompt-injection resilience, and policy-driven controls around model + tool access, agent/MCP security
- Infrastructure automation: containers, Kubernetes, Terraform/Ansible, observability and scaling (Prometheus/Grafana)
- AI Security: currently an Escalation Engineer at Palo Alto Networks, specializing in AI Runtime Security, AI API Intercept, agent and model security, and red teaming; also specialized in AI infrastructure with BlueField-2/3 DPUs running container firewalls for NIC-level L7 inspection in AI factories
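Hybrid retrieval needs a step that fuses the dense and sparse rankings into one list; a common choice is reciprocal rank fusion (RRF). A minimal sketch, with made-up doc IDs and `k=60` as the conventional constant (not code from any of my repos):

```python
# Reciprocal rank fusion (RRF): merge several ranked lists of doc IDs.
# Doc IDs below are hypothetical; k=60 is the commonly used constant.

def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists: each doc scores sum(1 / (k + rank))."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense = ["d3", "d1", "d7"]   # e.g. from a vector index
sparse = ["d1", "d9", "d3"]  # e.g. from BM25 / sparse vectors
fused = rrf([dense, sparse])
```

Docs appearing high in both lists (here `d1`, `d3`) float to the top, which is the whole point of fusing the two retrievers.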
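Policy-driven control of tool access can start as simple as a deny-by-default allow-list checked before an agent executes a tool call, plus a crude prompt-injection tripwire. A toy sketch; the tool names, policy shape, and patterns are all illustrative, not a real product's API:

```python
# Toy policy gate for agent tool calls: deny-by-default allow-list
# plus a naive prompt-injection pattern check. All names illustrative.
import re

POLICY = {
    "search_docs": {"allowed": True},
    "delete_index": {"allowed": False},  # destructive: blocked by policy
}

INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?previous instructions", re.I),
    re.compile(r"reveal (the )?system prompt", re.I),
]

def gate_tool_call(tool: str, user_input: str) -> bool:
    """Allow the call only if the tool is allow-listed and the input
    trips none of the known injection patterns."""
    if not POLICY.get(tool, {}).get("allowed", False):  # deny by default
        return False
    return not any(p.search(user_input) for p in INJECTION_PATTERNS)
```

Real runtime controls are far richer (argument validation, session context, model-based classifiers), but the deny-by-default shape is the part worth copying.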
AI / Serving: vLLM, Triton Inference Server, TensorRT-LLM, CUDA, Nsight Systems
RAG / Data: Milvus (hybrid vectors), LangChain/LangGraph patterns, chunking + eval harnesses
Infra: Linux, Docker/Podman, Kubernetes/OpenShift, Terraform, Ansible, Prometheus/Grafana
Security: threat modeling for AI systems, runtime controls, secure-by-design agent/tooling
A few repos I’m proud of (more in pinned projects 👇):

- NVIDIA vLLM: Serve Llama3-70B (AWQ) on L40S with TP=2
  Benchmarking + practical serving notes for multi-GPU inference.
  https://github.com/kagaho/NVIDIA-VLLM_Serve_llama3-70b-awq_on_L40S_with_TP2
- NVIDIA GenAI-Perf / Triton performance labs
  Measuring and analyzing inference performance with real metrics and repeatable workflows.
  (See pinned repos)
- Nsight Systems notes
  Practical profiling guidance: timelines, GPU gaps, CPU/GPU correlation, and bottleneck hunting.
  (See pinned repos)
- Training and hands-on labs for Palo Alto's Precision AI / Prisma AIRS components
  Ongoing series covering AI network and API intercept, agent and model security, and red teaming.
  https://github.com/kagaho/PANW-AIRS
- Hands-on training for Terraform and Ansible with Palo Alto VM-Series
  Ongoing series of Day 1 and Day 2 IaC deployments of virtual firewalls in the cloud and on-premises.
  https://github.com/kagaho/Terraform-Ansible-Automation-Training