AI Infrastructure Engineer · AI Security Engineer · Platform Engineer
Hybrid engineer who plans, designs, and implements AI projects across both high-level and low-level architecture, with a network and cybersecurity background. Passionate about creating secure, high-performance AI systems, from GPU-backed inference stacks to retrieval pipelines and security controls around LLMs. I like practical labs, reproducible tooling, and performance debugging.
- 🧠 Focus: LLM serving & optimization, RAG / retrieval quality, agentic workflows, AI security & runtime controls
- 🧰 I work mostly with local models and self-hosted infra (GPU-first)
- 📍 Based in Europe (NL)
- GPU inference stacks (vLLM / Triton / TensorRT-LLM), profiling + performance tuning (TTFT, throughput, batching)
- Building agentic workflows on L40S, A10, Jetson Thor, DGX Spark, and RTX 4090 hardware
- RAG systems with hybrid retrieval (dense + sparse), grounding, and evaluation
- Secure AI workflows: guardrails, prompt-injection resilience, and policy-driven controls around model + tool access, agent/MCP security
- Infrastructure automation: containers, Kubernetes, Terraform/Ansible, observability and scaling (Prometheus/Grafana)
- AI Security: currently an Escalation Engineer at Palo Alto Networks, specializing in AI Runtime Security, AI API Intercept, agent and model security, and red teaming; also specialized in AI infrastructure with BlueField-2/3 DPUs running container firewalls for NIC-level L7 inspection in AI factories
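Hybrid retrieval needs a step that fuses the dense and sparse rankings into one list; a common choice is reciprocal rank fusion (RRF). A minimal sketch, with made-up doc IDs and `k=60` as the conventional constant (not code from any of my repos):

```python
# Reciprocal rank fusion (RRF): merge several ranked lists of doc IDs.
# Doc IDs below are hypothetical; k=60 is the commonly used constant.

def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists: each doc scores sum(1 / (k + rank))."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense = ["d3", "d1", "d7"]   # e.g. from a vector index
sparse = ["d1", "d9", "d3"]  # e.g. from BM25 / sparse vectors
fused = rrf([dense, sparse])
```

Docs appearing high in both lists (here `d1`, `d3`) float to the top, which is the whole point of fusing the two retrievers.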
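Policy-driven control of tool access can start as simple as a deny-by-default allow-list checked before an agent executes a tool call, plus a crude prompt-injection tripwire. A toy sketch; the tool names, policy shape, and patterns are all illustrative, not a real product's API:

```python
# Toy policy gate for agent tool calls: deny-by-default allow-list
# plus a naive prompt-injection pattern check. All names illustrative.
import re

POLICY = {
    "search_docs": {"allowed": True},
    "delete_index": {"allowed": False},  # destructive: blocked by policy
}

INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?previous instructions", re.I),
    re.compile(r"reveal (the )?system prompt", re.I),
]

def gate_tool_call(tool: str, user_input: str) -> bool:
    """Allow the call only if the tool is allow-listed and the input
    trips none of the known injection patterns."""
    if not POLICY.get(tool, {}).get("allowed", False):  # deny by default
        return False
    return not any(p.search(user_input) for p in INJECTION_PATTERNS)
```

Real runtime controls are far richer (argument validation, session context, model-based classifiers), but the deny-by-default shape is the part worth copying.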
AI / Serving: vLLM, Triton Inference Server, TensorRT-LLM, CUDA, Nsight Systems
RAG / Data: Milvus (hybrid vectors), LangChain/LangGraph patterns, chunking + eval harnesses
Infra: Linux, Docker/Podman, Kubernetes/OpenShift, Terraform, Ansible, Prometheus/Grafana
Security: threat modeling for AI systems, runtime controls, secure-by-design agent/tooling
A few repos I’m proud of (more in pinned projects 👇):

- NVIDIA vLLM: Serve Llama3-70B (AWQ) on L40S with TP=2
  Benchmarking + practical serving notes for multi-GPU inference.
  https://github.com/kagaho/NVIDIA-VLLM_Serve_llama3-70b-awq_on_L40S_with_TP2
- NVIDIA GenAI-Perf / Triton performance labs
  Measuring and analyzing inference performance with real metrics and repeatable workflows.
  (See pinned repos)
- Nsight Systems notes
  Practical profiling guidance: timelines, GPU gaps, CPU/GPU correlation, and bottleneck hunting.
  (See pinned repos)
- Training and hands-on labs for Palo Alto's Precision AI / Prisma AIRS components
  Ongoing series covering AI network and API intercept, agent and model security, and red teaming.
  https://github.com/kagaho/PANW-AIRS
- Hands-on training for Terraform and Ansible with Palo Alto VM-Series
  Ongoing series of Day 1 and Day 2 IaC deployments of virtual firewalls in the cloud and on-premises.
  https://github.com/kagaho/Terraform-Ansible-Automation-Training