🎯 Till All Are One


Renato Teixeira

AI Infrastructure Engineer · AI Security Engineer · Platform Engineer

Hybrid engineer with a networking and cybersecurity background, able to plan, design, and implement AI projects across both high-level and low-level architecture. Passionate about building secure, high-performance AI systems, from GPU-backed inference stacks to retrieval pipelines and security controls around LLMs. I like practical labs, reproducible tooling, and performance debugging.

  • 🧠 Focus: LLM serving & optimization, RAG / retrieval quality, agentic workflows, AI security & runtime controls
  • 🧰 I work mostly with local models and self-hosted infra (GPU-first)
  • 📍 Based in Europe (NL)

What I’m working on

  • GPU inference stacks (vLLM / Triton / TensorRT-LLM), profiling + performance tuning (TTFT, throughput, batching)
  • Building agentic workflows on L40S, A10, Jetson Thor, DGX Spark, and RTX 4090 hardware
  • RAG systems with hybrid retrieval (dense + sparse), grounding, and evaluation
  • Secure AI workflows: guardrails, prompt-injection resilience, and policy-driven controls around model + tool access, agent/MCP security
  • Infrastructure automation: containers, Kubernetes, Terraform/Ansible, observability and scaling (Prometheus/Grafana)
  • AI Security: currently an Escalation Engineer at Palo Alto Networks, specializing in AI Runtime Security, AI API Intercept, agent and model security, and red teaming. Also specialized in AI infrastructure with BlueField-2/3 DPUs running container firewalls for NIC-level L7 inspection in AI Factories.
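
The serving metrics named above (TTFT, throughput) can be made concrete with a small sketch. This is an illustration only, not code from any repo here: it computes time-to-first-token and decode throughput from per-request timestamps that a benchmark harness would record.

```python
from dataclasses import dataclass

@dataclass
class RequestTrace:
    """Timestamps (seconds) recorded for one streamed completion."""
    sent_at: float          # request submitted
    first_token_at: float   # first streamed token received
    done_at: float          # final token received
    output_tokens: int      # tokens generated

def ttft(trace: RequestTrace) -> float:
    """Time to first token: how long the user waits before streaming starts."""
    return trace.first_token_at - trace.sent_at

def decode_throughput(trace: RequestTrace) -> float:
    """Tokens/second during the decode phase (after the first token)."""
    decode_time = trace.done_at - trace.first_token_at
    return (trace.output_tokens - 1) / decode_time if decode_time > 0 else 0.0

# Hypothetical trace: 0.35 s to first token, 200 more tokens over 4.0 s.
trace = RequestTrace(sent_at=0.0, first_token_at=0.35, done_at=4.35, output_tokens=201)
print(f"TTFT: {ttft(trace):.2f}s")
print(f"decode: {decode_throughput(trace):.0f} tok/s")
```

In practice these numbers come from a streaming client or a tool like GenAI-Perf; the point is just which intervals each metric measures.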

Toolbox

AI / Serving: vLLM, Triton Inference Server, TensorRT-LLM, CUDA, Nsight Systems
RAG / Data: Milvus (hybrid vectors), LangChain/LangGraph patterns, chunking + eval harnesses
Infra: Linux, Docker/Podman, Kubernetes/OpenShift, Terraform, Ansible, Prometheus/Grafana
Security: threat modeling for AI systems, runtime controls, secure-by-design agent/tooling
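
Hybrid retrieval (dense + sparse) needs a fusion step to merge the two ranked lists. A common minimal choice is reciprocal rank fusion (RRF), sketched here with made-up document IDs; this is a generic technique, not tied to any specific repo above.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists: each doc scores sum(1 / (k + rank))."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical results: dense (embedding) retrieval vs. sparse (BM25) retrieval.
dense  = ["doc_a", "doc_b", "doc_c"]
sparse = ["doc_b", "doc_d", "doc_a"]
print(reciprocal_rank_fusion([dense, sparse]))  # → ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

Documents ranked highly by both retrievers (doc_b here) rise to the top without any score normalization across the two systems, which is why RRF is a popular default for hybrid setups.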


Featured work

A few repos I’m proud of (more in pinned projects 👇):

  • NVIDIA vLLM Serve Llama3-70B (AWQ) on L40S with TP=2
    Benchmarking + practical serving notes for multi-GPU inference.
    https://github.com/kagaho/NVIDIA-VLLM_Serve_llama3-70b-awq_on_L40S_with_TP2

  • NVIDIA GenAI-Perf / Triton performance labs
    Measuring and analyzing inference performance with real metrics and repeatable workflows.
    (See pinned repos)

  • Nsight Systems notes
    Practical profiling guidance: timelines, GPU gaps, CPU/GPU correlation, and bottleneck hunting.
    (See pinned repos)

  • Training and hands-on labs for Palo Alto's Precision AI (Prisma AIRS) components
    An ongoing series covering AI network and API intercept, agent and model security, and red teaming.
    https://github.com/kagaho/PANW-AIRS

  • Hands-on training for Terraform and Ansible with Palo Alto VM-Series
    An ongoing series of Day 1 and Day 2 IaC deployments of virtual firewalls in cloud and on-premises environments.
    https://github.com/kagaho/Terraform-Ansible-Automation-Training
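
The policy-driven controls around agent tool access mentioned above can be illustrated with a toy pre-execution gate. All names here are hypothetical (this is not the Prisma AIRS API): an allow-list of tools plus a simple deny-list check on arguments before a call is executed.

```python
from dataclasses import dataclass, field

@dataclass
class ToolPolicy:
    """Allow-list policy: which tools an agent may call, with argument screening."""
    allowed_tools: set[str]
    blocked_substrings: list[str] = field(
        default_factory=lambda: ["rm -rf", "DROP TABLE"]  # illustrative deny-list
    )

    def check(self, tool: str, args: str) -> tuple[bool, str]:
        """Return (allowed, reason) for a proposed tool call."""
        if tool not in self.allowed_tools:
            return False, f"tool '{tool}' not in allow-list"
        for bad in self.blocked_substrings:
            if bad.lower() in args.lower():
                return False, f"blocked pattern in arguments: {bad!r}"
        return True, "ok"

policy = ToolPolicy(allowed_tools={"search", "calculator"})
print(policy.check("search", "vLLM TTFT tuning"))  # allowed
print(policy.check("shell", "ls"))                 # denied: tool not allow-listed
print(policy.check("search", "x; rm -rf /"))       # denied: blocked pattern
```

Real runtime controls do far more (semantic injection detection, per-session context, audit logging), but the shape is the same: every tool call passes through a policy decision point before execution.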


Pinned repositories

  1. PANW-AIRS
     Training and hands-on labs for Palo Alto's Prisma AIRS components.
     Jupyter Notebook

  2. Build-Jetson-Container-VLLM-0.10.2
     Build a Jetson container for vLLM 0.10.2 supporting new models such as Qwen3-Next-80B-A3B-Instruct.

  3. NVIDIA-VLLM_Serve_llama3-70b-awq_on_L40S_with_TP2
     vLLM container with tensor parallelism, AIPerf, benchmarks, and Prometheus metrics.
     Jupyter Notebook

  4. NVIDIA-Nemo-Guardrails
     Study of NeMo Guardrails on Triton Inference Server on Jetson Thor.
     Python

  5. NVIDIA-GenAI-Performance-Analyzer
     GenAI-Perf on Triton Inference Server.
     Dockerfile