Senior AI Infrastructure & Cloud-Native Engineer | Shanghai, China
Senior engineer with 10+ years of experience building high-concurrency backend systems and cloud-native infrastructure. Currently focused on cloud-native AI infrastructure and distributed LLM inference, with hands-on work across Kubernetes-based orchestration, model serving, observability, and performance optimization.
Cloud-Native AI Infrastructure: Designing and operating scalable infrastructure for AI workloads, including scheduling, networking, observability, and platform reliability.
Distributed LLM Inference: Building production inference systems, including prefill/decode (PD) disaggregation, cache-aware routing, workload orchestration, autoscaling, and runtime optimization.
High-Performance AI Networking: Exploring topology-aware scheduling and network-aware optimization for LLM inference over high-performance fabrics such as RoCE and InfiniBand.
Observability & Platform Reliability: Building monitoring, tracing, metrics, and dashboarding capabilities for cloud-native and AI infrastructure using OpenTelemetry, Prometheus, and Grafana.
llm-d: Active contributor to the llm-d open-source ecosystem, focusing on distributed LLM inference, cloud-native orchestration, and production-grade AI infrastructure.
Toolbox: Go, Python | Kubernetes, KubeRay, Istio | vLLM, Ray | OpenTelemetry, Prometheus, Grafana.
📫 Connect: Active in AI Infra developer communities.
Optimizing infrastructure so that intelligence can scale freely.


