pei0033

Eunik Park | ML Engineer

About Me

I am an ML Engineer focused on efficient inference and hardware-aware optimization. My work spans LLM serving, model optimization, and runtime performance across GPU, NPU, and mobile environments. I enjoy turning research and systems ideas into practical, production-ready improvements in throughput, latency, and reliability.

Skills

Work Experience

ML Engineer @ SqueezeBits

06/2022 - Present

Optimizing models for target hardware & platforms
Improving accuracy-speed trade-offs through post-training quantization (PTQ) and quantization-aware training (QAT)
Benchmarking vLLM and TensorRT-LLM serving

Internship @ LG CNS

07/2021 - 08/2021

Built AWS 3-tier web service using Terraform

Projects

vLLM for RBLN

12/2025 - Present

[Repo]

Worked on serving-path optimization for decoding, scheduling, and structured generation
Improved end-to-end inference performance through runtime profiling and targeted optimizations
Built supporting benchmark and validation workflows for repeatable performance analysis

MAX

01/2026 - Present

[Repo]

Integrated model pipelines into inference platforms and production-style serving paths
Optimized interactions between preprocessing, model execution, and postprocessing stages
Added verification and benchmarking coverage to support stable iteration

OwLite

08/2023 - 12/2025

[Website] [Github] [OwLite Examples]

Developed a framework for easy model quantization from PyTorch to TensorRT
Implemented various quantization algorithms and simulations
Produced various examples and identified optimization patterns

Fits-on-Chips

02/2024 - 06/2024

[Website]

Conducted comprehensive performance benchmarking of LLM serving frameworks
Implemented the evaluation module
Wrote a blog post, [vLLM vs TensorRT-LLM] Weight-Activation Quantization

Efficient Keyword Spotting Research

02/2024 - 06/2024

Presented poster at Interspeech 2024
RepTor: Re-parameterizable Temporal Convolution for Keyword Spotting via Differentiable Kernel Search
Developed CNN-based KWS model using structural reparameterization
Implemented Latency-aware Neural Architecture Search
Achieved 97.9% accuracy with 183 μs latency on Galaxy S10 CPU

Education

POSTECH

Changwon Science High School
