
Eunik Park | ML Engineer

LinkedIn Email

About Me


I am an ML Engineer focused on efficient inference and hardware-aware optimization. My work spans LLM serving, model optimization, and runtime performance across GPU, NPU, and mobile environments. I enjoy turning research and systems ideas into practical, production-ready improvements in throughput, latency, and reliability.

Skills


Python PyTorch C++ CUDA
vLLM SGLang TensorRT TensorRT-LLM
MAX ONNX LiteRT Android

Work Experience


ML Engineer @ SqueezeBits

06/2022 - Present

  • Optimized models for target hardware and platforms
  • Improved accuracy-speed trade-offs through post-training quantization (PTQ) and quantization-aware training (QAT)
  • Benchmarked vLLM and TensorRT-LLM serving

Internship @ LG CNS

07/2021 - 08/2021

  • Built AWS 3-tier web service using Terraform

Projects


vLLM for RBLN

12/2025 - Present

[Repo]

  • Worked on serving-path optimization for decoding, scheduling, and structured generation
  • Improved end-to-end inference performance through runtime profiling and targeted optimizations
  • Built supporting benchmark and validation workflows for repeatable performance analysis

MAX

01/2026 - Present

[Repo]

  • Integrated model pipelines into inference platforms and production-style serving paths
  • Optimized interactions between preprocessing, model execution, and postprocessing stages
  • Added verification and benchmarking coverage to support stable iteration

OwLite

08/2023 - 12/2025

[Website] [GitHub] [OwLite Examples]

  • Developed a framework for easy model quantization from PyTorch to TensorRT
  • Implemented various quantization algorithms and simulations
  • Produced various examples and identified optimization patterns

Fits-on-Chips

02/2024 - 06/2024

[Website]

Efficient Keyword Spotting Research

02/2024 - 06/2024

Education


POSTECH

  • Bachelor's in IT Convergence Engineering
  • 03/2016 - 09/2022

Changwon Science High School

  • 03/2014 - 02/2016

Pinned

  1. modular/modular (Public)

    The Modular Platform (includes MAX & Mojo)

    Mojo · 25.9k stars · 2.8k forks

  2. RBLN-SW/vllm-rbln (Public)

    vLLM plugin for RBLN NPU

    Python · 46 stars · 9 forks

  3. SqueezeBits/owlite (Public)

    OwLite is a low-code AI model compression toolkit for AI models.

    Python · 53 stars · 4 forks