ruixiang63/README.md

Hi, I'm Ruixiang 👋

I am a Senior DevTech Engineer at NVIDIA.

🚀 Recent Open Source Contributions

  • #23869 - Speed-bench: standardized speculative decoding performance evaluation benchmark
  • #18039 - Eagle3 speculative decoding: 1.2–3.28× speedup across many model families
  • #22105 - DFlash speculative decoding: up to 8× speedup on Qwen3 models
  • #24536 - Add speculative decoding metrics for better observability and parameter tuning
  • #24655 - Support GPU-backend sampling to improve Eagle3 performance
  • #45665 - Performance fix: eliminated implicit H2D copies in Gated DeltaNet
  • This NVIDIA-Unsloth blog explains the following optimizations in detail.
  • #534 - Double-buffered checkpoint reload via CUDA streams and events: fine-tuning speedup of +8.4% on 8B and +6.7% on 14B
  • #4173 - Packed-sequence metadata caching: +14.3% fine-tuning speedup on Qwen3-14B QLoRA SFT
  • #535 - GPT-OSS MoE expert routing optimization: ~10–15% fine-tuning speedup on GPT-OSS models

✍️ Technical Writing: NVIDIA Developer Blog

Model Quantization Series:

  1. Concepts, Methods, and Why It Matters
  2. Post-Training Quantization Using NVIDIA Model Optimizer
  3. Turn FP8 Checkpoints into High-Performance Inference Engines with NVIDIA TensorRT

Pinned

  1. Research-Project-Title-Embedding Public

    This project aims to improve the quality of eBay product title embeddings. Here are the slides and my master's thesis. The source code is in the company's repository and cannot be released at this time.

    1

  2. microgpt-cpp Public

    C++ version of MicroGPT with GPU acceleration

    C++

  3. llama.cpp Public

    Forked from ggml-org/llama.cpp

    LLM inference in C/C++

    C++ 5 2

  4. ggml-org/llama.cpp Public

    LLM inference in C/C++

    C++ 117k 19.7k

  5. unslothai/unsloth Public

    Unsloth Studio is a web UI for training and running open models like Gemma 4, Qwen3.6, DeepSeek, and gpt-oss locally.

    Python 66.7k 6k

  6. unslothai/unsloth-zoo Public

    Utils for Unsloth https://github.com/unslothai/unsloth

    Python 276 278