Hey, I'm jianzhnie. Thanks for stopping by!
I'm an AI engineer focusing on LLMs, RLHF, reinforcement learning, and production-grade code.

| Code Repo | About | 
|---|---|
| LLMReasoning | Techniques and toolkit for reasoning with LLMs. | 
| LLMEval | A modular framework to evaluate LLMs across tasks and settings. | 
| LLMToolkit | A PyTorch toolkit for NLP and LLM development. | 
| LLamaTuner | Easy and efficient finetuning pipelines for LLMs. | 
| Open-R1 | Open-source DeepSeek-R1-style and RLHF training pipeline. | 
| awesome-instruction-datasets | Curated instruction/prompt datasets for training ChatLLMs. | 

| Code Repo | About | 
|---|---|
| Deep-RL-Toolkit | Single-agent RL toolkit (DQN, Rainbow, DDPG, PPO, SAC, TD3, …). | 
| Deep-MARL-Toolkit | Multi-agent RL toolkit (VDN, QMIX, MADDPG, MAPPO, …). | 
| RLZero | MCTS for general sequential decision making (AlphaZero, MuZero, …). | 
| ScaleRL | Simple, scalable distributed RL (A3C, Ape-X, IMPALA, …). | 
| CyberAttackSimulator | RL environment for autonomous cyber attack and defense on simulated networks. | 

- Diffuser Toolkit for image/audio generation in PyTorch: diffusion-toolkit
- AutoML for deep learning and tabular tasks: AutoTimm | AutoTabular
- Trying to reduce the Learning Machine Learning (LML) loss 😂
- Coding every day to become a better research engineer
 
- RL for reasoning and GRPO
- LLM systems and AGI
- Large-scale distributed RL systems
 
- Email: [email protected]
- Homepage: https://jianzhnie.github.io
- Blog: https://jianzhnie.github.io/llmtech/
- ZhiHu: https://www.zhihu.com/column/fengnie
- Hugging Face Org: https://huggingface.co/GaussianTech
- LinkedIn: https://www.linkedin.com/in/jianzheng-nie-2749b7156/
- Ask me about: statistics, machine learning, LLMs, and RL.
- ❤️ Sponsor me on GitHub
 
Have an awesome day!




