LLM (Large Language Model)

[TOC]

Res

Learning Resource

Texts & Docs

📖 大规模语言模型：从理论到实践 https://intro-llm.github.io 大语言模型（Large Language Models，LLM）是一种由包含数百亿以上权重的深度神经网络构建的语言模型，使用自监督学习方法通过大量无标记文本进行训练。自2018年以来，包含Google、OpenAI、Meta、百度、华为等公司和研究机构都纷纷发布了包括BERT， GPT等在内多种模型，并在几乎所有自然语言处理任务中都表现出色。2021年开始大模型呈现爆发式的增长，特别是2022年11月ChatGPT发布后，更是引起了全世界的广泛关注。用户可以使用自然语言与系统交互，从而实现包括问答、分类、摘要、翻译、聊天等从理解到生成的各种任务。大型语言模型展现出了强大的对世界知识掌握和对语言的理解。本书将介绍大语言模型的基础理论包括语言模型、分布式模型训练以及强化学习，并以Deepspeed-Chat框架为例介绍实现大语言模型和类ChatGPT系统的实践。

🪜 https://github.com/Hannibal046/Awesome-LLM/tree/main Large Language Models(LLM) have taken the ~~NLP community~~ ~~AI community~~ the Whole World by storm. Here is a curated list of papers about large language models, especially relating to ChatGPT. It also contains frameworks for LLM training, tools to deploy LLM, courses and tutorials about LLM and all publicly available LLM checkpoints and APIs.

Awesome-LLM

Great thoughts about LLM

🔗 https://github.com/Hannibal046/Awesome-LLM/tree/main?tab=readme-ov-file#great-thoughts-about-llm

Miscellaneous

🔗 https://github.com/Hannibal046/Awesome-LLM/tree/main?tab=readme-ov-file#miscellaneous

Arize-Phoenix - Open-source tool for ML observability that runs in your notebook environment. Monitor and fine tune LLM, CV and Tabular Models.
Emergent Mind - The latest AI news, curated & explained by GPT-4.
ShareGPT - Share your wildest ChatGPT conversations with one click.
Major LLMs + Data Availability
500+ Best AI Tools
Cohere Summarize Beta - Introducing Cohere Summarize Beta: A New Endpoint for Text Summarization
chatgpt-wrapper - ChatGPT Wrapper is an open-source unofficial Python API and CLI that lets you interact with ChatGPT.
Open-evals - A framework extend openai's Evals for different language model.
Cursor - Write, edit, and chat about your code with a powerful AI.
AutoGPT - an experimental open-source application showcasing the capabilities of the GPT-4 language model.
OpenAGI - When LLM Meets Domain Experts.
EasyEdit - An easy-to-use framework to edit large language models.
chatgpt-shroud - A Chrome extension for OpenAI's ChatGPT, enhancing user privacy by enabling easy hiding and unhiding of chat history. Ideal for privacy during screen shares.

https://github.com/Shubhamsaboo/awesome-llm-apps A curated collection of Awesome LLM apps built with RAG, AI Agents, Multi-agent Teams, MCP, Voice Agents, and more. This repository features LLM apps that use models from OpenAI, Anthropic, Google, and open-source models like DeepSeek, Qwen or Llama that you can run locally on your computer.

🤔 https://transformer-circuits.pub/2025/attribution-graphs/biology.html On the Biology of a Large Language Model | Anthropic We investigate the internal mechanisms used by Claude 3.5 Haiku — Anthropic's lightweight production model — in a variety of contexts, using our circuit tracing methodology.

Tutorials & Books & Courses ⭐

[!links] ↗ LLM (Large Language Model)

https://csdiy.wiki/%E6%B7%B1%E5%BA%A6%E7%94%9F%E6%88%90%E6%A8%A1%E5%9E%8B/roadmap/ 近几年大语言模型成为大热的方向，也和笔者博士期间的课题非常相关。这篇路线图旨在分享笔者在熟悉和深入深度生成模型这一领域过程中学习和参考的各类课程资料，方便相关领域的从业者或者对生成模型的底层原理感兴趣的朋友共同学习。由于笔者科研之余时间有限，很多课程的实验并未完成，等后续有时间完成之后会在该目录下一一添加。其实，大语言模型只是深度生成模型的一个分支，而其他生成模型例如 VAE，GAN，Diffusion Model，Flow 等等，都还在“生成”这一领域占有重要地位，所谓的 AIGC，就是泛指这一类技术。推荐学习下列课程：

MIT 6.S184: Generative AI with Stochastic Differential Equations: MIT IAP 小学期的 GenAI 入门课程，主要通过微分方程的视角讲解了 Flow Matching 和 Diffusion Model 背后的数学原理，并且配有简单的小实验让学生在实践中理解，适合对底层数学原理感兴趣的同学入门。
MIT 6.S978: Deep Generative Models: MIT 新晋明星教授何恺明亲授，涵盖了各种生成模型的基础理论和相关前沿论文，几次作业都有丰富的脚手架代码，难度不高但能加深理解，能对这个领域有个快速全貌了解。
- https://mit-6s978.github.io/
UCB CS294-158-SP24: Deep Unsupervised Learning: 强化学习领域的顶级巨佬 Pieter Abbeel 主讲，相比 MIT 的课程内容更加丰富全面，并且有配套课程视频和 Slides。此外课后作业只有测试代码，需要学生自主编写模型架构定义和训练代码，虽然硬核但很适合有志于炼丹的同学练手。众所周知，深度学习理论实践中存在着很多经验技巧，魔鬼往往存在于细节里。没有什么比自己上手训一个模型更能掌握这些细节了。
CMU 10423: Generative AI: CMU 的 GenAI 课程，相比前两门课更侧重于大语言模型一些，其他内容和前两门课重合较多。不过课程作业都挺有意思，推荐闲暇时间练练手。
- https://www.cs.cmu.edu/~mgormley/courses/10423/ OpenAI 的 GPT 系列让大语言模型在 Scaling Law 的指引下展现出惊人的效果，在数学和代码领域取得了很大进展。如果你主要关注大语言模型这个方向，那么推荐如下课程：
Stanford CS336: Language Modeling from Scratch: 正如课程标题写的，在这门课程中你将从头编写大语言模型的所有核心组件，例如 Tokenizer，模型架构，训练优化器，底层算子，训练数据清洗，后训练算法等等。每次作业的 handout 都有四五十页 pdf，相当硬核。如果你想充分吃透大语言模型的所有底层细节，那么非常推荐学习这门课程。
CMU 11868: Large Language Model Systems: CMU 的大语言模型系统课程，侧重底层系统优化，例如 GPU 加速，分布式训练和推理，以及各种前沿技术。非常适合从事系统领域的同学对这个方向有个全貌性的了解。课表里还包含了一篇我发表的 PD 分离相关的文章，因此私心推荐一下。课程作业的话会让你先实现一个迷你 Pytorch，然后在上面实现各种大语言模型的系统级优化。
CMU 11667: Large Language Models: Methods and Applications 和 CMU 11711: Advanced NLP: 和前两门课相比，这两门课更偏重上层算法和应用，而且每节课都列举了很多相关阅读材料，适合对大语言模型发展前沿的各个方向都有个粗糙的认识，如果对某个子领域感兴趣的话再寻着参考资料深入学习。

CSE234: Data Systems for Machine Learning 本课程专注于设计一个全面的大语言模型(LLM)系统课程，作为设计高效LLM系统的入门介绍。课程可以更准确地分为三个部分（外加若干 guest lecture）： Part 1. 基础：现代深度学习与计算表示

Modern DL 与计算图（computational graph / framework 基础）
Autodiff 与 ML system 架构概览
Tensor format、MatMul 深入与硬件加速器（accelerators） Part 2. 系统与性能优化：从 GPU Kernel 到编译与内存
GPUs & CUDA（含基本性能模型）
GPU MatMul 与算子编译（operator compilation）
Triton 编程、图优化与编译（graph optimization & compilation）
Memory（含训练/推理中的内存问题与技巧）
Quantization（量化方法与系统落地） Part 3. LLM系统：训练与推理
并行策略：模型并行、collective communication、intra-/inter-op、自动并行化
LLM 基础：Transformer、Attention、MoE
LLM 训练优化：FlashAttention 等
LLM 推理：continuous batching、paged attention、disaggregated prefill/decoding
Scaling law （Guest lectures：ML compiler、LLM pretraining/open science、fast inference、tool use & agents 等，作为补充与扩展。） CSE234的最大特点在于非常专注于以LLM (LLM System)为核心应用场景，强调真实系统设计中的取舍与工程约束，而非停留在算法或 API 使用层面。课程作业通常需要直接面对性能瓶颈（如内存带宽、通信开销、kernel fusion 等），并通过 Triton 或系统级优化手段加以解决，对理解“为什么某些 LLM 系统设计是现在这个样子”非常有帮助。学习体验整体偏硬核，前期对系统与并行计算背景要求较高，自学时建议提前补齐 CUDA/并行编程与基础系统知识，否则在后半部分（尤其是 LLM 优化与推理相关内容）会明显感到陡峭的学习曲线。但一旦跟上节奏，这门课对从事 LLM Infra / ML Systems / AI Compiler 方向的同学具有很强的长期价值。

https://github.com/PKU-DAIR/Starter-Guide 本仓库为PKU-DAIR团队为相关领域的新人提供全面的开源文档和技术指南。通过汇集团队的核心论文和经验分享，将帮助初学者快速熟悉数据管理(Data Management, DM) 和 人工智能(Artificial Intelligence, AI) 等前沿领域，搭建坚实的技术基础。无论你是刚入门还是希望加深理解，仓库中的资源将为你的学习和研究之旅提供有力支持。

https://github.com/Hannibal046/Awesome-LLM/tree/main?tab=readme-ov-file#llm-tutorials-and-courses LLM Tutorials and Courses

Andrej Karpathy Series - My favorite!
Umar Jamil Series - high quality and educational videos you don't want to miss.
Alexander Rush Series - high quality and educational materials you don't want to miss.
llm-course - Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.
UWaterloo CS 886 - Recent Advances on Foundation Models.
CS25-Transformers United
ChatGPT Prompt Engineering
Princeton: Understanding Large Language Models
Stanford CS324 - Large Language Models
- 🏫 CS 324 Large Language Model
State of GPT
A Visual Guide to Mamba and State Space Models
Let's build GPT: from scratch, in code, spelled out.
minbpe - Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization.
femtoGPT - Pure Rust implementation of a minimal Generative Pretrained Transformer.
Neurips2022-Foundational Robustness of Foundation Models
ICML2022-Welcome to the "Big Model" Era: Techniques and Systems to Train and Serve Bigger Models
GPT in 60 Lines of NumPy
LLM‑RL‑Visualized (EN) | LLM‑RL‑Visualized (中文) - 100+ LLM / RL Algorithm Maps📚.

https://github.com/Hannibal046/Awesome-LLM/tree/main?tab=readme-ov-file#llm-books LLM Books

Generative AI with LangChain: Build large language model (LLM) apps with Python, ChatGPT, and other LLMs - it comes with a GitHub repository that showcases a lot of the functionality
Build a Large Language Model (From Scratch) - A guide to building your own working LLM.
BUILD GPT: HOW AI WORKS - explains how to code a Generative Pre-trained Transformer, or GPT, from scratch.
Hands-On Large Language Models: Language Understanding and Generation - Explore the world of Large Language Models with over 275 custom made figures in this illustrated guide!
The Chinese Book for Large Language Models - An Introductory LLM Textbook Based on A Survey of Large Language Models.

https://diffusion.csail.mit.edu/ Introduction to Flow Matching and Diffusion Models MIT Computer Science Class 6.S184: Generative AI with Stochastic Differential Equations

Diffusion and flow-based models have become the state of the art for generative AI across a wide range of data modalities, including images, videos, shapes, molecules, music, and more! This course aims to build up the mathematical framework underlying these models from first principles. At the end of the class, students will have built a toy image diffusion model from scratch, and along the way, will have gained hands-on experience with the mathematical toolbox of stochastic differential equations that is useful in many other fields. This course is ideal for students who want to develop a principled understanding of the theory and practice of generative AI.

Videos

https://youtu.be/1il-s4mgNdI?si=DxlD_98ITLZsnCIw What does it mean for computers to understand language? | LM1 vcubingx

https://youtu.be/kCc8FmEb1nY?si=Dhj1moY2pHkyiCiT Let's build GPT: from scratch, in code, spelled out. Andrej Karpathy

https://youtube.com/playlist?list=PLZHQObOWTQDNU6R1_67000Dx_ZCJB-3pi&si=AUDMGwyz7-yL33Xd Neural networks | 3Blue1Brown

But what is a neural network? | Deep learning chapter 1
Gradient descent, how neural networks learn | Deep Learning Chapter 2
Backpropagation, intuitively | Deep Learning Chapter 3
Backpropagation calculus | Deep Learning Chapter 4
Large Language Models explained briefly
Transformers, the tech behind LLMs | Deep Learning Chapter 5
- 【【官方双语】GPT是什么？直观解释Transformer | 深度学习第5章-哔哩哔哩】 https://b23.tv/rcO76mO
Attention in transformers, step-by-step | Deep Learning Chapter 6
- 【【官方双语】直观解释注意力机制，Transformer的核心 | 【深度学习第6章】-哔哩哔哩】 https://b23.tv/f0udg4P
How might LLMs store facts | Deep Learning Chapter 7

Lex Fridman

Machine Learning Street Talk

StatQuest with Josh Starmer

Jeremy Howard

Serrano.Academy

Hamel Husain

Jason Liu

Dave Ebbelaar

Blogs & Communities

https://www.alignmentforum.org/

Papers & Researches

LLM Survey Papers

Minaee, S., Mikolov, T., Nikzad, N., Chenaghlu, M., Socher, R., Amatriain, X., & Gao, J. (2025). Large Language Models: A Survey (arXiv:2402.06196). arXiv. https://doi.org/10.48550/arXiv.2402.06196

🚧 👍 https://github.com/RUCAIBox/LLMSurvey A collection of papers and resources related to Large Language Models. The organization of papers refers to our survey "A Survey of Large Language Models".

Zhao, W. X., Zhou, K., Li, J., Tang, T., Wang, X., Hou, Y., Min, Y., Zhang, B., Zhang, J., Dong, Z., Du, Y., Yang, C., Chen, Y., Chen, Z., Jiang, J., Ren, R., Li, Y., Tang, X., Liu, Z., … Wen, J.-R. (2025). A Survey of Large Language Models (arXiv:2303.18223). arXiv. https://doi.org/10.48550/arXiv.2303.18223

Other Papers ⭐

👍 📄 https://github.com/RUCAIBox/LLMSurvey （大语言模型综述 | 中国人民大学高瓴人工智能学院） A collection of papers and resources related to Large Language Models. The organization of papers refers to our survey "A Survey of Large Language Models". To facilitate the reading of our (English-verison) survey, we also translate a Chinese version for this survey. We will continue to update the Chinese version.

📄 https://arc.net/folder/D0472A20-9C20-4D3F-B145-D2865C0A9FEE Papers must know to understand the world of deep learning & AIGC

🔗 https://github.com/Hannibal046/Awesome-LLM/tree/main?tab=readme-ov-file#other-papers (2025.01)

If you're interested in the field of LLM, you may find the above list of milestone papers helpful to explore its history and state-of-the-art. However, each direction of LLM offers a unique set of insights and contributions, which are essential to understanding the field as a whole. For a detailed list of papers in various subfields, please refer to the following link:

Awesome-LLM-hallucination - LLM hallucination paper list.
awesome-hallucination-detection - List of papers on hallucination detection in LLMs.
LLMsPracticalGuide - A curated list of practical guide resources of LLMs
Awesome ChatGPT Prompts - A collection of prompt examples to be used with the ChatGPT model.
awesome-chatgpt-prompts-zh - A Chinese collection of prompt examples to be used with the ChatGPT model.
Awesome ChatGPT - Curated list of resources for ChatGPT and GPT-3 from OpenAI.
Chain-of-Thoughts Papers - A trend starts from "Chain of Thought Prompting Elicits Reasoning in Large Language Models.
Awesome Deliberative Prompting - How to ask LLMs to produce reliable reasoning and make reason-responsive decisions.
Instruction-Tuning-Papers - A trend starts from Natrural-Instruction (ACL 2022), FLAN (ICLR 2022) and T0 (ICLR 2022).
LLM Reading List - A paper & resource list of large language models.
Reasoning using Language Models - Collection of papers and resources on Reasoning using Language Models.
Chain-of-Thought Hub - Measuring LLMs' Reasoning Performance
Awesome GPT - A curated list of awesome projects and resources related to GPT, ChatGPT, OpenAI, LLM, and more.
Awesome GPT-3 - a collection of demos and articles about the OpenAI GPT-3 API.
Awesome LLM Human Preference Datasets - a collection of human preference datasets for LLM instruction tuning, RLHF and evaluation.
RWKV-howto - possibly useful materials and tutorial for learning RWKV.
ModelEditingPapers - A paper & resource list on model editing for large language models.
Awesome LLM Security - A curation of awesome tools, documents and projects about LLM Security.
Awesome-Align-LLM-Human - A collection of papers and resources about aligning large language models (LLMs) with human.
Awesome-Code-LLM - An awesome and curated list of best code-LLM for research.
Awesome-LLM-Compression - Awesome LLM compression research papers and tools.
Awesome-LLM-Systems - Awesome LLM systems research papers.
awesome-llm-webapps - A collection of open source, actively maintained web apps for LLM applications.
awesome-japanese-llm - 日本語LLMまとめ - Overview of Japanese LLMs.
Awesome-LLM-Healthcare - The paper list of the review on LLMs in medicine.
Awesome-LLM-Inference - A curated list of Awesome LLM Inference Paper with codes.
Awesome-LLM-3D - A curated list of Multi-modal Large Language Model in 3D world, including 3D understanding, reasoning, generation, and embodied agents.
LLMDatahub - a curated collection of datasets specifically designed for chatbot training, including links, size, language, usage, and a brief description of each dataset
Awesome-Chinese-LLM - 整理开源的中文大语言模型，以规模较小、可私有化部署、训练成本较低的模型为主，包括底座模型，垂直领域微调及应用，数据集与教程等。
LLM4Opt - Applying Large language models (LLMs) for diverse optimization tasks (Opt) is an emerging research area. This is a collection of references and papers of LLM4Opt.
awesome-language-model-analysis - This paper list focuses on the theoretical or empirical analysis of language models, e.g., the learning dynamics, expressive capacity, interpretability, generalization, and other interesting topics.

Other Resources

🎬 https://youtu.be/OFS90-FX6pg?si=hlsJj4DUWzGrZ_V- The Origin of ChatGPT | Art of the Problem I follow the 35 year journey that led to the explosion of Large Language Models. From Jordan's pioneering work in 1986 to today's GPT-4, this documentary traces how AI learned to talk. Featuring insights from AI pioneers including Chomsky, Hofstadter, Hinton, and LeCun, exploring the revolutionary concepts that made ChatGPT possible: transformer architecture, attention mechanism, next-token prediction, and emergent capabilities. Next video following open ai's o1 model My script, references & visualizations here: https://docs.google.com/document/d/1s7FNPoKPW9y3EhvzNgexJaEG2pP4Fx_rmI4askoKZPA/edit?usp=sharing

🎬 (1hr Talk) Intro to Large Language Models | Andrej Karpathy https://youtu.be/zjkBMFhNj_g?si=G546Rtz9r9hc233z

👍 https://huggingface.co/spaces/Eliahu/Model-Atlas

https://www.anthropic.com/research/estimating-productivity-gains Estimating AI productivity gains from Claude conversations

Intro: LLM Principles & Utilization

Large Language Models explained briefly | 3Blue1Brown

📎 https://cameronrwolfe.substack.com/p/understanding-and-using-supervised

Transformer Architecture: Nearly all modern language models—and many other deep learning models—are based upon this architecture.
Decoder-only Transformers : This is the specific variant of the transformer architecture that is used by most generative LLMs.
Brief History of LLMs: LLMs have gone through several phases from the creation of GPT to the release of ChatGPT.
Next token prediction: this self-supervised training objective underlies nearly all LLM functionality and is used by SFT!
Language Model Pretraining: language models are pretrained over a massive, unlabeled textual corpus.
Language Model Inference: language models can be used to generate coherent sequences of text via autoregressive next token prediction.

↗ Natural Language Processing (NLP) /Intro

LLM Backgrounds

📜 The Development History of AI, NLP, and LLM

↗ The Development History of AI ↗ Artificial Neural Networks (ANN) & Deep Learning Methods ↗ Natural Language Processing (NLP) & Computational Linguistics

Zhao, W. X., Zhou, K., Li, J., Tang, T., Wang, X., Hou, Y., Min, Y., Zhang, B., Zhang, J., Dong, Z., Du, Y., Yang, C., Chen, Y., Chen, Z., Jiang, J., Ren, R., Li, Y., Tang, X., Liu, Z., … Wen, J.-R. (2025). A Survey of Large Language Models (arXiv:2303.18223). arXiv.
https://doi.org/10.48550/arXiv.2303.18223

Scaling Laws

https://stanford-cs324.github.io/winter2022/lectures/scaling-laws/

Emergent Abilities

How Emergent Abilities Relate to Scaling Laws

LLM Modeling ⭐

[!links] ↗ LLM Foundation Models List & Evaluation and Benchmarks & Leaderboard ↗ Transformers

https://poloclub.github.io/transformer-explainer/

Tokenization & Embedding

🔗 https://stanford-cs324.github.io/winter2022/lectures/modeling/#model-architecture

LLM Model Architectures

↗ Transformers

Tokenization
Attention
Probability

↗ RWKV (Receptance Weighted Key Value) ↗ Mamba

LLM Training, Utilization, and Evaluation

↗ LLM Training, Utilization, and Evaluation

↗ Pre-Training (In-Weight Learning)
↗ LLM Adaptation & Alignment Tuning
↗ LLM Utilization & Prompt, Context, and Harness Engineering (In-Context Learning)

LLM Reasoning & Large Reasoning Models (LRM) 🤔

↗ Reinforcement Learning (RL) & Sequential Decision Making

↗ LLM and RL ↗ RLFT (Reinforcement Learning Fine Tuning)

🔗 https://en.wikipedia.org/wiki/Reasoning_model

A reasoning model, also known as reasoning language models (RLMs) or large reasoning models (LRMs), is a type of large language model (LLM) that has been specifically trained to solve complex tasks requiring multiple steps of logical reasoning. These models demonstrate superior performance on logic, mathematics, and programming tasks compared to standard LLMs. They possess the ability to revisit and revise earlier reasoning steps and utilize additional computation during inference as a method to scale performance, complementing traditional scaling approaches based on training data size, model parameters, and training compute.

Unlike traditional language models that generate responses immediately, reasoning models allocate additional compute, or thinking, time before producing an answer to solve multi-step problems. OpenAI introduced this terminology in September 2024 when it released the o1 series, describing the models as designed to "spend more time thinking" before responding. The company framed o1 as a reset in model naming that targets complex tasks in science, coding, and mathematics, and it contrasted o1's performance with GPT-4o on benchmarks such as AIME and Codeforces. Independent reporting the same week summarized the launch and highlighted OpenAI's claim that o1 automates chain-of-thought style reasoning to achieve large gains on difficult exams.

In operation, reasoning models generate internal chains of intermediate steps, then select and refine a final answer. OpenAI reported that o1's accuracy improves as the model is given more reinforcement learning during training and more test-time compute at inference. The company initially chose to hide raw chains and instead return a model-written summary, stating that it "decided not to show" the underlying thoughts so researchers could monitor them without exposing unaligned content to end users. Commercial deployments document separate "reasoning tokens" that meter hidden thinking and a control for "reasoning effort" that tunes how much compute the model uses. These features make the models slower than ordinary chat systems while enabling stronger performance on difficult problems.

📜 The Technical Evolution of LLM & Future Directions

[!links] ↗ LLM Foundation Models List & Evaluation and Benchmarks & Leaderboard

Zhao, W. X., Zhou, K., Li, J., Tang, T., Wang, X., Hou, Y., Min, Y., Zhang, B., Zhang, J., Dong, Z., Du, Y., Yang, C., Chen, Y., Chen, Z., Jiang, J., Ren, R., Li, Y., Tang, X., Liu, Z., … Wen, J.-R. (2025). A Survey of Large Language Models (arXiv:2303.18223). arXiv.
https://doi.org/10.48550/arXiv.2303.18223

LLM Milestone Papers ⭐

https://github.com/Hannibal046/Awesome-LLM/tree/main?tab=readme-ov-file#milestone-papers (2025.01)

Date	keywords	Institute	Paper
2017-06	Transformers	Google	Attention Is All You Need
2018-06	GPT 1.0	OpenAI	Improving Language Understanding by Generative Pre-Training
2018-10	BERT	Google	BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
2019-02	GPT 2.0	OpenAI	Language Models are Unsupervised Multitask Learners
2019-09	Megatron-LM	NVIDIA	Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
2019-10	T5	Google	Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
2019-10	ZeRO	Microsoft	ZeRO: Memory Optimizations Toward Training Trillion Parameter Models
2020-01	Scaling Law	OpenAI	Scaling Laws for Neural Language Models
2020-05	GPT 3.0	OpenAI	Language models are few-shot learners
2021-01	Switch Transformers	Google	Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
2021-08	Codex	OpenAI	Evaluating Large Language Models Trained on Code
2021-08	Foundation Models	Stanford	On the Opportunities and Risks of Foundation Models
2021-09	FLAN	Google	Finetuned Language Models are Zero-Shot Learners
2021-10	T0	HuggingFace et al.	Multitask Prompted Training Enables Zero-Shot Task Generalization
2021-12	GLaM	Google	GLaM: Efficient Scaling of Language Models with Mixture-of-Experts
2021-12	WebGPT	OpenAI	WebGPT: Browser-assisted question-answering with human feedback
2021-12	Retro	DeepMind	Improving language models by retrieving from trillions of tokens
2021-12	Gopher	DeepMind	Scaling Language Models: Methods, Analysis & Insights from Training Gopher
2022-01	COT	Google	Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
2022-01	LaMDA	Google	LaMDA: Language Models for Dialog Applications
2022-01	Minerva	Google	Solving Quantitative Reasoning Problems with Language Models
2022-01	Megatron-Turing NLG	Microsoft&NVIDIA	Using Deep and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model
2022-03	InstructGPT	OpenAI	Training language models to follow instructions with human feedback
2022-04	PaLM	Google	PaLM: Scaling Language Modeling with Pathways
2022-04	Chinchilla	DeepMind	An empirical analysis of compute-optimal large language model training
2022-05	OPT	Meta	OPT: Open Pre-trained Transformer Language Models
2022-05	UL2	Google	Unifying Language Learning Paradigms
2022-06	Emergent Abilities	Google	Emergent Abilities of Large Language Models
2022-06	BIG-bench	Google	Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
2022-06	METALM	Microsoft	Language Models are General-Purpose Interfaces
2022-09	Sparrow	DeepMind	Improving alignment of dialogue agents via targeted human judgements
2022-10	Flan-T5/PaLM	Google	Scaling Instruction-Finetuned Language Models
2022-10	GLM-130B	Tsinghua	GLM-130B: An Open Bilingual Pre-trained Model
2022-11	HELM	Stanford	Holistic Evaluation of Language Models
2022-11	BLOOM	BigScience	BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
2022-11	Galactica	Meta	Galactica: A Large Language Model for Science
2022-12	OPT-IML	Meta	OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization
2023-01	Flan 2022 Collection	Google	The Flan Collection: Designing Data and Methods for Effective Instruction Tuning
2023-02	LLaMA	Meta	LLaMA: Open and Efficient Foundation Language Models
2023-02	Kosmos-1	Microsoft	Language Is Not All You Need: Aligning Perception with Language Models
2023-03	LRU	DeepMind	Resurrecting Recurrent Neural Networks for Long Sequences
2023-03	PaLM-E	Google	PaLM-E: An Embodied Multimodal Language Model
2023-03	GPT 4	OpenAI	GPT-4 Technical Report
2023-04	LLaVA	UW–Madison&Microsoft	Visual Instruction Tuning
2023-04	Pythia	EleutherAI et al.	Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling
2023-05	Dromedary	CMU et al.	Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision
2023-05	PaLM 2	Google	PaLM 2 Technical Report
2023-05	RWKV	Bo Peng	RWKV: Reinventing RNNs for the Transformer Era
2023-05	DPO	Stanford	Direct Preference Optimization: Your Language Model is Secretly a Reward Model
2023-05	ToT	Google&Princeton	Tree of Thoughts: Deliberate Problem Solving with Large Language Models
2023-07	LLaMA2	Meta	Llama 2: Open Foundation and Fine-Tuned Chat Models
2023-10	Mistral 7B	Mistral	Mistral 7B
2023-12	Mamba	CMU&Princeton	Mamba: Linear-Time Sequence Modeling with Selective State Spaces
2024-01	DeepSeek-v2	DeepSeek	DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
2024-05	Mamba2	CMU&Princeton	Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality
2024-05	Llama3	Meta	The Llama 3 Herd of Models
2024-12	Qwen2.5	Alibaba	Qwen2.5 Technical Report
2024-12	DeepSeek-V3	DeepSeek	DeepSeek-V3 Technical Report
2025-01	DeepSeek-R1	DeepSeek	DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Technical Evolution of Specific LLM Model Series

GPT-series Model

↗ OpenAI GPT

Zhao, W. X., Zhou, K., Li, J., Tang, T., Wang, X., Hou, Y., Min, Y., Zhang, B., Zhang, J., Dong, Z., Du, Y., Yang, C., Chen, Y., Chen, Z., Jiang, J., Ren, R., Li, Y., Tang, X., Liu, Z., … Wen, J.-R. (2025). A Survey of Large Language Models (arXiv:2303.18223). arXiv. https://doi.org/10.48550/arXiv.2303.18223

Gemini-series Model

↗ Google Gemini

LLaMA-series Model

↗ Meta LLama

Qwen-series Model

↗ Alibaba Qwen

DeepSeek-series Model

↗ DeepSeek

Ref

Prompt Injection 是一种攻击技术，黑客或恶意攻击者操纵 AI 模型的输入值，以诱导模型返回非预期的结果。这里提到的属于是SSTI服务端模板注入。

这允许攻击者利用模型的安全性来泄露用户数据或扭曲模型的训练结果。在某些模型中，很多情况下输入提示的数据会直接暴露或对输出有很大影响。

QuantumBlack AI by McKinsey: "The next innovation revolution - powered by AI" Gruber & Tal: The Market Opportunity Navigator , PDF worksheet

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

LLM (Large Language Model)

Res

Related Topics

Learning Resource

Texts & Docs

Tutorials & Books & Courses ⭐

Videos

Blogs & Communities

Papers & Researches

LLM Survey Papers

Other Papers ⭐

Other Resources

Intro: LLM Principles & Utilization

LLM Backgrounds

📜 The Development History of AI, NLP, and LLM

Scaling Laws

Emergent Abilities

How Emergent Abilities Relate to Scaling Laws

LLM Modeling ⭐

Tokenization & Embedding

LLM Model Architectures

LLM Training, Utilization, and Evaluation

LLM Reasoning & Large Reasoning Models (LRM) 🤔

LLM Infrastructure & Deployment

LLM Applications & LLM-Driven Automation

Agentic LLM and LLM OS

Artificial General Intelligence?

📜 The Technical Evolution of LLM & Future Directions

LLM Milestone Papers ⭐

Technical Evolution of Specific LLM Model Series

GPT-series Model

Gemini-series Model

LLaMA-series Model

Qwen-series Model

DeepSeek-series Model

Ref

FilesExpand file tree

LLM (Large Language Model).md

Latest commit

History

LLM (Large Language Model).md

File metadata and controls

LLM (Large Language Model)

Res

Related Topics

Learning Resource

Texts & Docs

Tutorials & Books & Courses ⭐

Videos

Blogs & Communities

Papers & Researches

LLM Survey Papers

Other Papers ⭐

Other Resources

Intro: LLM Principles & Utilization

LLM Backgrounds

📜 The Development History of AI, NLP, and LLM

Scaling Laws

Emergent Abilities

How Emergent Abilities Relate to Scaling Laws

LLM Modeling ⭐

Tokenization & Embedding

LLM Model Architectures

LLM Training, Utilization, and Evaluation

LLM Reasoning & Large Reasoning Models (LRM) 🤔

LLM Infrastructure & Deployment

LLM Applications & LLM-Driven Automation

Agentic LLM and LLM OS

Artificial General Intelligence?

📜 The Technical Evolution of LLM & Future Directions

LLM Milestone Papers ⭐

Technical Evolution of Specific LLM Model Series

GPT-series Model

Gemini-series Model

LLaMA-series Model

Qwen-series Model

DeepSeek-series Model

Ref