[TOC]
↗ Linguistics ↗ Ordinary Language Philosophy
↗ Information Theory ↗ Algebraic Structure & Abstract Algebra & Modern Algebra ↗ Probability Theory & Statistics
↗ Statistical (Data-Driven) Learning & Machine Learning (ML) ↗ Artificial Neural Networks (ANN) & Deep Learning Methods
LLM & Academics 🧑🎓
- ↗ LLM & Federated Learning
- ↗ LLM & Fuzzing
- ↗ LLM & Software Security and Analysis ↗ LLM For Security
↗ AI4SE
↗ Artificial Intelligence Related Conferences & Journals ↗ Research Topics in LLM ↗ XAI (eXplainable AI) & Mathematical Analysis of AI
↗ Artificial Intelligence Industry and Companies
📖 大规模语言模型:从理论到实践 (Large Language Models: From Theory to Practice) https://intro-llm.github.io Large language models (LLMs) are language models built from deep neural networks with tens of billions of weights or more, trained on massive unlabeled text via self-supervised learning. Since 2018, companies and research institutes including Google, OpenAI, Meta, Baidu, and Huawei have released models such as BERT and GPT, which perform strongly on almost every natural language processing task. Starting in 2021, large models grew explosively, and the release of ChatGPT in November 2022 drew worldwide attention. Users can interact with such systems in natural language to accomplish tasks spanning understanding and generation: question answering, classification, summarization, translation, chat, and more. Large language models demonstrate a powerful command of world knowledge and language understanding. The book covers the foundations of LLMs, including language modeling, distributed training, and reinforcement learning, and uses the DeepSpeed-Chat framework as a running example of how to implement an LLM and a ChatGPT-like system.
🪜 https://github.com/Hannibal046/Awesome-LLM/tree/main
Large Language Models (LLMs) have taken the NLP community, the AI community, and the whole world by storm. Here is a curated list of papers about large language models, especially those relating to ChatGPT. It also contains frameworks for LLM training, tools to deploy LLMs, courses and tutorials about LLMs, and all publicly available LLM checkpoints and APIs.
Great thoughts about LLM
🔗 https://github.com/Hannibal046/Awesome-LLM/tree/main?tab=readme-ov-file#great-thoughts-about-llm
- Why did all of the public reproduction of GPT-3 fail?
- A Stage Review of Instruction Tuning
- LLM Powered Autonomous Agents
- Why you should work on AI AGENTS!
- Google "We Have No Moat, And Neither Does OpenAI"
- AI competition statement
- Prompt Engineering
- Noam Chomsky: The False Promise of ChatGPT
- Is ChatGPT 175 Billion Parameters? Technical Analysis
- The Next Generation Of Large Language Models
- Large Language Model Training in 2023
- How does GPT Obtain its Ability? Tracing Emergent Abilities of Language Models to their Sources
- Open Pretrained Transformers
- Scaling, emergence, and reasoning in large language models
Miscellaneous
🔗 https://github.com/Hannibal046/Awesome-LLM/tree/main?tab=readme-ov-file#miscellaneous
- Arize-Phoenix - Open-source tool for ML observability that runs in your notebook environment. Monitor and fine tune LLM, CV and Tabular Models.
- Emergent Mind - The latest AI news, curated & explained by GPT-4.
- ShareGPT - Share your wildest ChatGPT conversations with one click.
- Major LLMs + Data Availability
- 500+ Best AI Tools
- Cohere Summarize Beta - Introducing Cohere Summarize Beta: A New Endpoint for Text Summarization
- chatgpt-wrapper - ChatGPT Wrapper is an open-source unofficial Python API and CLI that lets you interact with ChatGPT.
- Open-evals - A framework extending OpenAI's Evals to different language models.
- Cursor - Write, edit, and chat about your code with a powerful AI.
- AutoGPT - an experimental open-source application showcasing the capabilities of the GPT-4 language model.
- OpenAGI - When LLM Meets Domain Experts.
- EasyEdit - An easy-to-use framework to edit large language models.
- chatgpt-shroud - A Chrome extension for OpenAI's ChatGPT, enhancing user privacy by enabling easy hiding and unhiding of chat history. Ideal for privacy during screen shares.
https://github.com/Shubhamsaboo/awesome-llm-apps A curated collection of Awesome LLM apps built with RAG, AI Agents, Multi-agent Teams, MCP, Voice Agents, and more. This repository features LLM apps that use models from OpenAI, Anthropic, Google, and open-source models like DeepSeek, Qwen or Llama that you can run locally on your computer.
🤔 https://transformer-circuits.pub/2025/attribution-graphs/biology.html On the Biology of a Large Language Model | Anthropic We investigate the internal mechanisms used by Claude 3.5 Haiku — Anthropic's lightweight production model — in a variety of contexts, using our circuit tracing methodology.
[!links] ↗ LLM (Large Language Model)
https://csdiy.wiki/%E6%B7%B1%E5%BA%A6%E7%94%9F%E6%88%90%E6%A8%A1%E5%9E%8B/roadmap/ Large language models have become a red-hot direction in recent years and are closely related to the author's PhD research. This roadmap shares the courses and reference materials the author used while getting familiar with, and then going deep into, deep generative models, for practitioners in related fields and anyone interested in the underlying principles of generative models. Since research leaves the author limited spare time, many of the course labs remain unfinished; they will be added to this directory as they are completed. In fact, large language models are just one branch of deep generative models; other generative models such as VAEs, GANs, diffusion models, and flows still hold important places in generation, and the term AIGC broadly refers to this whole family of techniques. The following courses are recommended:
- MIT 6.S184: Generative AI with Stochastic Differential Equations: an introductory GenAI course from MIT's IAP term. It explains the mathematics behind Flow Matching and diffusion models through the lens of differential equations, with small hands-on experiments; a good entry point for students interested in the underlying math.
- MIT 6.S978: Deep Generative Models: taught by Kaiming He, MIT's rising-star professor. It covers the fundamentals of various generative models along with frontier papers; the assignments come with rich scaffolding code, are not especially hard, and deepen understanding while giving a quick overview of the whole field.
- UCB CS294-158-SP24: Deep Unsupervised Learning: taught by Pieter Abbeel, a leading figure in reinforcement learning. Its content is richer and more comprehensive than the MIT courses, with recorded lectures and slides. The assignments provide only test code, so students must write the model definitions and training code themselves; hardcore, but great practice for anyone aspiring to train models. As is well known, deep learning practice is full of empirical tricks, and the devil is in the details; nothing teaches them better than training a model yourself.
- CMU 10423: Generative AI: CMU's GenAI course. Compared with the previous two it leans more toward large language models, with otherwise substantial overlap. The assignments are quite interesting and worth trying in spare time.
- https://www.cs.cmu.edu/~mgormley/courses/10423/ Guided by scaling laws, OpenAI's GPT series has shown astonishing results and made major progress in math and code. If large language models are your main focus, the following courses are recommended:
- Stanford CS336: Language Modeling from Scratch: as the title says, you write every core component of a large language model from scratch: the tokenizer, model architecture, training optimizer, low-level kernels, training-data cleaning, post-training algorithms, and more. Each assignment handout runs forty to fifty pages of PDF; quite hardcore. Highly recommended if you want to fully digest every low-level detail of LLMs.
- CMU 11868: Large Language Model Systems: CMU's LLM systems course, focused on low-level systems optimization such as GPU acceleration, distributed training and inference, and various frontier techniques. Well suited for systems-oriented students who want a full picture of the area. The syllabus also includes a paper of mine on prefill/decode disaggregation, hence a slightly biased recommendation. The assignments have you implement a mini PyTorch and then build various system-level LLM optimizations on top of it.
- CMU 11667: Large Language Models: Methods and Applications and CMU 11711: Advanced NLP: compared with the previous two, these lean toward higher-level algorithms and applications, and every lecture lists many readings. They are good for getting a rough picture of each frontier direction of LLMs, after which you can follow the references to dive deeper into any subfield that interests you.
CSE234: Data Systems for Machine Learning. This course is designed as a comprehensive LLM systems curriculum and serves as an introduction to building efficient LLM systems. It is roughly organized into three parts, plus several guest lectures:
Part 1. Foundations: modern deep learning and computational representations
- Modern DL and computational graphs (framework fundamentals)
- Autodiff and an overview of ML system architecture
- Tensor formats, a deep dive into MatMul, and hardware accelerators
Part 2. Systems and performance optimization: from GPU kernels to compilation and memory
- GPUs & CUDA (including basic performance models)
- GPU MatMul and operator compilation
- Triton programming, graph optimization and compilation
- Memory (including memory issues and techniques in training and inference)
- Quantization (methods and how they land in real systems)
Part 3. LLM systems: training and inference
- Parallelism strategies: model parallelism, collective communication, intra-/inter-op parallelism, automatic parallelization
- LLM fundamentals: Transformer, attention, MoE
- LLM training optimizations: FlashAttention, etc.
- LLM inference: continuous batching, paged attention, disaggregated prefill/decoding
- Scaling laws (Guest lectures: ML compilers, LLM pretraining/open science, fast inference, tool use & agents, as supplements and extensions.) What most distinguishes CSE234 is its tight focus on LLM systems as the core application scenario, emphasizing the trade-offs and engineering constraints of real system design rather than stopping at algorithms or API usage. Assignments typically confront performance bottlenecks head-on (memory bandwidth, communication overhead, kernel fusion, etc.) and resolve them with Triton or other system-level optimizations, which helps greatly in understanding why LLM systems are designed the way they are. The experience is fairly hardcore: the early parts assume a background in systems and parallel computing, so self-learners should brush up on CUDA/parallel programming and systems basics first, or the second half (especially the LLM optimization and inference material) will feel like a steep climb. Once you keep pace, though, the course has strong long-term value for students heading into LLM infra, ML systems, or AI compilers.
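As a taste of the quantization topic in Part 2 above, here is a minimal sketch of symmetric per-tensor int8 quantization in pure Python. This is illustrative only; real systems use per-channel scales, calibration data, and fused kernels.

```python
# Minimal symmetric int8 quantization sketch (illustrative, not production).
def quantize_int8(weights):
    """Map floats to int8 using a single per-tensor scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate floats from the int8 values."""
    return [v * scale for v in q]

weights = [0.42, -1.37, 0.05, 0.99]
q, scale = quantize_int8(weights)
approx = dequantize_int8(q, scale)
# Per-element rounding error is bounded by scale / 2.
```

The per-tensor scheme shown here trades accuracy for simplicity; one outlier weight inflates the scale for the whole tensor, which is exactly why production systems move to per-channel or block-wise scales.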
https://github.com/PKU-DAIR/Starter-Guide This repository is the PKU-DAIR team's comprehensive open-source documentation and technical guide for newcomers to the field. By collecting the team's core papers and experience write-ups, it helps beginners quickly get familiar with frontier areas such as Data Management (DM) and Artificial Intelligence (AI) and build a solid technical foundation. Whether you are just starting out or looking to deepen your understanding, the resources here support your learning and research journey.
- AI Systems 🔗
- AutoML 🔗
- Databases 🔗
- AI Agents 🔗
- Data-Centric ML 🔗
- Diffusion Models 🔗
- AI for Science 🔗
- Graphs 🔗
https://github.com/Hannibal046/Awesome-LLM/tree/main?tab=readme-ov-file#llm-tutorials-and-courses LLM Tutorials and Courses
- Andrej Karpathy Series - My favorite!
- Umar Jamil Series - high quality and educational videos you don't want to miss.
- Alexander Rush Series - high quality and educational materials you don't want to miss.
- llm-course - Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.
- UWaterloo CS 886 - Recent Advances on Foundation Models.
- CS25-Transformers United
- ChatGPT Prompt Engineering
- Princeton: Understanding Large Language Models
- Stanford CS324 - Large Language Models
- State of GPT
- A Visual Guide to Mamba and State Space Models
- Let's build GPT: from scratch, in code, spelled out.
- minbpe - Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization.
- femtoGPT - Pure Rust implementation of a minimal Generative Pretrained Transformer.
- Neurips2022-Foundational Robustness of Foundation Models
- ICML2022-Welcome to the "Big Model" Era: Techniques and Systems to Train and Serve Bigger Models
- GPT in 60 Lines of NumPy
- LLM‑RL‑Visualized (EN) | LLM‑RL‑Visualized (Chinese) - 100+ LLM / RL algorithm maps 📚.
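The byte pair encoding (BPE) that minbpe (listed above) implements boils down to repeatedly counting adjacent token pairs and merging the most frequent pair into a new token id. A minimal sketch of one merge step, ignoring training-corpus scale and special tokens:

```python
from collections import Counter

def most_frequent_pair(ids):
    """Find the most frequent adjacent pair in a token sequence."""
    return Counter(zip(ids, ids[1:])).most_common(1)[0][0]

def merge(ids, pair, new_id):
    """Replace every left-to-right occurrence of `pair` with `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out

ids = list(b"aaabdaaabac")       # start from raw bytes (ids 0..255)
pair = most_frequent_pair(ids)   # ("a", "a") is the most frequent pair
ids = merge(ids, pair, 256)      # the first merge gets the next free id
```

Training a full tokenizer is just this step in a loop: each iteration adds one new id to the vocabulary until the target vocabulary size is reached.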
https://github.com/Hannibal046/Awesome-LLM/tree/main?tab=readme-ov-file#llm-books LLM Books
- Generative AI with LangChain: Build large language model (LLM) apps with Python, ChatGPT, and other LLMs - it comes with a GitHub repository that showcases a lot of the functionality
- Build a Large Language Model (From Scratch) - A guide to building your own working LLM.
- BUILD GPT: HOW AI WORKS - explains how to code a Generative Pre-trained Transformer, or GPT, from scratch.
- Hands-On Large Language Models: Language Understanding and Generation - Explore the world of Large Language Models with over 275 custom made figures in this illustrated guide!
- The Chinese Book for Large Language Models - An Introductory LLM Textbook Based on A Survey of Large Language Models.
https://diffusion.csail.mit.edu/ Introduction to Flow Matching and Diffusion Models MIT Computer Science Class 6.S184: Generative AI with Stochastic Differential Equations
- Diffusion and flow-based models have become the state of the art for generative AI across a wide range of data modalities, including images, videos, shapes, molecules, music, and more! This course aims to build up the mathematical framework underlying these models from first principles. At the end of the class, students will have built a toy image diffusion model from scratch, and along the way, will have gained hands-on experience with the mathematical toolbox of stochastic differential equations that is useful in many other fields. This course is ideal for students who want to develop a principled understanding of the theory and practice of generative AI.
https://youtu.be/1il-s4mgNdI?si=DxlD_98ITLZsnCIw What does it mean for computers to understand language? | LM1 vcubingx
https://youtu.be/kCc8FmEb1nY?si=Dhj1moY2pHkyiCiT Let's build GPT: from scratch, in code, spelled out. Andrej Karpathy
https://youtube.com/playlist?list=PLZHQObOWTQDNU6R1_67000Dx_ZCJB-3pi&si=AUDMGwyz7-yL33Xd Neural networks | 3Blue1Brown
- But what is a neural network? | Deep learning chapter 1
- Gradient descent, how neural networks learn | Deep Learning Chapter 2
- Backpropagation, intuitively | Deep Learning Chapter 3
- Backpropagation calculus | Deep Learning Chapter 4
- Large Language Models explained briefly
- Transformers, the tech behind LLMs | Deep Learning Chapter 5
- [Official bilingual] What is GPT? A visual explanation of Transformers | Deep Learning Chapter 5 (bilibili) https://b23.tv/rcO76mO
- Attention in transformers, step-by-step | Deep Learning Chapter 6
- [Official bilingual] A visual explanation of the attention mechanism, the core of Transformers | Deep Learning Chapter 6 (bilibili) https://b23.tv/f0udg4P
- How might LLMs store facts | Deep Learning Chapter 7
Lex Fridman
Machine Learning Street Talk
StatQuest with Josh Starmer
Jeremy Howard
Serrano.Academy
Hamel Husain
Jason Liu
Dave Ebbelaar
https://www.alignmentforum.org/
Minaee, S., Mikolov, T., Nikzad, N., Chenaghlu, M., Socher, R., Amatriain, X., & Gao, J. (2025). Large Language Models: A Survey (arXiv:2402.06196). arXiv. https://doi.org/10.48550/arXiv.2402.06196
🚧 👍 https://github.com/RUCAIBox/LLMSurvey A collection of papers and resources related to Large Language Models. The organization of papers refers to our survey "A Survey of Large Language Models".
- Zhao, W. X., Zhou, K., Li, J., Tang, T., Wang, X., Hou, Y., Min, Y., Zhang, B., Zhang, J., Dong, Z., Du, Y., Yang, C., Chen, Y., Chen, Z., Jiang, J., Ren, R., Li, Y., Tang, X., Liu, Z., … Wen, J.-R. (2025). A Survey of Large Language Models (arXiv:2303.18223). arXiv. https://doi.org/10.48550/arXiv.2303.18223
👍 📄 https://github.com/RUCAIBox/LLMSurvey (A Survey of Large Language Models | Gaoling School of Artificial Intelligence, Renmin University of China) A collection of papers and resources related to Large Language Models. The organization of papers refers to our survey "A Survey of Large Language Models". To facilitate the reading of our (English-version) survey, we also provide a Chinese translation, which will continue to be updated.
📄 https://arc.net/folder/D0472A20-9C20-4D3F-B145-D2865C0A9FEE Papers you must know to understand the world of deep learning & AIGC
🔗 https://github.com/Hannibal046/Awesome-LLM/tree/main?tab=readme-ov-file#other-papers (2025.01)
If you're interested in the field of LLM, you may find the above list of milestone papers helpful to explore its history and state-of-the-art. However, each direction of LLM offers a unique set of insights and contributions, which are essential to understanding the field as a whole. For a detailed list of papers in various subfields, please refer to the following link:
- Awesome-LLM-hallucination - LLM hallucination paper list.
- awesome-hallucination-detection - List of papers on hallucination detection in LLMs.
- LLMsPracticalGuide - A curated list of practical guide resources of LLMs
- Awesome ChatGPT Prompts - A collection of prompt examples to be used with the ChatGPT model.
- awesome-chatgpt-prompts-zh - A Chinese collection of prompt examples to be used with the ChatGPT model.
- Awesome ChatGPT - Curated list of resources for ChatGPT and GPT-3 from OpenAI.
- Chain-of-Thoughts Papers - A trend starting from "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models".
- Awesome Deliberative Prompting - How to ask LLMs to produce reliable reasoning and make reason-responsive decisions.
- Instruction-Tuning-Papers - A trend starting from Natural-Instruction (ACL 2022), FLAN (ICLR 2022), and T0 (ICLR 2022).
- LLM Reading List - A paper & resource list of large language models.
- Reasoning using Language Models - Collection of papers and resources on Reasoning using Language Models.
- Chain-of-Thought Hub - Measuring LLMs' Reasoning Performance
- Awesome GPT - A curated list of awesome projects and resources related to GPT, ChatGPT, OpenAI, LLM, and more.
- Awesome GPT-3 - a collection of demos and articles about the OpenAI GPT-3 API.
- Awesome LLM Human Preference Datasets - a collection of human preference datasets for LLM instruction tuning, RLHF and evaluation.
- RWKV-howto - possibly useful materials and tutorial for learning RWKV.
- ModelEditingPapers - A paper & resource list on model editing for large language models.
- Awesome LLM Security - A curation of awesome tools, documents and projects about LLM Security.
- Awesome-Align-LLM-Human - A collection of papers and resources about aligning large language models (LLMs) with humans.
- Awesome-Code-LLM - An awesome and curated list of best code-LLM for research.
- Awesome-LLM-Compression - Awesome LLM compression research papers and tools.
- Awesome-LLM-Systems - Awesome LLM systems research papers.
- awesome-llm-webapps - A collection of open source, actively maintained web apps for LLM applications.
- awesome-japanese-llm - 日本語LLMまとめ - Overview of Japanese LLMs.
- Awesome-LLM-Healthcare - The paper list of the review on LLMs in medicine.
- Awesome-LLM-Inference - A curated list of Awesome LLM Inference Paper with codes.
- Awesome-LLM-3D - A curated list of Multi-modal Large Language Model in 3D world, including 3D understanding, reasoning, generation, and embodied agents.
- LLMDatahub - a curated collection of datasets specifically designed for chatbot training, including links, size, language, usage, and a brief description of each dataset
- Awesome-Chinese-LLM - A curated list of open-source Chinese LLMs, focusing on smaller models that can be privately deployed at low training cost, covering base models, domain-specific fine-tuning and applications, datasets, and tutorials.
- LLM4Opt - Applying Large language models (LLMs) for diverse optimization tasks (Opt) is an emerging research area. This is a collection of references and papers of LLM4Opt.
- awesome-language-model-analysis - This paper list focuses on the theoretical or empirical analysis of language models, e.g., the learning dynamics, expressive capacity, interpretability, generalization, and other interesting topics.
🎬 https://youtu.be/OFS90-FX6pg?si=hlsJj4DUWzGrZ_V- The Origin of ChatGPT | Art of the Problem I follow the 35-year journey that led to the explosion of Large Language Models. From Jordan's pioneering work in 1986 to today's GPT-4, this documentary traces how AI learned to talk, featuring insights from AI pioneers including Chomsky, Hofstadter, Hinton, and LeCun, and exploring the revolutionary concepts that made ChatGPT possible: transformer architecture, the attention mechanism, next-token prediction, and emergent capabilities. The next video follows OpenAI's o1 model. My script, references & visualizations here: https://docs.google.com/document/d/1s7FNPoKPW9y3EhvzNgexJaEG2pP4Fx_rmI4askoKZPA/edit?usp=sharing
🎬 (1hr Talk) Intro to Large Language Models | Andrej Karpathy https://youtu.be/zjkBMFhNj_g?si=G546Rtz9r9hc233z
👍 https://huggingface.co/spaces/Eliahu/Model-Atlas
https://www.anthropic.com/research/estimating-productivity-gains Estimating AI productivity gains from Claude conversations
Large Language Models explained briefly | 3Blue1Brown
📎 https://cameronrwolfe.substack.com/p/understanding-and-using-supervised
- Transformer Architecture: Nearly all modern language models—and many other deep learning models—are based upon this architecture.
- Decoder-only Transformers : This is the specific variant of the transformer architecture that is used by most generative LLMs.
- Brief History of LLMs: LLMs have gone through several phases from the creation of GPT to the release of ChatGPT.
- Next token prediction: this self-supervised training objective underlies nearly all LLM functionality and is used by SFT!
- Language Model Pretraining: language models are pretrained over a massive, unlabeled textual corpus.
- Language Model Inference: language models can be used to generate coherent sequences of text via autoregressive next token prediction.
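The autoregressive inference described in the last item can be sketched as a loop that repeatedly samples one token from the model's conditional distribution and appends it to the context. The toy bigram table below is an invented stand-in for a real transformer forward pass; only the loop structure carries over:

```python
import random

# Toy next-token distribution. A real LLM would compute this with a
# transformer forward pass over the entire context, not just the last token.
BIGRAM = {
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 1.0},
    "dog": {"ran": 1.0},
    "sat": {"<eos>": 1.0},
    "ran": {"<eos>": 1.0},
}

def generate(prompt, max_new_tokens=10, seed=0):
    """Autoregressive decoding: sample and append one token at a time."""
    rng = random.Random(seed)
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        dist = BIGRAM.get(tokens[-1], {"<eos>": 1.0})
        # Sample the next token from the conditional distribution p(x_t | x_<t).
        nxt = rng.choices(list(dist), weights=list(dist.values()))[0]
        if nxt == "<eos>":
            break
        tokens.append(nxt)
    return tokens

out = generate(["the"])
```

Swapping the sampling line for `max(dist, key=dist.get)` gives greedy decoding; temperature, top-k, and nucleus sampling are all variations on how this one distribution is sampled.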
↗ Natural Language Processing (NLP) /Intro
↗ The Development History of AI ↗ Artificial Neural Networks (ANN) & Deep Learning Methods ↗ Natural Language Processing (NLP) & Computational Linguistics
Zhao, W. X., Zhou, K., Li, J., Tang, T., Wang, X., Hou, Y., Min, Y., Zhang, B., Zhang, J., Dong, Z., Du, Y., Yang, C., Chen, Y., Chen, Z., Jiang, J., Ren, R., Li, Y., Tang, X., Liu, Z., … Wen, J.-R. (2025). A Survey of Large Language Models (arXiv:2303.18223). arXiv.
https://doi.org/10.48550/arXiv.2303.18223
https://stanford-cs324.github.io/winter2022/lectures/scaling-laws/
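The scaling-laws lecture linked above studies power-law fits of loss against model and data size. As a sketch, the parametric form from Hoffmann et al.'s Chinchilla paper can be evaluated directly; the constants below are that paper's reported fit and should be treated as illustrative assumptions, not ground truth:

```python
# Parametric scaling law: L(N, D) = E + A / N**alpha + B / D**beta,
# where N is parameter count and D is training-token count.
# Constants are the fit reported by Hoffmann et al. (2022), used here
# purely for illustration.
E, A, B = 1.69, 406.4, 410.7
ALPHA, BETA = 0.34, 0.28

def loss(n_params, n_tokens):
    """Predicted pretraining loss for N parameters and D training tokens."""
    return E + A / n_params**ALPHA + B / n_tokens**BETA

# Scaling both model and data lowers predicted loss, with diminishing
# returns: the irreducible term E is never crossed.
small = loss(1e9, 2e10)    # ~1B params, ~20B tokens
big = loss(7e10, 1.4e12)   # ~70B params, ~1.4T tokens
```

The two additive penalty terms are what make the compute-optimal trade-off interesting: at a fixed compute budget, shrinking one term means growing the other.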
[!links] ↗ LLM Foundation Models List & Evaluation and Benchmarks & Leaderboard ↗ Transformers
https://poloclub.github.io/transformer-explainer/
🔗 https://stanford-cs324.github.io/winter2022/lectures/modeling/#model-architecture
- Tokenization
- Attention
- Probability
↗ RWKV (Receptance Weighted Key Value) ↗ Mamba
↗ LLM Training, Utilization, and Evaluation
- ↗ Pre-Training (In-Weight Learning)
- ↗ LLM Adaptation & Alignment Tuning
- ↗ LLM Utilization & Prompt, Context, and Harness Engineering (In-Context Learning)
↗ Reinforcement Learning (RL) & Sequential Decision Making
A reasoning model, also known as reasoning language models (RLMs) or large reasoning models (LRMs), is a type of large language model (LLM) that has been specifically trained to solve complex tasks requiring multiple steps of logical reasoning. These models demonstrate superior performance on logic, mathematics, and programming tasks compared to standard LLMs. They possess the ability to revisit and revise earlier reasoning steps and utilize additional computation during inference as a method to scale performance, complementing traditional scaling approaches based on training data size, model parameters, and training compute.
Unlike traditional language models that generate responses immediately, reasoning models allocate additional compute, or thinking, time before producing an answer to solve multi-step problems. OpenAI introduced this terminology in September 2024 when it released the o1 series, describing the models as designed to "spend more time thinking" before responding. The company framed o1 as a reset in model naming that targets complex tasks in science, coding, and mathematics, and it contrasted o1's performance with GPT-4o on benchmarks such as AIME and Codeforces. Independent reporting the same week summarized the launch and highlighted OpenAI's claim that o1 automates chain-of-thought style reasoning to achieve large gains on difficult exams.
In operation, reasoning models generate internal chains of intermediate steps, then select and refine a final answer. OpenAI reported that o1's accuracy improves as the model is given more reinforcement learning during training and more test-time compute at inference. The company initially chose to hide raw chains and instead return a model-written summary, stating that it "decided not to show" the underlying thoughts so researchers could monitor them without exposing unaligned content to end users. Commercial deployments document separate "reasoning tokens" that meter hidden thinking and a control for "reasoning effort" that tunes how much compute the model uses. These features make the models slower than ordinary chat systems while enabling stronger performance on difficult problems.
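One concrete way to spend extra test-time compute, in the spirit described above, is self-consistency: sample several independent reasoning chains and majority-vote over their final answers. The sketch below uses a noisy stub in place of a real sampled LLM call; `sample_chain` and its error model are invented for illustration:

```python
import random
from collections import Counter

def sample_chain(question, rng):
    """Stand-in for one sampled chain-of-thought from an LLM:
    a noisy solver that is usually right, occasionally off by one."""
    correct = sum(question)
    return correct if rng.random() < 0.7 else correct + rng.choice([-1, 1])

def self_consistency(question, n_samples=25, seed=0):
    """Sample n chains and return the majority-vote final answer."""
    rng = random.Random(seed)
    answers = [sample_chain(question, rng) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]

answer = self_consistency((17, 25))
```

More samples mean more compute but a lower chance that a wrong answer wins the vote, which is the basic test-time-scaling trade-off reasoning models automate internally.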
↗ LLM Infrastructure & Deployment ↗ AI (Data) Infrastructure & Techniques Stack
↗ LLM Applications & LLM-Driven Automation
↗ LLM Agents, AI Workflow, & Agentic MLLM ↗ AI Agent Assistants (General Purpose) & LLM OS
↗ AI4X, AGI (Artificial General Intelligence) & AIGC
[!links] ↗ LLM Foundation Models List & Evaluation and Benchmarks & Leaderboard
Zhao, W. X., Zhou, K., Li, J., Tang, T., Wang, X., Hou, Y., Min, Y., Zhang, B., Zhang, J., Dong, Z., Du, Y., Yang, C., Chen, Y., Chen, Z., Jiang, J., Ren, R., Li, Y., Tang, X., Liu, Z., … Wen, J.-R. (2025). A Survey of Large Language Models (arXiv:2303.18223). arXiv.
https://doi.org/10.48550/arXiv.2303.18223
https://github.com/Hannibal046/Awesome-LLM/tree/main?tab=readme-ov-file#milestone-papers (2025.01)
↗ DeepSeek
Prompt injection is an attack technique in which a hacker or malicious actor manipulates the input to an AI model to induce unintended outputs. The case discussed here is a form of SSTI (server-side template injection).
This lets an attacker abuse the model to leak user data or distort its training outcomes. In many models, the data in the input prompt is directly exposed or strongly influences the output.
QuantumBlack AI by McKinsey: "The next innovation revolution - powered by AI"; Gruber & Tal: The Market Opportunity Navigator, PDF worksheet
