The definitive curated list of AI models, agent frameworks, tools, protocols, and resources for 2026 — the year agents went mainstream and AI became infrastructure.
Covering foundation models, multimodal AI, agent protocols (MCP/A2A), coding agents, computer use, generative AI, and more.
Entries may carry one or more status tags so readers can judge maturity at a glance:
- 🆕 New — Added in the last 60 days, still settling.
- 📦 Archived — Repository archived by its owner; preserved for historical reference, no further updates expected.
- 💤 Stale — No commits in 6+ months; project may still work but is no longer actively maintained.
- ⚠️ Unverified — Recent submission with limited independent traction (low stars / no third-party adoption / sole-maintainer / submitted to many awesome lists in parallel). Listed for completeness, not endorsed — vet before using.
- 🇨🇳 Chinese ecosystem — Project from a mainland-China team or primarily targeting the Chinese market.
- 🔥 Hot — GitHub stars grew >20% in the last 30 days; community momentum.
- ⚡ Updated — Received a notable release or major feature in the last 14 days.
- 🧪 Experimental — Promising but not production-ready; use for R&D only.
- 💰 Freemium — Core functionality free; paid tiers for scale/advanced features.
- 🔐 Audited — Has undergone independent security audit or formal verification.
- 🇨🇳 China-first — Optimized for Chinese language, regulation, or infra stack.
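The time- and growth-based tags above follow simple thresholds, so they can be computed mechanically. The sketch below is one way to do that, not the list's actual tooling: the `Entry` fields are hypothetical, "6+ months" is approximated as 183 days, and letting Archived take precedence over Stale is an assumption the legend does not state.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class Entry:
    """Hypothetical per-project metadata; field names are illustrative only."""
    added_on: date
    last_commit: date
    last_notable_release: Optional[date]
    stars_now: int
    stars_30_days_ago: int
    archived: bool = False


def status_tags(entry: Entry, today: date) -> List[str]:
    """Return the threshold-based tags from the legend that apply to an entry."""
    tags: List[str] = []
    # New: added in the last 60 days.
    if today - entry.added_on <= timedelta(days=60):
        tags.append("🆕 New")
    # Assumption: an archived repo is reported as Archived rather than Stale.
    if entry.archived:
        tags.append("📦 Archived")
    # Stale: no commits in 6+ months (approximated here as 183 days).
    elif today - entry.last_commit >= timedelta(days=183):
        tags.append("💤 Stale")
    # Hot: stars grew by more than 20% in the last 30 days.
    if entry.stars_30_days_ago > 0 and entry.stars_now > 1.2 * entry.stars_30_days_ago:
        tags.append("🔥 Hot")
    # Updated: notable release in the last 14 days.
    if entry.last_notable_release is not None and (
        today - entry.last_notable_release <= timedelta(days=14)
    ):
        tags.append("⚡ Updated")
    return tags
```

Tags that depend on human judgment (Unverified, Experimental, Audited, Freemium, and the 🇨🇳 tags) are left out on purpose, since no metadata threshold in the legend defines them.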
Foundation Models · Multimodal AI · Protocols · Frameworks · IDEs & Builders · Memory · Tools · Sandboxing · Security · RAG · Coding · Physical AI · Simulation · Benchmarks · Computer Use · Browser & Web · Voice · Personal · Mobile · Enterprise · Evaluation · Research Tools · Learning · Chinese Ecosystem · Compare · Notable 2026 · Timeline
New to AI agents? Follow this path:
- 📖 Understand — what an agent actually is vs. a chatbot
- 🗺️ Find your scenario → Scenario Guide
- 🧩 Copy a proven setup → Stack Recipes
- 🔍 Pick the right tool → Compare Tables
- ⚠️ Avoid common mistakes → Anti-Picks
Already building? Jump to:
- 🆕 Latest additions (May 2026) • 🛡️ Security • 💰 Cost comparison
| Category | Description | Count |
|---|---|---|
| 🧠 Foundation Models | Latest LLMs from OpenAI, Anthropic, Google, Meta, and 22+ providers | 80+ |
| 🎨 Multimodal & Generative AI | Image, video, audio, and music generation | 20+ |
| 🔗 Agent Protocols | MCP, A2A, and interoperability standards | 10+ |
| 🏗️ Agent Frameworks | Libraries for building autonomous AI agents | 23+ |
| 🛠️ Agent IDEs & Visual Builders | Visual / low-code environments for designing agent flows | 8+ |
| 🧠 Agent Memory | Persistent memory and context management | 10+ |
| 🔌 Tool & API Integration | Connecting agents to external services | 18+ |
| 🧪 Sandboxing & Compute Isolation | Secure runtimes for agent-generated code | 7+ |
| 🛡️ Agent Security | Prompt injection defense and guardrails | 16+ |
| 🔍 RAG & Knowledge | Retrieval-augmented generation systems | 12+ |
| 💻 Coding Agents | AI-powered software engineering | 27+ |
| 🤖 Physical AI | Humanoid robots, embodied AI, industrial automation | 22+ |
| 🎮 Simulation & World Models | Sim environments for training and stress-testing agents | 7+ |
| 📊 Benchmarks | Leaderboards tracking frontier capability | 11+ |
| 🖥️ Computer Use | Desktop automation and OS-level control | 10+ |
| 🌐 Browser & Web Agents | Agents that drive real browsers | 9+ |
| 🗣️ Voice & Multimodal Agents | Voice-enabled conversational AI | 10+ |
| 📱 Personal AI Agents | Productivity and daily life assistants | 11+ |
| 📱 Mobile Agents | Phone-control agents (Android / iOS) | 6+ |
| 🏢 Enterprise Platforms | Enterprise-grade agent deployment | 18+ |
| 📊 Evaluation & Observability | Testing, monitoring, and benchmarking | 17+ |
| 🔬 AI Research Tools | Tools for AI/ML research and experimentation | 10+ |
| 📚 Learning Resources | Papers, courses, and tutorials | 20+ |
| 🇨🇳 Chinese AI Ecosystem | Major projects from China-based teams | 18+ |
| 📝 Compare | Side-by-side comparison tables | — |
| 🗺️ Scenario Guide | 56 curated scenario-to-tool mappings | 56 |
| 📋 Stack Recipes | Curated multi-tool combinations | 8 |
| ⚠️ Anti-Picks | What NOT to use and why | 15 |
- 🧠 Foundation Models 2026
- 🎨 Multimodal & Generative AI
- 🔗 Agent Protocols & Standards
- 🏗️ Agent Frameworks
- 🛠️ Agent IDEs & Visual Builders
- 🧠 Agent Memory
- 🔌 Tool & API Integration
- 🧪 Agent Sandboxing & Compute Isolation
- 🛡️ Agent Security
- 🔍 RAG & Knowledge
- 💻 Coding Agents
- 🤖 Physical AI & Embodied Agents
- 🎮 Agent Simulation & World Models
- 📊 Benchmarks & Leaderboards
- 🖥️ Computer Use & Desktop Agents
- 🌐 Browser & Web Agents
- 🗣️ Voice & Multimodal Agents
- 📱 Personal AI Agents
- 📱 Mobile Agents
- 🏢 Enterprise Agent Platforms
- 📊 Agent Evaluation & Observability
- 🔬 AI Research Tools
- 📚 Learning Resources
- 🇨🇳 Chinese AI Ecosystem
- 📝 Compare — Side-by-Side Tables
- 🗺️ Scenario Guide — What Should I Use For…
- 📋 Stack Recipes — Curated Tool Combinations
- ⚠️ Anti-Picks — What NOT to Use For…
- 🌟 Notable Agent Projects of 2026
- 📅 2026 AI Timeline
The latest large language models powering the AI ecosystem, organized by company. 60+ models from 20+ providers.
- GPT-5.5 - 🆕 Released April 23, 2026 (codename "Spud"). OpenAI's new frontier model for agentic tasks: coding, online research, data analysis, autonomous tool navigation. Significant gains in reasoning, consistency, and long-horizon task handling. Available on ChatGPT Plus / Pro / Business / Enterprise.
- GPT-5.5 Pro - 🆕 April 23, 2026. Parallel test-time compute variant for higher-accuracy cognitive tasks. Pro / Business / Enterprise tiers.
- GPT-5.5-Cyber - 🆕 April 30, 2026. Cybersecurity-specialized variant of GPT-5.5, rolled out via OpenAI's Trusted Access for Cyber (TAC) program to vetted defenders, government, critical infrastructure operators, and security vendors. Not available to the general public.
- OpenAI Daybreak - 🆕 May 12, 2026. Cyber-defense platform bundling GPT-5.5 + GPT-5.5-Cyber + Trusted-Access-for-Cyber for AI-powered vulnerability detection and patch validation; preview access extended to EU governments and security vendors.
- GPT-5.5 Instant - 🆕 May 5, 2026. New ChatGPT default model. Efficiency-first upgrade with ~50% lower hallucination rate on high-stakes prompts; available on free tier.
- GPT-Realtime-2 - 🆕 May 8, 2026. GPT-5-class reasoning brought to the Realtime API, 128K context, parallel tool calls with audio feedback, adjustable reasoning effort.
- GPT-Realtime-Translate - 🆕 May 8, 2026. Live speech-to-speech translation across 70+ input languages and 13 output languages.
- GPT-Realtime-Whisper - 🆕 May 8, 2026. Streaming low-latency speech-to-text companion to GPT-Realtime-2.
- OpenAI Deployment Company (DeployCo) - 🆕 May 11, 2026. New OpenAI-majority-owned services entity for enterprise AI rollout. Backed by $4B+ from TPG / Advent / Bain Capital / Brookfield / Goldman Sachs / SoftBank and consulting partners Bain & Company, Capgemini, McKinsey. Built around Forward Deployed Engineers; absorbs the Tomoro AI consulting acquisition (~150 engineers).
- Codex on Mobile - 🆕 May 14, 2026. ChatGPT iOS/Android can now remote-control the Codex desktop app — review outputs, approve actions, switch models, and kick off new tasks from the phone while the live session runs on Mac (Windows next). Rolling out as preview to Free, Plus and Go users.
- OpenAI ↔ Malta partnership - 🆕 May 16, 2026. First country-wide deal: every Maltese citizen / resident aged 14+ gets a free 1-year ChatGPT Plus subscription after completing a 2-hour AI literacy course built by the University of Malta. Part of the "OpenAI for Countries" initiative; phased rollout starting May 2026.
- OpenAI ↔ Dell Codex partnership - 🆕 May 18, 2026. Brings Codex to hybrid and on-premises enterprise environments via Dell Technologies infrastructure — first major Codex distribution channel outside the public cloud, targeted at regulated industries needing data-residency control.
- ChatGPT Safety Updates — sensitive-conversation tracking - 🆕 May 18, 2026. ChatGPT's safety systems updated to detect and track subtle escalation cues across long sessions for acute risks (suicide / self-harm / harm to others), with cross-session state retention.
- OpenAI Guaranteed Capacity (Compute Annual Pass) - 🆕 May 19, 2026. Long-term compute reservation product for enterprise AI products / agents / workflows. 1, 2, or 3-year terms; longer terms unlock larger discounts. OpenAI's structural response to the Anthropic "Priority Tier" model.
- OpenAI ↔ Google SynthID + C2PA content provenance - 🆕 May 19, 2026. OpenAI partners with Google to add durable cross-platform SynthID watermarking to ChatGPT/Sora images, joins C2PA, and previews a public "is-this-image-from-OpenAI" verifier. First major frontier-lab interop on watermarking.
- GPT-5.4 - Released March 2026. Frontier model with 1M-token context, advanced coding, computer use, tool search. BenchLM 94, SWE-bench Verified 77.2%, OSWorld 75% (above the human baseline).
- GPT-5.4 Pro - Higher-accuracy variant of GPT-5.4. BenchLM 92.
- GPT-5.3 - Early 2026. Includes GPT-5.3 Instant (conversations) and GPT-5.3-Codex (coding).
- GPT-5.2 - Released Dec 2025. State-of-the-art reasoning, long-context understanding, and vision.
- GPT-5 - Launched August 2025. The default model in ChatGPT, replacing GPT-4o. Multimodal with variants: gpt-5, gpt-5-mini, gpt-5-nano.
- GPT-4o - Omni model with native text, vision, and audio. Retired from ChatGPT Feb 2026 but still available via API.
- o3 / o4-mini - Reasoning models with chain-of-thought for complex problem solving. Released April 2025.
- Codex CLI - Open-source terminal-based coding agent powered by OpenAI models.
- Claude Opus 4.7 - 🆕 Released April 16, 2026. Advanced software engineering (SWE-bench Verified 87.6%), enhanced vision, proactive code verification. Supports `/think xhigh` reasoning effort. 1M-token context.
- Claude Opus 4.6 - Released Feb 2026. 1M-token context, 14.5-hour task horizon. Leads Arena chat leaderboard.
- Claude Sonnet 4.6 - Released Feb 2026. Frontier coding and agentic performance, 1M token context window.
- Claude Mythos Preview - 🆕 April 2026 gated research preview. BenchLM 99 (top of leaderboard), SWE-bench Verified 93.9%. Limited to Project Glasswing partners.
- Claude Opus 4 - Released May 2025. Advanced reasoning and complex task execution.
- Claude Sonnet 4 - Released May 2025. Balanced performance and cost for a wide range of tasks.
- Claude Code - Agentic coding tool operating directly in your terminal. Powered by Opus 4.7 with `/think xhigh` support.
- Claude Security - 🆕 May 1, 2026. Public beta. Enterprise security tool powered by Opus 4.7 — scans entire codebases for vulnerabilities and generates targeted patches with confidence rating, severity, reproduction steps, and recommended fixes. Available to Enterprise customers via claude.ai/security.
- Claude Finance Agents - 🆕 May 5, 2026. Ten Opus-4.7-powered specialised agents for pitchbook authoring, KYC, month-end close, deal screening, etc. Deployable as Claude Cowork plugins, Claude Code skills, or Managed-Agents cookbooks.
- Claude Finance JV - 🆕 May 4, 2026. $1.5B Claude deployment joint venture with Goldman Sachs and Blackstone embedding Anthropic engineers in mid-market Wall Street firms.
- Claude Add-ins / Dreaming / Outcomes / Multi-agent orchestration - 🆕 May 8, 2026 (Code with Claude 2026). Anthropic introduces Add-ins, scheduled memory review between sessions ("Dreaming"), rubric-driven "Outcomes", and a lead-agent + sub-agent orchestration model with shared filesystem and auditable trace.
- Anthropic ↔ SpaceX Colossus 1 - 🆕 May 6, 2026. Anthropic takes all available capacity at SpaceX's Colossus 1 Memphis datacenter (>220K NVIDIA H100/H200/GB200 GPUs, 300+ MW) for Claude Opus inference. Doubles Claude Code 5-hour rate limits on Pro/Max/Team/Enterprise; also lifts peak-hour limits.
- Claude for Legal - 🆕 May 12, 2026. New legal stack on top of Claude Cowork: 20+ MCP connectors (iManage, NetDocuments, DocuSign, Ironclad, LexisNexis, Westlaw, Harvey, Everlaw, Relativity, CourtListener…) + 12 practice-area plugins (commercial, employment, privacy, product, corporate, AI governance, litigation associate, law-student bar-exam). Microsoft Word / Outlook / Excel / PowerPoint orchestration built in.
- Claude for Small Business - 🆕 May 13, 2026. Small-business toggle inside Claude Cowork — 15 pre-built agentic workflows across finance / ops / sales / marketing / HR / customer service, native connectors for QuickBooks, PayPal, HubSpot, Canva, DocuSign, Google Workspace, Microsoft 365. Bundled with a free PayPal-backed "AI Fluency for Small Business" course and a 10-city US workshop tour kicking off in Chicago.
- Anthropic ↔ Gates Foundation $200M - 🆕 May 14, 2026. 4-year, $200M partnership pairing grants + Claude usage credits + Anthropic engineers on global-health, life-sciences, education, and agriculture programs. All tools produced under the program will be freely available; first focus areas include vaccine R&D for polio / HPV / preeclampsia and agriculture-specific Claude extensions.
- Anthropic ↔ PwC strategic alliance expansion - 🆕 May 14, 2026. PwC commits to global rollout of Claude Code + Claude Cowork, certifies 30,000 PwC professionals, and stands up a joint "Agentic Enterprise" Center of Excellence — focused on agentic build, AI-native deals, and finance / supply-chain / HR reinvention.
- Anthropic ↔ Financial Stability Board briefing (Claude Mythos) - 🆕 May 18, 2026. Anthropic briefs the global FSB on Claude Mythos cyber-flaw discovery capabilities — first time a frontier lab briefs a G20-level financial-stability regulator on a frontier model's offensive-security implications.
- Code with Claude 2026 sessions on YouTube - 🆕 May 18, 2026 (sessions published). Full developer-conference recordings (May 6 event) go public: Claude Code roadmap, Claude Developer Platform updates, Managed Agents dreaming + multi-agent orchestration, and partner deployments.
- Widening the conversation on frontier AI - 🆕 May 19, 2026. Anthropic publishes its framework for engaging diverse traditions (religious, philosophical, indigenous) in frontier-AI safety dialogue. Companion to ongoing public-engagement work.
- Bristol Myers Squibb ↔ Anthropic Claude Enterprise - 🆕 May 20, 2026. BMS adopts Claude Enterprise as its shared intelligence platform for 30,000+ employees globally, embedding agentic Claude into drug-discovery / development / delivery workflows. First top-5 pharma enterprise-wide Claude deployment.
- Claude Opus 4.8 - 🆕 May 28, 2026. Major Opus refresh: codebase-scale migrations, sharper agentic judgment, dynamic workflows research preview with hundreds of parallel sub-agents in a single session, manual effort-control panel, 3× cheaper Fast mode at the same $5 / $25 per million in/out. Available on Anthropic native + Amazon Bedrock + AWS Claude Platform + Google Cloud + Microsoft Foundry. Teases an upcoming Mythos-class model series for limited orgs.
- Gemini 3.1 Pro - Released Feb 2026. BenchLM 94, GPQA Diamond 94.3% (world-record), ARC AGI2 77.1%. Most capable Google model, `$2/1M tokens` flagship.
- Gemini 3.1 Flash Live - 🆕 April 2026. Real-time multimodal streaming for voice assistants and interactive agents. Low latency, long context.
- Gemini 3.1 Flash-Lite (GA) - 🆕 May 8, 2026. Generally available on Gemini API / AI Studio / Vertex AI. Fastest and most cost-efficient model in the Gemini 3 family — built for low-latency code completion, real-time UX, and agentic developer tools; matches Gemini 2.5 Flash quality at significantly lower cost.
- Gemini 3.5 Flash - 🆕 May 19, 2026 — Google I/O 2026. Default model powering the Gemini app and Google Search AI Mode. Marketed as ~4× faster than other frontier models in output tokens/sec while outperforming Gemini 3.1 Pro on key benchmarks. Gemini 3.5 Pro slated for June 2026.
- Gemini Omni / Omni Flash - 🆕 May 19, 2026 — Google I/O 2026. New Google DeepMind multimodal world-model family aimed at AGI. Omni Flash, the first shipped variant, can take any input modality and generate any output (starting with video; image and text generation following). Direct lineage to Gemini Robotics / Genie line of work.
- Gemini Omni Flash — voice-controlled video editing rollout - 🆕 May 28, 2026. Omni Flash starts rolling out to consumers via the Gemini app, Google Flow, and YouTube Shorts as the editing engine — conversational cinematic zooms / background swaps / weather edits driven by text, voice, image, or audio prompts; no traditional NLE required.
- Gemini Spark (24/7 personal AI agent) - 🆕 May 19, 2026 — Google I/O 2026. Cloud-resident personal AI agent that runs 24/7 on user intent, integrates Gmail / Chat first, then ~30+ third-party tools via MCP (Adobe / Dropbox / Uber). Available to Google AI Ultra subscribers in the US within the I/O week.
- Google AI Ultra ($100/month tier) - 🆕 May 19, 2026 — Google I/O 2026. New top consumer subscription targeted at developers / creators / power users. Gates Gemini Spark, highest Gemini 3.5 quotas, and the upcoming Gemini 3.5 Pro.
- Gemini 3.1 Flash / Flash Lite - Fast, cost-efficient models for high-throughput applications.
- Gemma 4 (Open) - 🆕 Released April 2026. Open model family: 2B / 4B / 26B / 31B variants. Strong science reasoning and document understanding, local deployment ready.
- Gemini 2.5 Pro / Flash - GA June 2025. Thinking model with 1M context.
- Gemma 4 31B - 🆕 April 2026. GPQA Diamond 84.3%. Strong open-weight alternative for on-device reasoning.
- Gemma 3 - Previous open model family for on-device and research use.
- Gemini Robotics ER-1.6 - 🆕 April 14, 2026. Upgraded robotics AI with improved spatial and physical reasoning. Partnership with Agile Robotics for real-world deployment.
- Muse Spark - 🆕 April 9, 2026. First model from Meta Superintelligence Labs (MSL). Natively multimodal reasoning model powering Meta AI app, smart glasses, and features across Facebook / Instagram / WhatsApp / Messenger.
- Llama 4 Scout - 109B total params (17B active), MoE with 16 experts, 10M token context window, multimodal. Runs on single H100.
- Llama 4 Maverick - 400B total params (17B active), 128 experts, 1M context. Outperforms GPT-4o on multimodal benchmarks.
- Llama 4 Behemoth - 2T parameters (288B active). In training — Meta's frontier model rivaling top closed-source models.
- Llama 3.3 70B - Strong instruction following and reasoning, open-weight under Llama Community License.
- Sakana RL Conductor - 🆕 Paper April 27, 2026; Fugu beta late-April / early-May 2026. 7B RL-trained orchestrator (built on Qwen2.5-7B) that routes subtasks between GPT-5, Claude Sonnet 4, Gemini 2.5 Pro, etc. SOTA on LiveCodeBench (83.9%) and GPQA-Diamond (87.5%) at ~1.8K tokens/query — roughly 6× cheaper than other multi-agent ensembles.
- Sakana Fugu - 🆕 Beta April 24-25, 2026. Commercial multi-agent orchestration service productising the RL Conductor research. OpenAI-compatible API with two tiers: Fugu Mini (low-latency) and Fugu Ultra (max performance); strong reported results on SWE-Pro, GPQA-D and ALE-Bench.
- ZAYA1-8B - 🆕 May 6, 2026. MoE reasoning model (<1B active) trained end-to-end on AMD Instinct MI300X clusters. Apache 2.0 weights on Hugging Face + serverless endpoint on Zyphra Cloud; aimed at math, code, and dense reasoning per active parameter.
- ZAYA1-8B-Diffusion-Preview - 🆕 May 14, 2026. First MoE diffusion language model converted from an autoregressive LLM and the first diffusion LM trained on AMD GPUs. Generates 16 tokens per step, achieving up to 7.7× inference speedup vs the autoregressive base. Built with Zyphra's TiDAR recipe + CCA attention.
- Mistral Large 3 - 675B total / 41B active parameters, MoE, 256K context. Flagship open-weight multimodal model. Released Dec 2025.
- Mistral Medium 3.1 - Frontier-class dense model for enterprise. Multimodal, 128K context, 80+ coding languages. Released Aug 2025.
- Mistral Small 4 - 🆕 Released March 2026. 119B total / 6B active. Hybrid model combining reasoning, multimodal, and coding strengths.
- Magistral 1.2 - 🆕 2026 reasoning family challenging o3/o4-mini. Transparent and multilingual reasoning.
- Devstral 2 - 🆕 2026 agentic coding model. Best open-source model for coding agents.
- Codestral - 22B code generation model, 80+ programming languages, 32K context. Released May 2024.
- Pixtral Large - 124B multimodal model with 1B vision encoder, 128K context, processes 30+ high-res images.
- Ministral 3B/8B/14B - Compact models optimized for edge deployment and efficiency.
- Mistral Forge - 🆕 March 2026 platform for training custom LLMs on proprietary data.
- Mistral Medium 3.5 - 🆕 April 29, 2026. Dense 128B open-weight model, 256K context, Modified MIT license. Unifies instruction-following, reasoning, and coding.
- Voxtral TTS - 🆕 March 26, 2026. 4B-parameter open-weight TTS built on Ministral 3B; multilingual, optimised for voice agents.
- DeepSeek Agent Harness team - 🆕 May 19, 2026. DeepSeek hires a former Jane Street engineer to lead a new "AI harness" team building the deterministic scaffolding that turns DeepSeek V4 into autonomous, revenue-generating agents — first major signal DeepSeek is moving past raw-model R&D into agentic productisation.
- DeepSeek-V4-Pro - 🆕 April 24, 2026. 1.6T total / 49B active MoE, 1M-token context. MIT license. Leadership in agent capabilities, world knowledge, reasoning. Tops open-source benchmarks.
- DeepSeek-V4-Flash - 🆕 April 24, 2026. 284B total / 13B active MoE, 1M context. MIT. Cost-efficient tier.
- DeepSeek-V3.2 - Released Dec 2025. Advanced MoE architecture with 671B total parameters. V3.2 Speciale variant for enhanced reasoning.
- DeepSeek-R2 - 2026 advanced reasoning model. Successor to R1, competes with GPT-5 and Gemini 3 Pro.
- DeepSeek-R1 - Reasoning-focused model with chain-of-thought capabilities. Released Jan 2025.
- DeepSeek-Coder-V2 - Code generation model competitive with GPT-4 on coding benchmarks.
- Llama 5 - 🆕 April 8, 2026. 600B+ parameter open-source flagship from Meta Superintelligence Labs; "recursive self-improvement" research line. Marketed as exceeding leading proprietary models on reasoning, coding, autonomous agentic behaviour.
- Meta Muse Spark - 🆕 April 8-9, 2026. First public model from Meta Superintelligence Labs; long-context multimodal foundation.
- Llama 4 Scout / Maverick - 10M-token context (Scout) MoE flagship line shipped April 2025; still the production fallback for many enterprise stacks.
- Qwen3.7-Max - 🆕 May 20, 2026 — Alibaba Cloud Summit Hangzhou. New Qwen flagship purpose-built as the foundation for AI agents: agentic coding, complex reasoning, and long-horizon multi-step missions with sustained decision-making. Released alongside a full-stack AI infrastructure upgrade and new T-Head Zhenwu M890 AI accelerator chip. Worldwide developer/enterprise availability rolling.
- Qwen3.7-Max-Preview / Qwen3.7-Plus-Preview - 🆕 May 18, 2026. Preview ladder before the Hangzhou unveil. Ranked the highest of any Chinese model on LM Arena in both text and vision; sustained 1M-context evaluations.
- Qwen3.6-27B - 🆕 April 22, 2026. Dense 27B multimodal. Open-sourced. Focus: agentic coding + thinking-context preservation.
- Qwen3.6-Max-Preview - 🆕 April 18, 2026. Proprietary frontier preview. High coding/reasoning performance, 1M context window. Top-tier among Chinese models on coding benchmarks.
- Qwen3.6-35B-A3B - 🆕 April 15, 2026. MoE, 35B total / 3B active. Apache 2.0. Stability and real-world utility improvements.
- Qwen3.6-Plus - 🆕 April 2, 2026. Proprietary flagship. High value-per-token general model. Strong long-context, tool-calling, agentic behavior.
- Tianma (天马) AI - 🆕 April 27, 2026 (beta). Alibaba's image-to-video generation model. Strong character consistency and motion quality.
- Qwen3.5 Max Pro - April 2026. High-performance flagship. Enhanced coding and math reasoning, long context.
- Qwen3.5 Omni Plus - April 2026. Proprietary full-modal foundation model unifying text and image input.
- Qwen3-Max-Thinking - Alibaba's strongest thinking model. 1T+ parameters, enhanced agentic capabilities.
- Qwen3.5-Omni - March 2026. Fully omni-modal: language, vision, sound, motion. Speech recognition in 113 languages, 256K context.
- Qwen3-Coder-Next - Feb 2026. Open-weight coding agent model, MoE 80B total / 3B active.
- Qwen3 235B-A22B - MoE with dual-mode reasoning. Strong math, code, and commonsense reasoning.
- Qwen2.5 Coder 32B - Top open-source coding model.
- MiniMax M2.7 - 🇨🇳 🆕 March 2026. Proprietary self-evolving LLM tuned for agent harness construction, memory updates, iterative workflow improvement; major gains on SWE-bench-style tasks.
- MiniMax M2.5 - 🇨🇳 February 2026. 230B-parameter cost-efficient flagship for "real-world productivity".
- Hailuo 02 - 🇨🇳 🆕 March 2026. Native 1080p text/image-to-video with longer training corpus.
- MiniMax Music 2.6 - 🇨🇳 🆕 April 2026. Cover-generation focus with improved low-frequency reproduction; global beta.
- Doubao 2.0 - 🇨🇳 🆕 February 2026. Agent-era upgrade focused on real-world task execution; powers ByteDance's consumer AI apps.
- Seedance 2.0 - 🇨🇳 🆕 February 2026. Multi-modal cinematic video generation, 2K resolution, ~30% faster than Seedance 1.5.
- Step 3.5 Flash - 🇨🇳 🆕 February 2026. ~196B-parameter compact reasoning + agent model; punches above its weight against larger Chinese rivals.
- Baichuan-M3 Plus - 🇨🇳 🆕 January 2026. Evidence-anchored medical LLM with low hallucination rate; free API for Chinese medical institutions.
- Grok 4.3 Beta - 🆕 April 2026. Latest iteration with improved reasoning and coding benchmarks. See `2026.4` benchmark snapshot.
- Grok 4.3 GA - 🆕 May 2026. Grok 4.3 reached general availability on Microsoft Foundry and OCI Generative AI; xAI's flagship for agentic workloads with improved tool-calling and long-horizon reasoning.
- Grok 4.20 - Feb 2026. Multi-agent system (4 standard + 16 specialized agents in Heavy mode), 2M token context.
- Grok 4 / 4 Heavy - Released July 2025. 3T parameters. xAI's frontier model.
- Grok 3 / 3 Mini - Feb 2025. First reasoning models with "Think Mode".
- Phi-4-reasoning-vision-15B - 🆕 Released March 2026. 15B multimodal model with selective chain-of-thought reasoning. Edge-deployable.
- Phi-4 - 14B parameter SLM with reasoning rivaling much larger models. Open-source under MIT License.
- Phi-4-mini - 3.8B parameter dense model. 128K context. Excels in reasoning, math, coding, and function-calling.
- Phi-4-multimodal - 5.6B parameter. First multimodal Phi model — integrates speech, vision, and text in unified architecture.
- Command A - 🆕 Released April 2026. 111B open-weights model, 256K context. Agentic, multilingual, and coding focused.
- Command R+ - Enterprise RAG model, 128K context, multilingual (10 languages), grounded generation with citations.
- Command R - Cost-efficient model for retrieval-augmented generation and enterprise workloads.
- ERNIE 5.0 - 🆕 Released Jan 2026. 2.4T parameters MoE (activates <3% per query). Native full-modal. #1 Chinese model on LMArena.
- ERNIE 4.5 - Multimodal predecessor released 2025. Strong reasoning and Chinese language capabilities.
- GLM-5.1 - 🆕 April 7, 2026. 744B MoE / 40B active, 200K context. MIT license. Tops SWE-Bench Pro. Trained entirely on Huawei Ascend (no NVIDIA).
- GLM-5 Reasoning - 🆕 April 2026. BenchLM 85 — top open-source score. SWE-Bench Pro surpasses GPT-5.4 and Claude Opus 4.6.
- GLM-5V-Turbo - 🆕 April 2026. Native multimodal agent — vision, video clips, text inputs. Cost-performance balanced.
- GLM-5 - Released Feb 2026. 744B parameters, advanced agentic intelligence. MIT license.
- GLM-4.7 - Released late 2025. Matches Claude Opus 4 on SWE-Bench.
- MiniMax-M2.7 (Open Weights) - 🆕 April 2026. Ultra-long context (1M+ window). Top-tier performance on coding and Agent tasks.
- MiniMax-M1-80k - Open-weight hybrid-attention reasoning model. 456B parameters, 1M token context.
- Hailuo AI (Video) - Text/image-to-video generation with AI avatars, voiceovers, and character consistency.
- Kilo Code Integration - 🆕 MiniMax powers Kilo Code (new AI coding editor). Default model for its code-generation pipeline.
- Kimi K2.6 - 🆕 April 20-21, 2026. 1T MoE / 32B active, 256K context. Enhanced coding, long multi-step execution, agent swarm up to 1,000 collaborating agents. Supports `thinking.keep="all"` persistent reasoning. Default in OpenClaw v2026.4.20+.
- Kimi K2.5 - Jan-Feb 2026. 1T total / 32B active MoE. Native multimodal, Agent Swarm (up to 100 parallel sub-agents). Open-source. ⚠️ Support ending May 25, 2026 — migrate to K2.6.
- Kimi Code - Premium coding tier powered by K2.5/K2.6, terminal-based developer workflows.
- Doubao-Seed-2.0 Pro - 🆕 Released Feb 2026. Frontier reasoning and complex agents. Competes with GPT-5.2 at ~90% lower cost.
- Doubao-Seed-2.0 Lite - 🆕 General production workloads. Balanced performance and efficiency.
- Doubao-Seed-2.0 Code - 🆕 Software development — code generation, debugging, and review.
- BAGEL - 🆕 Open-source multimodal model for text, image, and video understanding and generation.
- Nova 2 Pro - 🆕 Amazon's most intelligent reasoning model. Text, image, video, speech input. Agentic coding and long-range planning.
- Nova 2 Lite - 🆕 Fast, cost-effective reasoning with 1M-token context. Adjustable "thinking effort" controls.
- Nova 2 Sonic - 🆕 Speech-to-speech model for real-time conversational AI. 1M token context, multilingual.
- Nova Act - 🆕 Browser-based AI agent service for web task automation. Powered by Nova 2 Lite.
- Nova Forge - 🆕 "Open training" service for building custom Nova model variants with proprietary data.
- Nemotron 3 Ultra - 🆕 Released March 2026 (GTC). Frontier-level reasoning, 5x throughput efficiency on Blackwell platform.
- Nemotron 3 Super - 🆕 Released March 2026. 120B total / 12B active. 1M context. 5x higher throughput vs predecessor.
- Nemotron 3 Nano - Cost-efficient hybrid Transformer-Mamba MoE. Optimized for targeted agentic tasks.
- Nemotron 3 Nano Omni - 🆕 April 28, 2026. 30B-A3B hybrid MoE (Mamba + Transformer). Natively multimodal: text, image, audio, video, charts, and documents in one model. 9x higher throughput than comparable open omni models. Topped 6 leaderboards (MMlongbench-Doc, OCRBenchV2, WorldSense, DailyOmni, VoiceBench). Open weights on Hugging Face, OpenRouter, Amazon SageMaker JumpStart.
- Hunyuan Hy3 Preview - 🆕 April 23, 2026. 295B total / 21B active MoE, 256K context. Open-sourced on GitHub, Hugging Face, ModelScope, GitCode. Fast-slow thinking fusion architecture, 40% improved inference efficiency. Supports vLLM and SGLang. Integrated in Yuanbao, CodeBuddy, QQ, Tencent Docs. Available on OpenRouter (free preview period).
- Apple Foundation Models (AFM) - On-device (~3B) and server-based models powering Apple Intelligence. Privacy-first, offline capable.
- OpenELM - Open-source efficient language models (270M–3B). Designed for on-device processing on Apple silicon.
- Samsung Gauss 2.3 - 🆕 2026 on-device AI model for Galaxy S26. Includes Gauss 2.3 Think and Gauss O Flash variants. Agentic AI capable.
- Inflection 2.5 / Pi - Empathetic conversational AI model. Known for emotional intelligence and human-centered interactions.
- Yi-Lightning - MoE architecture, 200+ tokens/s on RTX 4090. Strong multilingual (Chinese/English), open-source Apache 2.0. Released Oct 2024.
- ScienceOne 100 / 磐石100 - 🆕 April 28-29, 2026. AI model system for scientific research from CAS. Core "ScienceOne" foundation model with literature compass, innovation evaluation engine, and 2,000+ tool agent factory. Supports math, physics, biology, materials science, astronomy, aerospace, and geosciences. In use across 50+ CAS institutes and 100+ research scenarios.
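Many entries above quote Mixture-of-Experts sizes as "total / active" parameters. A rough way to compare them is the active fraction, i.e. the share of weights used per token. The sketch below computes it from figures copied from this list; the function and variable names are illustrative. Treat the ratio as a loose proxy for per-token compute only, since all weights generally still have to be stored.

```python
from typing import Dict, List, Tuple

# (total, active) parameter counts in billions, copied from the entries above.
MOE_SIZES_B: Dict[str, Tuple[float, float]] = {
    "Llama 4 Scout": (109, 17),
    "Llama 4 Maverick": (400, 17),
    "Mistral Large 3": (675, 41),
    "DeepSeek-V4-Pro": (1600, 49),
    "DeepSeek-V4-Flash": (284, 13),
    "Qwen3.6-35B-A3B": (35, 3),
    "GLM-5.1": (744, 40),
    "Kimi K2.6": (1000, 32),
    "Nemotron 3 Super": (120, 12),
    "Hunyuan Hy3 Preview": (295, 21),
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Share of parameters active per token, as a value in (0, 1]."""
    if total_b <= 0 or not (0 < active_b <= total_b):
        raise ValueError("expected 0 < active <= total")
    return active_b / total_b


def rank_by_sparsity(sizes: Dict[str, Tuple[float, float]]) -> List[Tuple[str, float]]:
    """Sort models from sparsest (smallest active fraction) to densest."""
    ranked = [(name, active_fraction(t, a)) for name, (t, a) in sizes.items()]
    return sorted(ranked, key=lambda pair: pair[1])


if __name__ == "__main__":
    for name, frac in rank_by_sparsity(MOE_SIZES_B):
        print(f"{name:22s} {frac:6.1%} active")
```

On these figures, DeepSeek-V4-Pro and Kimi K2.6 activate roughly 3% of their weights per token, close to the "<3% per query" quoted for ERNIE 5.0, while Llama 4 Scout activates about 16%.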
Tools and models for generating and editing images, videos, audio, and music.
- Midjourney V8.1 - 🆕 April 30, 2026. HD 2K image support, new Raw mode options. V8 (3D model generation) reportedly later in 2026.
- Flux 2 Pro / Flex / Dev / Klein - 🆕 November 2025. Black Forest Labs' next-generation family. SOTA image quality, multi-reference consistency, dramatically improved text rendering.
- Recraft V4 - 🆕 February 17, 2026. Ground-up rebuild; major prompt-accuracy improvements; editable SVG vector output.
- Stable Diffusion 3.5 - Open-source image generation with improved coherence and prompt following.
- Ideogram 3.0 - Excels at text rendering in images; March 2025 release with style references and in-platform canvas editor.
- ChatGPT Images 2.0 - 🆕 April 2026. Free tier. Improved image detail, text understanding, and multi-turn editing for iterative refinement.
- gpt-image-2 - 🆕 OpenAI's latest image generation API. Supports 2K/4K resolution hints. Default in OpenClaw v2026.4.21.
- DALL·E 3 - OpenAI's text-to-image model integrated with ChatGPT for iterative refinement.
- Gemini 3 Pro Image - Google's native image generation within Gemini.
- Nano Banana 2 (Gemini 3 Pro Image) - 🆕 Google's transparent-background-friendly image model exposed via OpenClaw image_generate.
- Kling IMAGE 3.0 - 🇨🇳 🆕 April 23, 2026. Cinema-grade native 4K image generation from Kuaishou.
- Flux - 💤 Stale (last update 2025-07). Black Forest Labs' original open-source repo — superseded by Flux 2 family.
- Seedance 2.0 (image side) - 🇨🇳 🆕 ByteDance's next-gen image/animation generation API; pairs with the video model of the same name.
- Veo 3.1 - 🆕 October 2025. Google DeepMind's flagship video model. Veo 4 rumoured for late-April / late-May 2026.
- Runway Gen-4 - 🆕 Professional video generation and editing with character and style consistency. Now exposes Kling 3.0 / Sora 2 Pro inside the platform (April 2026).
- Kling VIDEO 3.0 - 🇨🇳 🆕 February 4-7, 2026. Kuaishou's new generation; realistic human motion, lip-sync, narrative production with audio sync.
- Sora 2 (via Runway) - 🆕 OpenAI's Sora app shut down 2026-04, but Sora 2 Pro is integrated into Runway as of April 7, 2026.
- Seedance 2.0 - 🇨🇳 🆕 February 2026. ByteDance multi-modal cinematic video generation, 2K resolution, ~30% faster than 1.5.
- Hailuo 02 - 🇨🇳 🆕 March 2026. MiniMax video model now native 1080p with expanded training data.
- Pika 2.0 - 🆕 Creative video generation with scene and effects control.
- LTX Studio - 🆕 AI-powered cinematic video creation platform.
- Tianma (天马) AI - 🇨🇳 🆕 April 27, 2026 (beta). Alibaba's image-to-video model.
- Sora - 📦 Discontinued (April 26, 2026). OpenAI's text-to-video app shut down; Sora 2 Pro lives on inside Runway.
- Runway Agent - 🆕 May 13, 2026. Conversational agent that takes a written brief and ships a complete multi-shot finished video: storyboard → generation → cut → voiceover. Pipes through Gen-4 / Gen-4 Turbo / Aleph editing under the hood; first credible end-to-end "prompt-to-rough-cut" production agent.
- ElevenLabs Eleven v3 + ElevenAgents - 🆕 2026 "audio layer of the internet" — 70+ language TTS with emotional Audio Tags, plus the AIUC-1-certified ElevenAgents voice-agent platform with multimodal messages, conversation topic discovery, and pre-tool speech controls.
- Eleven Music + Scribe v2 Realtime - 🆕 ElevenLabs' music generation and live transcription stack.
- Cartesia Sonic 3 / 3.5 - 🆕 2026. State-space-model TTS hitting ~40-90ms time-to-first-audio; powers the Line Agents voice-agent platform launched April 2026.
- Deepgram Nova-3 + Aura-2 + Flux Multilingual - 🆕 April 2026. Speech-to-text in 45+ languages, sub-200ms TTS, conversational STT with mid-call language switching across 10 languages.
- MiniMax Music 2.6 - 🇨🇳 🆕 April 2026. Cover generation focus with improved low-frequency reproduction.
- Voxtral TTS - 🆕 March 26, 2026. Mistral's open-weight 4B TTS built for voice-agent latency.
- Suno V4 - 🆕 AI music generation from text prompts with high-quality vocals and instruments.
- Udio - 🆕 Text-to-music generation with professional audio quality.
- OpenAI Audio Models - Native audio understanding and generation within GPT-4o, GPT-Realtime-2 (May 8, 2026).
- Stability Audio - Open-source audio and music generation.
- Bark - 💤 Stale (no commits since 2024-08). Open-source text-to-audio model supporting speech, music, and sound effects.
Open standards enabling agent interoperability, tool access, and cross-platform communication.
- CorpusIQ - 🆕 Official MCP Registry. Multi-source business data connector with 25+ integrations (GA4, Google Ads, TikTok, YouTube, Shopify, Stripe, Airtable, Slack, HubSpot, Calendly, Klaviyo, and more). Intelligent query routing, cross-source attribution, unified business intelligence. Live as `io.corpusiq/multi-source-mcp`. HTTP transport with Ed25519 signature auth.
- MCP Specification - 🆕 The "USB-C for AI": open protocol by Anthropic for connecting LLMs to tools and data sources. Donated to the Agentic AI Foundation (Linux Foundation) in Dec 2025.
- MCP 2026-07 Release Candidate - 🆕 May 2026 (final July 28, 2026). Release candidate for the next major MCP spec revision: stateless protocol core (scalability + simpler servers), an extensions framework, the new MCP Apps capability for server-rendered UI, Tasks graduated to an extension, and hardened authorization aligned with OAuth / OpenID Connect.
- MCP Servers - Official reference implementations of MCP servers for popular services.
- MCP TypeScript SDK - Official TypeScript SDK for building MCP clients and servers.
- MCP Python SDK - Official Python SDK for MCP implementation.
- mcp.so - 🆕 Community directory of MCP servers and tools.
- mcp-gateway - Gateway server for routing and managing MCP connections.
- A2A Protocol - 🆕 Google's open standard for agent-to-agent communication. Enables agents to discover, delegate, and collaborate regardless of framework. Now governed by Linux Foundation with 150+ partner organizations.
- A2A Course (DeepLearning.AI) - 🆕 Free course on building multi-agent systems with A2A.
- OpenAI Agents SDK - 🆕 Major update April 15, 2026: native sandbox execution, first-class MCP integration, sub-agent / handoff patterns, and Codex-style filesystem tools for production-ready multi-agent workflows.
- Agentic AI Foundation - 🆕 Linux Foundation fund co-founded by Anthropic, Block, and OpenAI to govern open agent standards.
- Kuberna Labs - ⚠️ Unverified. Cross-chain intent execution protocol for AI agents. Claims ERC-8004 on-chain identity, zkTLS/TEE attestation, and a typed intent schema enabling agents to autonomously execute transactions across NEAR, Base, and Mantle with verifiable execution proofs. New repo, independent adoption unverified; listed for visibility, evaluate before depending on it.
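
For readers new to the protocols listed above, it may help to see what an MCP message looks like on the wire. MCP messages are JSON-RPC 2.0, and a client invokes a server tool with the `tools/call` method. The sketch below builds such a request using only the Python standard library. The tool name `search_orders` and its arguments are hypothetical, and a real client would normally go through one of the official SDKs rather than hand-build messages.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request ids should be unique within a session


def make_tool_call(tool_name: str, arguments: dict) -> dict:
    """Build a JSON-RPC 2.0 request for MCP's tools/call method."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Hypothetical tool and arguments, for illustration only.
request = make_tool_call("search_orders", {"customer_id": "c-42", "limit": 5})
wire = json.dumps(request)
print(wire)
```

The transport (stdio or HTTP) and the initialization handshake are left out here; the point is only the shape of a single tool invocation.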
Frameworks and libraries for building autonomous AI agents.
- Koog 1.0 - 🆕 May 28, 2026 — KotlinConf 2026. JetBrains' open-source agent framework for Kotlin + Java hits a stable 1.0 with a 1-year API stability guarantee. Kotlin Multiplatform deployment (JVM / Android / iOS / JS / WASM), Java interop without wrapper modules, local Android LiteRT, OpenTelemetry across all targets, graph-based workflows, Spring Boot / Ktor integration, and providers for OpenAI / Anthropic / Google / Bedrock. Apache-2.0.
- LangChain - Build context-aware reasoning applications with LLMs.
- LangGraph - Build resilient language agents as graphs with stateful, multi-actor orchestration. v0.3.19 (April 27, 2026) split prebuilt agents into `langgraph-prebuilt` (Supervisor, Swarm, LangMem, Trustcall). v1.2 (May 2026) adds per-node timeouts / error recovery / graceful shutdown, a new `DeltaChannel` to cut checkpoint overhead on long threads, and a content-block-centric streaming API v3.
- CrewAI - Framework for orchestrating role-playing autonomous AI agents in collaborative teams.
- Microsoft Agent Framework - 🆕 Unified framework merging AutoGen + Semantic Kernel. Multi-agent conversations with enterprise features. GA Q1 2026.
- Microsoft Agent 365 - 🆕 GA May 1, 2026. Enterprise observability + governance + security for AI agents across environments; May 2026 update adds Secure Access Service Edge (SASE) for agents, threat detection / blocking, and agent-threat-hunting workflows.
- AutoGen - Multi-agent conversation framework by Microsoft (now part of Microsoft Agent Framework).
- Google Agent Development Kit (ADK) - 🆕 Modular framework integrated with Gemini and Vertex AI. Hierarchical agent compositions.
- OpenAI Agents SDK - 🆕 Next evolution shipped April 15, 2026 — native sandbox execution, MCP-native tool use, sub-agent handoffs, Codex-style filesystem ops. Production-ready multi-agent workflows.
- MetaGPT - Multi-agent framework that assigns different roles to GPTs so they collaborate as a software team.
- Mastra - 🆕 TypeScript-first agent framework with workflow-driven development and built-in observability.
- Ontheia - Self-hosted, open-source AI agent platform. Multi-provider (Claude, OpenAI, Gemini, Ollama), MCP-native, Chain Engine for visual workflow automation, long-term memory (pgvector), multi-user RBAC, GDPR-compliant by architecture. AGPL-3.0.
- AgentGPT - 📦 Archived (2025-04). Assemble, configure, and deploy autonomous AI agents in your browser. Influential first-wave project, kept for historical reference; no longer maintained.
- BabyAGI - AI-powered task management system using LLMs to create, prioritize, and execute tasks.
- SuperAGI - 💤 Stale (no commits since 2025-01). Open-source autonomous AI agent framework to build, manage & run agents.
- Semantic Kernel - Integrate LLM technology into apps. C#, Python, Java support.
- Phidata (Agno) - Build multi-modal agents with memory, knowledge, tools and reasoning.
- DSPy - The framework for programming—not prompting—language models.
- OpenClaw - 🆕 Personal AI agent platform with skills, memory, multi-channel messaging, Dreaming (3-stage memory consolidation), Canvas/A2UI, ACP coding harness integration, and Standing Orders. v2026.5.12 (May 14, 2026) with Claude Opus 4.7, Kimi K2.6, `/think xhigh` support, native model identity injection, isolated Telegram polling worker, and tightened protected-config paths.
- Dify - Open-source LLM app development platform with visual agent builder.
- Haystack Agents - End-to-end LLM framework for agentic pipelines.
- Vellum AI - 🆕 Production-grade agent framework with prompt-based building, evaluations, versioning, and observability.
- FastAgency - 🆕 High-speed inference and production scaling framework for agents.
- Rasa - Open-source conversational AI with strong intent recognition and dialogue management.
- Lindy - 🆕 Top no-code agent framework for business users with visual workflow builder.
- Octomind - 🆕 Rust-based open-source AI agent runtime. Model-agnostic (13+ providers), community-built specialist agents (developer, medical, legal, DevOps), MCP support with runtime self-extension, zero-config setup. Apache 2.0.
- Microsoft AI Agent Governance Toolkit - 🆕 April 3, 2026. Open-source toolkit for enforcing runtime security policies across agent frameworks including LangChain and AutoGen. Policy-as-code approach for enterprise AI governance.
- Bernstein - 🆕 Python orchestrator for 40+ CLI coding agents (Claude Code, Codex, Gemini CLI, Cursor, Aider). One LLM plan call up front; scheduling, git worktree isolation, quality gates, and HMAC-chained audit are deterministic. Apache-2.0.
- Genkit Middleware - 🆕 May 14, 2026. New middleware system for Google's open-source Genkit framework. Composable hooks at the generate / model / tool layers: retries with exponential backoff, model fallbacks, tool approval gates, scoped filesystem access, skill injection from `SKILL.md`. TypeScript / Go / Dart; Python next.
- LlamaIndex ↔ Google Agents API integration - 🆕 May 20, 2026. LlamaIndex ships a template for Google's newly launched Agents API exposing LlamaParse / LiteParse over unstructured documents inside a sandboxed Linux environment. Companion Sandboxed-Lit runtime and ParseBench (first OCR benchmark designed for agents) introduced in the same release wave.
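
Several entries above mention retries with exponential backoff and model fallbacks as framework-level hooks. The following is a minimal, framework-agnostic sketch of that pattern in plain Python. It is not the Genkit middleware API or any other listed library's API. The function `call_with_retries` and the stand-in model callables are hypothetical names introduced for illustration.

```python
import time
from typing import Callable, Sequence


def call_with_retries(
    models: Sequence[Callable[[str], str]],
    prompt: str,
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Try each model in order, retrying each one with exponential backoff."""
    last_error = None
    for model in models:
        for attempt in range(max_attempts):
            try:
                return model(prompt)
            except Exception as exc:  # a real hook would catch narrower errors
                last_error = exc
                if attempt < max_attempts - 1:
                    sleep(base_delay * 2 ** attempt)  # 0.5 s, 1 s, 2 s, ...
    raise RuntimeError("all models failed") from last_error


# Hypothetical stand-ins for real model clients.
def flaky_primary(prompt: str) -> str:
    raise TimeoutError("primary model unavailable")


def fallback(prompt: str) -> str:
    return f"fallback answer to: {prompt}"


delays = []  # record the backoff delays instead of actually sleeping
result = call_with_retries([flaky_primary, fallback], "hello", sleep=delays.append)
print(result, delays)
```

Injecting `sleep` keeps the example fast and makes the backoff schedule easy to inspect. Production code would usually add jitter and a cap on the maximum delay.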
Visual environments for designing, debugging, and shipping agent workflows without (or with minimal) code.
- LangGraph Studio - Visual debugger and trace inspector for LangGraph agents — step through state, replay turns, edit messages mid-flight. Companion to the LangGraph runtime.
- Dify - Open-source LLM app development platform with drag-and-drop agent workflow builder. Mainstream production deployments.
- Agenta - 🆕 Open-source LLMOps platform combining a prompt playground, prompt management, evaluation runs, and observability in one UI.
- Vellum AI - Production-grade agent IDE with prompt building, evaluations, versioning, and observability — closed-source SaaS.
- Cozeloop - 🆕 🇨🇳 ByteDance's open-source agent optimization platform: full-lifecycle development, debugging, evaluation, and monitoring. Apache-2.0.
- Restack - Durable agent runtime + visual workflow editor (built on Temporal-style replay). Open-source examples in restackio/examples-python.
- Bisheng - 🇨🇳 Open enterprise LLM DevOps platform: workflow editor, RAG, agent orchestration, fine-tuning, dataset management, observability. Apache-2.0.
- n8n - General-purpose visual workflow automation that has become a popular agent canvas — 400+ integrations + native AI nodes. Fair-code license.
- Mastra - 🆕 Opinionated TypeScript agent framework with RAG, observability, MCP, and visual workflow builder; 21K+ stars.
- VoltAgent - 🆕 End-to-end TypeScript AI Agent Engineering Platform with memory, RAG, guardrails, MCP, voice, and workflow capabilities.
Systems for giving agents persistent memory and context management.
- Letta (MemGPT) - Create LLM services with long-term memory and custom tools.
- Mem0 - The Memory layer for your AI apps — self-improving memory for LLM applications. April 2026 algorithm upgrade: single-pass add-only extraction, entity linking, multi-signal retrieval; benchmark wins on LoCoMo, LongMemEval, BEAM. 55K+ stars, 21+ official framework integrations.
- Zep - Long-term memory for AI assistants and agents.
- agent-memory - Lightweight agent memory framework for persistent context across sessions.
- Mem0g (graph variant) - 🆕 Graph-enhanced sibling of Mem0 for multi-hop questions; 21+ framework integrations as of early 2026.
- Graphiti - 🆕 Zep's open-source temporal knowledge graph engine; every fact is timestamped so agents can reason about "when" as well as "what".
- LangMem - 🆕 Spun out of LangGraph 0.3.19 (April 2026). Long-term episodic + procedural memory primitive for agents.
- Claude Managed Agents Memory - 🆕 April 23, 2026 public beta. Persistent cross-session memory baked into Anthropic's hosted agent runtime.
- LangMem - Long-term memory library for LangChain agents.
- Motorhead - 💤 Stale (no commits since 2025-07). Memory and context management server for LLMs.
- ChromaDB - AI-native open-source embedding database for memory-augmented agents.
- Cognee - Deterministic LLM outputs using graphs, LLMs, and vector retrieval.
- LangGraph Memory - 🆕 Built-in persistence and checkpointing for stateful agent workflows.
- Claude Managed Agents Memory - 🆕 April 23, 2026 (public beta). Anthropic's persistent memory feature for Claude Managed Agents. Agents retain information across sessions by mounting read/write memory stores to a filesystem. Enables long-running agents to learn and adapt without resetting context.
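
The idea behind temporal memory stores such as the one described above (every fact is timestamped so an agent can ask "when" as well as "what") can be sketched in a few lines. This is an illustrative data structure only, not the API of Graphiti, Zep, or any other listed project. The `Fact` class, the `facts_as_of` helper, and the sample data are all assumptions.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Fact:
    subject: str
    predicate: str
    obj: str
    valid_from: datetime
    valid_to: Optional[datetime] = None  # None means "still true"


def facts_as_of(facts, subject, predicate, when):
    """Return the objects of (subject, predicate) facts that were valid at `when`."""
    return [
        f.obj
        for f in facts
        if f.subject == subject
        and f.predicate == predicate
        and f.valid_from <= when
        and (f.valid_to is None or when < f.valid_to)
    ]


# Hypothetical memory contents.
memory = [
    Fact("alice", "works_at", "Acme", datetime(2024, 1, 1), datetime(2025, 6, 1)),
    Fact("alice", "works_at", "Globex", datetime(2025, 6, 1)),
]

then = facts_as_of(memory, "alice", "works_at", datetime(2025, 1, 15))
now = facts_as_of(memory, "alice", "works_at", datetime(2026, 1, 15))
print(then, now)
```

Superseded facts are closed with a `valid_to` timestamp rather than deleted, which is what lets the same store answer both "where does Alice work now?" and "where did she work in January 2025?".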
Protocols and tools for connecting agents to external services and APIs.
- Model Context Protocol (MCP) - Open protocol for connecting AI models to external tools and data sources.
- mcp-gateway - Gateway server for routing and managing MCP protocol connections.
- Composio - Integration platform for AI agents — 150+ tools with managed auth.
- Toolhouse - Cloud infrastructure for AI tool use — store, manage, and execute tools.
- LangChain Tools - Extensive collection of tool integrations within the LangChain ecosystem.
- Arcade AI - Tool calling platform for AI agents and assistants.
- E2B - Open-source cloud runtime for AI agents — secure sandboxed environments for code execution.
- Browser Use - Make websites accessible for AI agents with browser automation.
- Firecrawl - 🆕 Turn websites into LLM-ready data. Crawl and convert any website for AI.
- Crawl4AI - 🆕 Open-source LLM-friendly web crawler and scraper.
- Stagehand - 🆕 AI-powered browser automation framework by Browserbase.
- AgentQL - 🆕 Query language for AI agents to interact with web pages semantically.
- StackOne - 🆕 Unified API for AI agent integrations across HR, CRM, and ATS platforms.
- AWS MCP Server - 🆕 GA May 6, 2026. AWS-managed MCP server giving coding agents secure, auditable access to any AWS API; sandboxed Python execution for multi-step ops; replaces "agent SOPs" with agent skills. First-party from AWS.
- Google Workspace MCP Server - 🆕 Rollout from May 1, 2026. Workspace-native MCP server exposing Gmail / Drive / Calendar / Docs / Sheets to MCP clients, with admin-controlled OAuth scopes and audit trails.
- iManage MCP Server - 🆕 May 14, 2026. Native MCP endpoint for the iManage knowledge-work platform — lets any AI client securely read/write iManage documents without custom integration. First major legal/professional-services SaaS to ship a public MCP server.
- Power Platform Canvas Authoring MCP Server - 🆕 May 14, 2026. Microsoft Power Platform feature exposing Canvas Apps authoring as an MCP server; lets Copilot / Claude Code drive natural-language InfoPath → Canvas Apps migration.
- The Colony - ⚠️ Unverified. Self-described public agent-first social network with REST API for agent posts/votes/DMs and SDKs in Python (colony-sdk-python), TypeScript (colony-sdk-js) and Go (colony-sdk-go). Organisation and SDK repos are <30 days old, all 0–2 stars, single-maintainer; same submission was sent to 15+ awesome lists in parallel. Listed for visibility; evaluate before depending on it.
Secure runtimes that let agents execute generated code and shell commands without compromising the host. Critical infrastructure once you let an agent off the leash.
- E2B - Open-source secure cloud sandbox for AI-generated code. Used as the execution layer in OpenAI Agents SDK and many production agents.
- Daytona - 🆕 Secure, elastic infrastructure for running AI-generated code. Spin up isolated dev environments per agent task; AGPL-3.0.
- Modal - Serverless cloud platform popular for agent compute, GPU jobs, and sandboxed Python; `modal-client` is the official SDK.
- Microsandbox - 🆕 Local, programmable microVM sandboxes for AI agents: secure code execution on your own machine, no cloud dependency.
- SandboxFusion - ByteDance's multi-language code-execution sandbox built for agent / model evaluation pipelines. Apache-2.0.
- Northflank - General-purpose container PaaS used as an agent runtime backend (per-task ephemeral environments, GPU pools).
- Firecracker - The microVM virtual machine monitor (VMM) underneath E2B, Daytona, and many other agent sandboxes. Useful as a primitive when building your own.
Tools and frameworks for securing AI agents against prompt injection, data leaks, and misuse.
- AgentGate - 🆕 Pre-execution authorization policy decision point (PDP) for autonomous AI agents. Scores trust across 4 dimensions per request, detects 24h kill-chain patterns (BULK_READ_THEN_EXFIL, SENSITIVITY_RAMP), and keeps a Merkle-chained audit trail. MIT license; drop-in with LangGraph, LangChain, and AutoGen. tryagentgate.com
- prompt-firewall - Firewall for LLM prompts — detect and block prompt injection attacks.
- LLM Guard - The Security Toolkit for LLM Interactions — input/output scanners for AI.
- Rebuff - 📦 Archived (2024-08). Self-hardening prompt injection detector — detect, deflect, and report. Listed for historical reference; no longer maintained.
- Guardrails AI - Adding guardrails to large language models — validate and correct LLM outputs.
- NeMo Guardrails - Toolkit for adding programmable guardrails to LLM-based conversational systems.
- Vigil - 💤 Stale (no commits since 2024-01). LLM security scanner — detect prompt injections, jailbreaks, and data leakage.
- Lakera Guard - Enterprise-grade AI security platform for prompt injection defense.
- Garak - LLM vulnerability scanner by NVIDIA — probe for weaknesses in language models.
- Invariant Guardrails - 🆕 Runtime guardrails for AI agents — policy enforcement and safety checks.
- Prompt Armor - 🆕 Enterprise prompt injection protection with real-time detection.
- Descope MCP Auth - 🆕 Authentication and authorization layer for MCP server security.
- AgentDojo - 🆕 ETH Zürich research benchmark for evaluating prompt-injection attacks and defenses against tool-using LLM agents.
- ModelScan - Scan ML model files (Pickle, PyTorch, TF) for serialization-based code-execution attacks.
- PyRIT - Microsoft's Python Risk Identification Tool for generative AI — automated red-teaming framework.
- RAMPART - 🆕 May 20, 2026. Microsoft's pytest-native safety + security testing framework for agentic AI. Developer-facing white-box counterpart to PyRIT — cross-prompt-injection probes, benign-failure asserts, harm-category coverage, statistical thresholds (e.g. safe in 80%+ runs). Integrates straight into CI/CD. MIT.
- Clarity (Microsoft) - 🆕 May 20, 2026. Companion to RAMPART. Structured design-review tool for AI agents — "living artifacts" documenting intent, risks, and behavior before code is written. Open-sourced from Microsoft AI Red Team's internal practice.
- Nobulex - ⚠️ Unverified. Cryptographic receipts for AI agent actions (Ed25519 dual signatures, hash-chained audit logs). MIT. Bilateral-receipt primitive merged into Microsoft's Agent Governance Toolkit (PRs #1302, #1333). Same submission sent to 15+ awesome lists in parallel; submitter's claim of "4,500 npm downloads" doesn't match registry data (`@nobulex/mcp-server` ~19/month at audit time). Listed for visibility on the strength of the Microsoft adoption.
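
Several tools in this section advertise hash-chained or Merkle-chained audit trails. As a rough illustration of why chaining makes a log tamper-evident, here is a minimal sketch of a plain SHA-256 hash chain using only the standard library. It is a simplification: there are no signatures, no secret key (so it is not an HMAC chain), and no Merkle tree, and it is not the implementation used by any listed project. All function names and sample actions are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, action: dict) -> str:
    """Hash the previous entry's hash together with this entry's action."""
    payload = json.dumps(action, sort_keys=True).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("ascii") + payload).hexdigest()


def append_entry(log: list, action: dict) -> None:
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"action": action, "prev": prev_hash, "hash": _digest(prev_hash, action)})


def verify(log: list) -> bool:
    """Recompute every hash; any edit to an earlier entry breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash or entry["hash"] != _digest(prev_hash, entry["action"]):
            return False
        prev_hash = entry["hash"]
    return True


audit_log = []
append_entry(audit_log, {"agent": "a1", "tool": "read_file", "path": "notes.txt"})
append_entry(audit_log, {"agent": "a1", "tool": "send_email", "to": "ops@example.com"})
intact = verify(audit_log)

audit_log[0]["action"]["path"] = "secrets.txt"  # tamper with history
tampered_ok = verify(audit_log)
print(intact, tampered_ok)
```

A bare hash chain only detects edits if the latest hash is stored somewhere the attacker cannot rewrite. Keyed or signed variants exist to close that gap.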
Retrieval-augmented generation and knowledge management systems for agents.
- LlamaIndex - Data framework for LLM-based applications — ingest, structure, and access private data.
- Haystack - End-to-end LLM framework for building RAG pipelines and search systems.
- Unstructured - Open-source components for pre-processing documents for LLMs and RAG.
- Chroma - AI-native open-source embedding database.
- Weaviate - Open-source vector database for AI-native applications.
- Qdrant - High-performance vector similarity search engine and database.
- Pinecone - Managed vector database for high-performance AI applications.
- Milvus - Cloud-native vector database for scalable similarity search.
- RAGFlow - Open-source RAG engine based on deep document understanding.
- Docling - Document parsing and conversion for RAG and generative AI.
- Kotaemon - 🆕 Open-source RAG-based tool for chatting with documents.
- LightRAG - 🆕 Simple and fast RAG engine with graph-based knowledge indexing.
- R2R - 🆕 Production-ready RAG engine with built-in auth, observability, and ingestion.
- Vanna - 📦 Archived (2026-02). RAG for SQL — chat with your database using natural language.
- Morphik - 🆕 Multimodal RAG engine for documents containing tables, figures, and charts; rapidly-rising 2026 alternative to LlamaIndex for complex PDFs.
- Cognee - 🆕 Memory + reasoning engine that builds a knowledge graph as agents ingest documents; 2026 darling for "long-running research agent" stacks.
AI-powered coding assistants and autonomous software engineering agents.
- Claude Code - Anthropic's agentic coding tool. 80.9% SWE-bench score, handles complex multi-file bugs. May 2026 (v2.1.128–2.1.141): new `/goal` command for cross-turn completion conditions, agent view, plugin loading from `.zip` archives + URLs, `Ctrl+R` global history search, broader MCP/hook handling, enterprise feedback surveys.
- Codex CLI - OpenAI's open-source terminal coding agent (Rust, Apache-2.0, 82K+ stars). 77.3% Terminal-Bench score. May 2026 adds Codex Chrome extension for in-browser DevTools workflows, `codex remote-control` headless app-server, plugin-detail bundled-hook display, and Codex on Mobile preview (May 14) that lets ChatGPT iOS/Android remote-control the macOS Codex app.
- Codex Security - 🆕 March 2026. Application-security agent that finds and fixes software vulnerabilities; available to OSS maintainers via the Codex-for-OSS program.
- Aider - AI pair programming in your terminal — works with any LLM, with first-class git commit handling.
- Gemini CLI - 🆕 Google's terminal-first coding agent for large-context refactors.
- Grok Build - 🆕 May 14, 2026 (early beta). xAI's agentic CLI coding agent powered by grok-code-fast-1. Parallel sub-agents in isolated environments, daily release notes, available to SuperGrok Heavy subscribers ($99/mo intro for 6 months, then $300/mo). xAI's reply to Claude Code and Codex CLI.
- Antigravity CLI - 🆕 May 19, 2026 (Google I/O 2026). Lightweight CLI companion to Antigravity 2.0 — create and interact with Google agent harnesses directly from the terminal. macOS / Linux / Windows.
- Cursor 3.09 - 🆕 April 3, 2026 update. Strengthened Agent mode for true Vibe Coding workflows. A core AI code editor in the 2026 landscape.
- Kilo Code - 🆕 April 2026 rising challenger to Cursor. Default model: MiniMax. Viral in Chinese developer communities (Bilibili).
- Cursor - The AI code editor with Feb 2026 update supporting up to 8 parallel agents.
- Windsurf - Agentic IDE by Codeium — AI-first code editing experience.
- Cline - Autonomous coding agent in your IDE — VS Code extension.
- Roo Code - 🆕 Open-source VS Code extension that reads/writes across multiple files, runs commands, model-agnostic; free except for the LLM API you bring.
- Void - 🆕 Fork of VS Code positioned as the open-source Cursor alternative; data stays with you, BYO model.
- Continue - Open-source AI code assistant for VS Code and JetBrains.
- GitHub Copilot - Agent mode with expanded model access and `gh copilot` shell integration in early 2026.
- Cursor 3.3 - 🆕 May 2026. PR-review experience, parallel agents, enterprise model controls; previous 3.1 in April.
- Cursor SDK - 🆕 May 4, 2026. TypeScript SDK exposing Cursor's runtime, harness, and models so developers can build programmatic agents on top of the Cursor stack; ships with the v2.5 security patch fixing an arbitrary-code-execution vulnerability via malicious git repos.
- Cursor 3.4 (Teams + PR review) - 🆕 May 11–13, 2026. Microsoft Teams integration (`@Cursor` in Teams delegates to cloud agents), faster parallel-agent plan execution, multi-repo / Dockerfile-based dev-environment configs for agents, `/multitask` async sub-agents, Vulnerability Scanner, granular per-model access controls.
- Kiro - AWS autonomous agent. Spec-driven development, manages up to 10 simultaneous tasks.
- Amazon Q Developer - AI coding companion deeply integrated with AWS ecosystem.
- Visual Studio 2026 Agent Mode + Skills - 🆕 VS 2026 Insiders May 12-15, 2026. Copilot Chat "Agent Mode" now ships a guided Skills workflow inside Visual Studio 2026: discover, manage, and author reusable Copilot Skills with whole-solution context, plus terminal command execution and tool invocation.
- JetBrains Rider AI Test-Writing Skill - 🆕 May 22, 2026. New AI Assistant skill for JetBrains Rider that surfaces .NET coverage data to Claude Code / Codex so agents target untested branches, reducing AI cost for test generation.
- Cursor 3.4 Cloud Agent Environments - 🆕 May 13, 2026. New dev environments for cloud agents: multi-repo workspaces, Dockerfile-based config with build secrets, 70% faster cached image layers, per-environment version history with rollback, audit logs, scoped egress and secrets. Companion to the Cursor 3.4 release.
- Devin 3.0 - 🆕 By Cognition. Dynamic re-planning, self-healing code, legacy codebase migration, multi-modal input (UI mockups, video recordings).
- Devin 2.2 - 🆕 February 2026. Sandboxed terminal + editor + browser; commercial product (Core $20/mo, Team $500/mo).
- OpenHands - Open-source platform for AI software developers as autonomous agents.
- SWE-agent - Turn LLMs into software engineering agents that fix real GitHub issues.
- Devika - 💤 Stale (no commits since 2025-09). Agentic AI software engineer — open-source alternative to Devin.
- GPT Engineer - 📦 Archived (2025-05). Specify what you want built, AI asks for clarification, then builds it. Foundational project of the autonomous-coding era, kept for historical reference.
- Codegen - 🆕 Programmatic code manipulation and multi-file refactoring SDK.
- Qodo - 🆕 AI Code Review Platform focused on quality, security, and test generation.
- Google Antigravity 2.0 - 🆕 May 19, 2026 (Google I/O 2026). Standalone desktop application (macOS / Linux / Windows) for orchestrating multiple agents in parallel. Adds scheduled cron-style runs, async long-running tasks, dynamic sub-agents, and integrations with AI Studio / Android / Firebase. Companion Antigravity SDK lets you host the harness on your own infra; enterprise edition lands inside Gemini Enterprise Agent Platform.
AI systems that perceive, reason about, and act in the physical world — humanoid robots, factory automation, Physical AI infrastructure. The next wave after language agents.
- Google Gemini Robotics ER-1.6 - 🆕 April 14, 2026. Robotics AI model with enhanced spatial and physical reasoning. Integrated into real robots via Agile Robotics partnership.
- Project Prometheus (Bezos) - 🆕 Jeff Bezos-led Physical AI venture. Raising $10B at $38B valuation to embed AI into physical systems and robotics.
- NVIDIA Isaac GR00T - NVIDIA's foundation model platform for humanoid robots. Unveiled at GTC, expanded at Hannover Messe 2026.
- NVIDIA Industrial AI Cloud - 🆕 April 2026 (Hannover Messe). Deutsche Telekom-built AI factory infrastructure for industrial AI workloads.
- Tesla Optimus Gen3 (V3) - 🆕 AWE 2026 Shanghai debut. First mass-produced Optimus; Fremont line started January 2026, 50K-100K units/year initial target, ~$30K USD initial price, late-2026 limited external sales. 37 joints, 1.2 m/s walking, 22-DoF hands.
- Figure 03 (Helix AI) - 🆕 Late 2025 announcement, ramping in 2026. First Figure model designed for the home: soft textile coverings, wireless charging, tactile sensors. May 2026 demo: two F.03 robots autonomously cleaning a room and making a bed in <2 minutes via visual coordination only.
- Figure 04 - 🆕 May 13, 2026. Founder Brett Adcock announces Figure 04 design finalized; component deliveries underway. Successor to F.03 with the Helix VLA model.
- Helix 02 package-sort 72h run - 🆕 May 13-16, 2026. Live-streamed Figure F.03 fleet runs Helix 02 fully autonomously on a package-sort line — 8-hour shift on day one (~22K packages), then ~30K in the first 24 hours, ending in a stress test that hit ~88K packages over ~72 hours before mechanical failure. First public continuous-run evidence for a home-form-factor humanoid stack.
- Figure F.03 vs human 8-hour sort challenge - 🆕 May 18, 2026. Figure runs the first public head-to-head: one F.03 robot vs one trained human, 8-hour shift on the same package-sort line. Human wins narrowly — 12,924 parcels (2.79 s/item) vs 12,732 parcels (2.83 s/item). Tightest published gap to human throughput on a real industrial task to date.
- Boston Dynamics Atlas 100-lb manipulation + Hyundai 25K plan - 🆕 May 18-19, 2026. Boston Dynamics publishes video + technical blog showing Atlas lifting and carrying >100 lb loads (mini-fridge / washing machine) via RL + large-scale sim training; whole-body control adapts to weight shifts without per-object identification. Hyundai Motor Group commits to deploying 25,000+ Atlas units across Hyundai/Kia plants starting 2028 in Georgia.
- Unitree G1 deployed at JAL Haneda - 🆕 May 2026. Japan Airlines starts a Haneda ground-operations trial with Unitree G1 humanoids (baggage loading, container transport, cabin cleaning) — marketed as the first commercial airline trial of bipedal robots in active aviation service. US Congress separately moves to add Unitree to the entity list on national-security grounds the same week, underscoring how fast the embodied-AI supply chain is becoming geopolitical.
- Figure 02 + Helix 02 - 🆕 January 2026. Helix 02 expands whole-body autonomy (load/unload dishwashers, fold laundry); BotQ facility rated for 12K units/year.
- Unitree G1 + H1-2 - 🆕 CES 2026. G1 dance/boxing/skating demos, autonomous kung fu (February), 5'8" H1-2 industrial unit at 7.4 mph. 20K humanoid shipments targeted in 2026.
- Unitree R1 Air - 🆕 Consumer humanoid at $4,900 — runs, flips, walks on hands.
- Unitree Gen 2 (lifelike skin) - 🆕 Realistic human-like skin with embedded pressure / temperature / touch sensors.
- Unitree GD01 - 🆕 May 2026. Nearly 10-foot manned mecha; pilot-driven, switches between bipedal and quadrupedal modes. Priced from ¥3.9M (~$650K). Tracks how the embodied-agent stack is starting to fork into operator-piloted form factors.
- Honor (荣耀) Humanoid - 🆕 Set world record at 2026 half-marathon for humanoid robots.
- Zhiyuan (智元) AGIBOT - 🆕 April 2026. New humanoid body, foundation model, and solution suite. Calls 2026 "Deployment Year Zero."
- Unitree H-series - Boston Dynamics competitor from China. Ongoing 2026 iterations.
- Agile Robotics - 🆕 Gemini Robotics ER-1.6 deployment partner. German robotics company.
- Shenzhen Humanoid Pilot Line - 🆕 🇨🇳 Shenzhen launched its first pilot production line for humanoid robots on April 12, 2026 (Leju Robotics + Dongfang Precision in Longhua District). 2-hour assembly cycle, 500–1,000 units/year, with mass production moving to a 10,000-units/year Foshan facility.
- Doubao AI Glasses (ByteDance) - 🆕 Q2 2026 launch. Real-time translation, object recognition, Doubao LLM integration.
- Nothing AI Glasses/Earbuds - 🆕 Announced April 2026. AI-integrated smart wearables.
- Samsung Galaxy S26 (Gauss 2.3) - On-device agentic AI. Gauss 2.3 Think and Gauss O Flash variants.
- Meta Ray-Ban Stories 3 - Continued iteration with deeper Llama integration.
- Tesla FSD v13 - Expanding L4-capable deployment across major markets.
- Waymo - Continuing commercial L4 rollout in US cities through 2026.
- WeRide / Pony.ai / Baidu Apollo - 🇨🇳 Chinese L4 fleets expanding operational zones.
Research environments where agents are trained, observed, or stress-tested in simulated worlds. Increasingly relevant as world-model and embodied research bleeds into language-agent design.
- Generative Agents - 💤 Stanford's seminal Smallville simulacrum (Park et al., 2023). Memory + reflection + planning in a town of 25 LLM-driven characters. Reference implementation that influenced almost every multi-agent paper since.
- Voyager - 💤 First lifelong-learning agent in Minecraft — GPT-4 with skill library and curriculum (Wang et al., 2023). Still the canonical open-ended agent benchmark.
- SWE-Gym - Open environment to train SWE agents on real GitHub issues; companion to SWE-bench.
- WebArena - Realistic, reproducible web environment (Reddit / shopping / GitLab clones) used by OSWorld and most browser-agent papers.
- WorkArena - ServiceNow's enterprise workplace benchmark for browser agents.
- Genie 3 / Genie 4 - Google DeepMind's interactive video world models — generate playable 3D worlds from a prompt. Closed-weights research, no public code.
- NVIDIA Cosmos - NVIDIA's foundation world model for embodied AI / robotics — generate physically plausible video futures.
Standard evaluation suites and live leaderboards tracking frontier AI capability as of 2026.
- BenchLM - 🆕 Composite leaderboard that aggregates multiple benchmark families. April 2026 top: Claude Mythos Preview 99, Gemini 3.1 Pro / GPT-5.4 tied at 94, Claude Opus 4.6 / GPT-5.4 Pro at 92, GLM-5 Reasoning 85 (top open).
- SWE-bench Verified - Real-world GitHub issue resolution benchmark. April 2026 top: Claude Mythos 93.9%, Claude Opus 4.7 87.6%.
- GPQA Diamond - 💤 Stale dataset repo (last update 2024-09). Expert-level science reasoning. April 2026 top: Gemini 3.1 Pro 94.3% (world-record), Claude Opus 4.7 94.2%.
- ARC-AGI 2 - Abstract reasoning over novel tasks. Gemini 3.1 Pro 77.1%.
- OSWorld - Desktop GUI manipulation. GPT-5.4 at 75% (exceeded human baseline).
- LMArena (formerly Chatbot Arena) - Crowdsourced chat preference battles. Opus 4.6 currently leads.
- MMLU-Pro - Multi-task language understanding, harder successor to MMLU.
- LiveCodeBench - Contest-style coding benchmark, updated continuously to resist contamination.
- AIME 2025 / Humanity's Last Exam (HLE) - Elite math / PhD-level general reasoning.
- Terminal-Bench - CLI agent evaluation. Codex CLI 77.3%.
- Wolfram LLM Benchmarking Project - Code generation benchmark from English spec to Wolfram Language. Updated continuously.
- Terminal-Bench 2.0 - 🆕 Late 2025 / early 2026. 89 curated terminal tasks (compile, train, configure, debug). May 2026 leader: GPT-5.5 82.7%, Claude Opus 4.7 69.4%.
- GDPval / GDPval-MM - 🆕 Feb 2026. OpenAI economic-value benchmark across 44 occupations / 9 industries; 1,320 expert-built tasks. May 2026 leader: GPT-5.5 84.9% on GDPval-MM.
- SWE-bench Pro - 🆕 Repository-level engineering successor to Verified. Claude Opus 4.7 64.3% > GPT-5.5 58.6% (Claude leads on long-horizon repo work).
- Hieroglyphic Benchmark - 🆕 Lateral / abstract-reasoning benchmark; Gemini 3.5 "Snowbunny" 80% (leaked).
- LLM-Stats Live Leaderboard - 🆕 Continuously-refreshed cross-benchmark dashboard for newly-released models.
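
Several suites above, LiveCodeBench among them, resist contamination by adding new problems over time, so a model can be scored only on items published after its training cutoff. The sketch below shows that date-window idea in general form. The problem records, the field names, and the cutoff date are hypothetical placeholders, not any benchmark's real schema.

```python
from datetime import date

# Hypothetical problem records; real benchmarks publish their own metadata.
problems = [
    {"id": "p1", "released": date(2025, 3, 1)},
    {"id": "p2", "released": date(2025, 11, 15)},
    {"id": "p3", "released": date(2026, 2, 10)},
]


def uncontaminated(items, training_cutoff):
    """Keep only problems published strictly after the model's training cutoff."""
    return [p for p in items if p["released"] > training_cutoff]


# Assumed cutoff for illustration only.
fresh = uncontaminated(problems, training_cutoff=date(2025, 6, 30))
print([p["id"] for p in fresh])
```

Scoring on the filtered subset trades sample size for cleanliness, so scores from different date windows are not directly comparable.
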
AI agents that can see, control, and automate desktop environments at the OS level. For purely browser-based agents see 🌐 Browser & Web Agents.
- Claude Computer Use - 🆕 Anthropic's "Desktop Intelligence" — Claude sees your screen and uses mouse/keyboard to automate any software.
- OpenAI Operator - 🆕 Browser agent for booking, form-filling, and web task automation.
- Google Project Mariner - 📦 Discontinued (2026-05-04). Browser-agent research project; capabilities merged into Gemini Agent.
- Microsoft Copilot Agents - 🆕 Autonomous background agents across the Microsoft 365 stack. Beyond sidebar — executes tasks and surfaces for approvals.
- Open Interpreter - A natural language interface for computers — let LLMs run code locally.
- Manus AI - 🆕 🇨🇳 Autonomous general-purpose AI agent with cloud-to-local hybrid model. Handles research, coding, and complex multi-step tasks.
- Genspark - 🆕 All-in-one autonomous work agent with mixture-of-agents architecture. Can make phone calls.
- Perplexity Computer - 🆕 Research-focused desktop agent with multi-model orchestration and local file access.
- Beam AI - 🆕 Self-learning desktop agents that refine logic based on successful outcomes.
- ChatGPT Workspace Agents - 🆕 Research preview April 22, 2026; credit-based pricing May 6, 2026; EKM support May 7, 2026. OpenAI's successor to Custom GPTs for enterprises — cloud-side agents with file access, code execution, scheduled runs and built-in connectors for Slack, Google Drive, Salesforce. Available on Business / Enterprise / Edu / Teachers; powered by Codex.
Frameworks and infrastructure for agents that interact with the web through real browsers — navigate, click, scrape, and complete multi-page workflows.
- Browser Use - Make websites accessible for AI agents with browser automation. The de-facto open-source choice in 2026, 92K stars.
- Stagehand - The SDK for browser agents — typed `act/extract/observe` primitives over Playwright by Browserbase. MIT.
- Steel Browser - 🆕 Open-source browser API for AI agents — batteries-included sandboxed Chromium with session persistence and proxy rotation. Apache-2.0.
- Skyvern - Automate browser-based workflows with LLMs and computer vision. AGPL-3.0.
- AgentQL - Query language + Playwright integration for semantic web extraction. Reliable on dynamic, cluttered pages.
- Hyperbrowser MCP - 🆕 Hosted headless-browser fleet exposed as an MCP server — plug into Claude/GPT/LangChain via the standard tool interface.
- Playwright MCP - 🆕 Microsoft's official Playwright server exposed as an MCP tool. Production-grade automation primitives without rolling your own bridge.
- MultiOn - Hosted browser agent platform with native Reasoning + Memory. Closed-source.
- Browserbase - Headless browser infrastructure built specifically for AI agents — stealth, persistence, captcha solving, observability.
Voice-enabled and multimodal AI agent platforms.
- ElevenLabs - AI voice platform with conversational AI agents and realistic speech synthesis.
- Vapi - Enterprise voice AI platform — build, test, and deploy voice agents. $50M Series B announced May 12, 2026 after crossing 1B platform calls; May 2026 updates ship Squads v2 (multi-assistant orchestration), Composer alpha (prompt-built agents), Simulations alpha (systematic AI-powered testing), and GA of the Soniox low-latency multilingual transcriber.
- Retell AI - Build production-ready conversational voice AI agents.
- Bland AI - AI phone calling platform — enterprise-grade conversational AI.
- LiveKit Agents - Build real-time multimodal AI agents with voice, video, and data.
- Pipecat - Open-source framework for voice and multimodal conversational AI.
- Vocode - 💤 Stale (no commits since 2024-11). Open-source library for building voice-based LLM agents.
- Bolna - End-to-end open-source voice AI agents framework.
- Cartesia - 🆕 Ultra-low-latency voice AI for real-time conversational agents.
- Meta Voice AI - 🆕 Former PlayHT/Play.ai team's tech, integrated into Meta AI, AI Characters, and Meta wearables after July 2025 acquisition. Original Play.ai platform shut down Dec 31, 2025.
- Sesame - 🆕 Voice AI companion with emotional understanding and natural conversation.
- ElevenAgents - 🆕 ElevenLabs' full-stack voice-agent platform (April-May 2026 updates): MCP, multimodal messages, conversation topic discovery, knowledge-base search, pre-tool speech controls. First voice-agent platform to earn AIUC-1 certification.
- Cartesia Line - 🆕 April 2026. Code-first voice-agent platform built on Sonic 3 TTS + Ink STT; ~40-90ms time-to-first-audio.
- Deepgram Voice Agent API - 🆕 Single endpoint bundling STT (Nova-3) + LLM routing + TTS (Aura-2) + Flux conversational STT with mid-call language switching across 10 languages.
- OpenAI Realtime API (GPT-Realtime-2) - 🆕 May 8, 2026. GPT-5-class reasoning over voice with parallel tool calls; supersedes the previous Realtime models for production voice agents.
- OpenYabby - 🆕 Open-source macOS voice-driven multi-agent orchestrator — Realtime API + CLI runners + multi-channel orchestration. A lead agent plans the work and delegates to sub-agents for review and QA. MIT.
AI agents designed for personal use, productivity, and daily life assistance.
- OpenClaw - 🆕 Personal AI agent platform with skills, memory, Dreaming, Canvas/A2UI, ACP coding harness integration. Runs on your machine with multi-channel messaging.
- Rabbit R1 - Dedicated AI hardware device with a large action model for personal assistance.
- Limitless - Personalized AI powered by what you've seen, said, and heard (formerly Rewind).
- Open Interpreter - A natural language interface for computers — let LLMs run code locally.
- 01 Light - 💤 Stale (no commits since 2024-11). Open-source voice interface for computers.
- Leon - Open-source personal assistant — lives on your server.
- Khoj - Personal AI second brain — search and chat with your notes, docs, and images.
- Humane AI Pin - Wearable AI device with a screenless, ambient computing experience.
- Arahi AI - 🆕 Personal productivity and business automation assistant.
- Lindy AI - 🆕 No-code AI agent for email, calendar, and workflow automation.
- MuleRun - 🆕 Always-on agents for recurring tasks and background automation.
- Gemini Intelligence - 🆕 May 12, 2026 (Android Show: I/O Edition). Proactive agentic AI features integrated into Googlebooks laptops, Wear OS, Android Auto, and Android XR, starting on the latest Samsung Galaxy + Pixel devices. Auto-creates shopping carts from grocery lists, books spin classes, and removes filler words via the Rambler speech-to-text.
- Gemini Spark - 🆕 May 14, 2026 (pre-I/O leak / insight). Upcoming branded agent capability inside the Gemini app for autonomously running multi-step processes; sits above the Gemini 3.1 Pro reasoning stack.
- QwenPaw - 🆕 🇨🇳 May 2026 rebrand from CoPaw. Self-hostable personal assistant in the Qwen / AgentScope family. Local-first memory, hot-loadable skills, multi-agent collaboration, multi-channel (DingTalk / Feishu / WeChat / Discord / Telegram), tool guard + skill scanner. Apache-2.0.
- AI Growth Agents for Marketers - 🆕 Growth marketing prompts and Python agents built from real fintech campaigns in Southeast Asia. Covers campaign briefs, MEU planning, and A/B test analysis with multi-agent workflows. Agent Skills format — installable via `npx skills add`. Bilingual VI + EN. MIT.
GUI agents that drive Android/iOS phones — the next frontier after desktop computer-use. Most major model providers now ship a mobile-grounded variant.
- Mobile-Agent - 🇨🇳 Alibaba's flagship multimodal phone-control agent family (v1 → v3, plus Mobile-Agent-E and Mobile-Agent-V). State-of-the-art on Android benchmarks.
- AppAgent - 💤 Tencent's multimodal agent that operates smartphone apps by tapping/swiping. Influential early implementation.
- Apple Intelligence - On-device agent layer in iOS / iPadOS / macOS. App Intents and screen-aware actions across the OS.
- Samsung Galaxy AI / Bixby 2.0 - On-device Gauss-powered agentic capabilities baked into the Galaxy S26 line.
- Google Gemini for Android - Replaces Google Assistant on Android with full Gemini-powered, app-aware actions including system intents and Workspace.
- Magma - Microsoft Research foundation model for multimodal agents — grounds across UI, robotics, and physical action; targets phones, web, and embodied tasks.
Enterprise-grade platforms for deploying AI agents at scale.
- Salesforce Agentforce - Autonomous AI agents for enterprise CRM — sales, service, and marketing.
- Microsoft Copilot Studio - Build and customize AI agents and copilots for your organization.
- Gemini Enterprise Agent Platform - 🆕 April 22, 2026 (Google Cloud Next '26). Evolution of Vertex AI into a unified hub for building, scaling, governing, and optimizing enterprise agents. Supports Gemini 3.1 Pro/Flash, Lyria 3, plus third-party models (Claude Opus/Sonnet/Haiku). Integrated agent DevOps, security, and orchestration.
- Google Vertex AI Agent Builder - Build and deploy enterprise-ready generative AI agents on Google Cloud.
- Amazon Bedrock Agents - Build AI agents that can execute multi-step tasks across company systems.
- ServiceNow AI Agents - 🆕 AI agents for enterprise IT service management with AI Control Tower.
- IBM watsonx Orchestrate - AI assistant platform to automate work across enterprise applications.
- Oracle AI Agents - 🆕 Enterprise AI agents integrated with Oracle Fusion Cloud ERP.
- Moveworks - Enterprise copilot platform — AI that works across every system.
- UiPath Agentic Automation - 🆕 Agentic reasoning layered onto RPA bot estates for intelligent process automation.
- AgentX - 🆕 Agentic enterprise solution for scalable AI automation with plug-and-play chatbots.
- Sistava - AI agent orchestration platform for deploying and operating multiple AI agents that run sales, marketing, finance, and customer support. Reachable via Slack, WhatsApp, email, voice, Telegram, API, MCP, A2A, and webhooks, with full computer use on your own OS.
- OutSystems - 🆕 AI development platform for rapidly building mission-critical apps and agent governance.
- Sema4.ai - 🆕 Enterprise AI agent platform with Python-first approach and built-in governance.
- SAP Business AI Platform + Joule Studio 2.0 - 🆕 SAP Sapphire 2026 (May 11-13). SAP unifies BTP + Business Data Cloud + Business AI into one platform and reframes Joule as an agentic operating layer. Joule Studio 2.0 (rolling out June 2026) lets enterprises build with LangGraph / AutoGen-style frameworks against live SAP business data; the new Autonomous Suite ships 50+ domain Joule Assistants and 200+ specialised agents across finance, supply chain, procurement, HCM, and CX.
- Microsoft Agent 365 + Microsoft 365 E7 - 🆕 May 1, 2026 GA with extended May rollouts. Identity-first control plane for governing and securing AI agents across enterprise environments; $15/user/month standalone, $99/user/month inside the new Microsoft 365 E7 "Frontier" suite. May 2026 update adds AWS Bedrock + Google Cloud registry sync, Intune/Defender preview policies, and SASE for agents.
- OpenAI Guaranteed Capacity (Compute Annual Pass) - 🆕 May 19, 2026. Long-term enterprise compute reservations (1 / 2 / 3-year terms, larger discounts at longer terms) sold as a structured product. Designed to derisk enterprise rollout of GPT-5.5-class agents — OpenAI's reply to the Anthropic Priority Tier model.
- Bristol Myers Squibb ↔ Claude Enterprise - 🆕 May 20, 2026. BMS standardises on Claude Enterprise as its shared intelligence platform for 30,000+ employees, embedding agentic Claude into drug-discovery / development / delivery pipelines. First top-5 pharma to make a public, company-wide Claude commitment.
- Kore.ai Artemis Agent Platform - 🆕 May 22, 2026 (launched on Azure). AI-native enterprise agent platform built around the new YAML-style Agent Blueprint Language (ABL) for declarative multi-agent workflows. Kore.ai's structural challenge to Copilot Studio and Agentforce.
- FPT Flezi Foundry™ - 🆕 May 22, 2026. AI-augmented delivery platform with two governed Service-as-a-Software modes — Agentic Development Lifecycle (ADLC) for full SDLC agent crews and Agentic Managed Services (AMS) for incident-resolution agents on top of existing ITOps.
- GreenOps Agent - 🆕 A 4-agent GCP cost and carbon optimization pipeline built on Google ADK, Gemini Flash, and Cloud Run. Detects idle VMs, unattached disks, and unused reserved IPs to calculate CO₂ footprint and execute cleanups with human approval.
Tools for testing, evaluating, and monitoring AI agents in production.
- AgentBench - Multi-dimensional benchmark for evaluating LLMs as agents.
- LangSmith - Platform for debugging, testing, evaluating, and monitoring LLM applications.
- Helicone - Open-source LLM observability platform — logs, metrics, and traces.
- Braintrust - Enterprise-grade stack for building AI products — eval, prompt playground, logging.
- Arize Phoenix - AI observability & evaluation — traces, evals, and datasets.
- Langfuse - Open-source LLM engineering platform — traces, evals, prompt management. Acquired by ClickHouse Jan 2026; March 2026 shift to an observations-centric data model, April 2026 added Langfuse Cloud Japan + Experiments + Langfuse Academy + LLM-as-a-Judge API; v4 self-host release queued.
- OpenLLMetry - Open-source observability for LLM applications based on OpenTelemetry.
- Weights & Biases Weave - Toolkit for developing, evaluating, and monitoring AI applications.
- SWE-bench - Benchmark for evaluating LLMs on real-world software engineering problems.
- Terminal-Bench - 🆕 Benchmark for terminal-based coding agent evaluation. Maintained by Harbor Framework.
- LMArena (formerly LMSYS Chatbot Arena) - 🆕 Crowdsourced LLM benchmark using human preference voting. LMSYS rebranded to LMArena in 2025.
- Patronus AI - 🆕 Automated LLM evaluation and red-teaming platform.
- DeepEval - Pytest-style LLM eval framework with 14+ built-in metrics (G-Eval, hallucination, faithfulness). Most-starred open-source eval lib in 2026. Apache-2.0.
- Agenta - 🆕 Open-source LLMOps platform combining prompt playground, prompt management, evaluation, and observability.
- LangSmith SDK - Official client SDK for LangChain's hosted observability platform.
- AutoEvals - Standalone library of best-practice LLM eval scorers (factuality, JSON validity, semantic similarity, etc.) by Braintrust. Drop-in for any framework.
- BenchClaw - ⚠️ Unverified. Self-described multi-dimensional agent evaluation harness (17-judge tribunal, deception detectors, 10 scoring dimensions). Repo is single-maintainer with very low independent adoption; the same submission was sent to 8+ awesome lists in parallel — one was merged at eudk/awesome-ai-tools, the rest are pending or declined. Listed for visibility, evaluate before relying on its scores.
- PromptEden - ⚠️ Unverified. Commercial AI-visibility monitoring service — tracks how ChatGPT, Claude, Gemini, Perplexity, Copilot, and Grok describe brands and which competitors they recommend, refreshed daily across 9+ platforms. Submitted to 10 awesome lists on the same day — promising category but listed for visibility only, evaluate before purchasing.
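
Preference-voting leaderboards such as LMArena turn pairwise "A beat B" votes into a ranking. As a rough illustration of how that can work, here is a minimal online Elo update. This is a generic textbook variant, not LMArena's published method. The model names, starting rating, and K-factor are assumptions chosen for the example.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def update(ratings: dict, winner: str, loser: str, k: float = 32.0) -> None:
    """Shift rating points from loser to winner; the total stays constant."""
    gain = k * (1.0 - expected_score(ratings[winner], ratings[loser]))
    ratings[winner] += gain
    ratings[loser] -= gain


# Hypothetical models and votes, each vote written as (winner, loser).
ratings = {"model_a": 1000.0, "model_b": 1000.0}
votes = [("model_a", "model_b"), ("model_a", "model_b"), ("model_b", "model_a")]
for winner, loser in votes:
    update(ratings, winner, loser)

print(sorted(ratings.items(), key=lambda kv: kv[1], reverse=True))
```

Online Elo depends on vote order. Fitting all votes at once avoids that, at the cost of a heavier computation.
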
Tools and platforms for AI/ML research, experimentation, and development.
- Hugging Face - The AI community's platform — models, datasets, and Spaces for ML research.
- vLLM - 🆕 High-throughput LLM serving engine with PagedAttention.
- Ollama - Run LLMs locally with a simple API. Supports Llama, Mistral, Qwen, and more.
- LM Studio - Desktop app for running local LLMs with a user-friendly interface.
- SGLang - 🆕 Fast serving framework for large language and vision models.
- llama.cpp - LLM inference in C/C++ — run models on consumer hardware.
- MLX - 🆕 Apple's array framework for ML on Apple silicon.
- Unsloth - 🆕 Fine-tune LLMs 2x faster with 70% less memory.
- OpenRouter - 🆕 Unified API for accessing 200+ AI models from all major providers.
- Weights & Biases - ML experiment tracking, dataset versioning, and model management.
- Label Studio - Multi-type data labeling and annotation tool.
Papers, courses, tutorials, and guides for understanding and building AI agents.
- ReAct: Synergizing Reasoning and Acting in Language Models - The foundational paper on reasoning + acting in LLMs.
- Toolformer: Language Models Can Teach Themselves to Use Tools - Teaching LLMs to use external tools autonomously.
- Generative Agents: Interactive Simulacra of Human Behavior - Stanford's generative agent architecture with memory and reflection.
- A Survey on Large Language Model based Autonomous Agents - Comprehensive survey of LLM-based autonomous agents.
- The Rise and Potential of Large Language Model Based Agents - In-depth analysis of LLM agent capabilities and future directions.
- Agent Hospital - A simulacrum of hospital with evolvable medical agents.
- Multimodal Intelligence as the Dominant Paradigm in 2026 AI Systems - 🆕 Research on multimodal AI becoming the default paradigm.
- DeepLearning.AI — AI Agents in LangGraph - Short course on building agents with LangGraph.
- DeepLearning.AI — Multi AI Agent Systems with crewAI - Course on building multi-agent systems.
- DeepLearning.AI — A2A Protocol - 🆕 Free course on Google's Agent-to-Agent protocol.
- LangChain Academy - Free courses on LangChain, LangGraph, and agent development.
- Hugging Face — Building AI Agents - Open course on building AI agents with open-source tools.
- LLM Agents MOOC (Berkeley) - UC Berkeley course on LLM agents.
- Microsoft Agent Framework Docs - 🆕 Official documentation for Microsoft's unified agent framework.
- Hugging Face Agents Course - Free 5-unit course (notebooks + videos) on building production agents with smolagents, LangGraph, and Llama-Index.
- Anthropic Cookbook - Official notebooks for tool use, computer use, agent patterns, prompt engineering, and Claude Code recipes.
- Google Gemini Cookbook - Official Gemini API examples covering grounding, function calling, multimodal, and live audio.
- LLM Course (Maxime Labonne) - End-to-end LLM curriculum from fundamentals to fine-tuning, with Colab notebooks. 79K stars.
- Anthropic Courses - Anthropic's official educational courses on prompt engineering, real-world prompts, evals, and tool use.
- awesome-ai-agents - 💤 Stale (last update 2025-02). Curated list of AI autonomous agents by E2B — pre-2026 reference.
- awesome-llm-agents - Curated list of LLM-powered agent resources.
- awesome-mcp-servers - 🆕 Curated list of MCP server implementations.
Major projects from mainland-China teams or primarily targeting the Chinese market. Listed because the China stack is increasingly its own parallel ecosystem with distinct frameworks, models, and developer culture.
Foundation models from Chinese labs (Qwen, DeepSeek, GLM, Doubao, Kimi, Hunyuan, ERNIE) are listed under 🧠 Foundation Models directly.
- Dify - Open-source LLM app development platform with visual agent builder. The dominant low-code agent canvas in Chinese tech.
- Lobe Chat - Multi-agent chat workspace + plugin/agent marketplace. One of the highest-starred TypeScript AI projects. Apache-2.0.
- Cozeloop - 🆕 ByteDance's open-source agent optimization platform from the Coze team.
- AgentScope - Alibaba ModelScope's multi-agent framework with visual debugging and distributed execution. Apache-2.0.
- Bisheng - Open enterprise LLM DevOps platform: workflows, RAG, agents, fine-tuning, evals. Apache-2.0.
- MetaGPT - Multi-agent collaboration framework that assigns SOP roles (PM, architect, engineer) to LLMs. DeepWisdom.
- FastGPT - Knowledge-base-first platform on top of LLMs: data ingestion, RAG retrieval, visual workflow orchestration.
- QAnything - 💤 NetEase Youdao's question-answering engine over arbitrary local documents (PDF/Word/Excel/PPT).
- RAGFlow - Deep-document-understanding RAG engine — strong on scanned PDFs, tables, and charts.
- LightRAG - HKU Data Science Lab's lightweight graph-based RAG engine.
- AppFlowy - Open-source Notion alternative with AI workspace agents. AGPL-3.0.
- Manus AI - General-purpose autonomous agent (Beijing-based Butterfly Effect). One of the most-watched 2026 agent products in Chinese tech.
- Coze (扣子) - ByteDance's no-code agent builder. Mainland-only consumer surface; international counterpart is coze.com.
- Tongyi Qianwen Agent - Alibaba's mass-market consumer agent, integrated across Taobao / DingTalk / Quark.
- Doubao Agents - ByteDance's flagship consumer assistant on top of the Doubao model family.
- Kilo Code - 2026 viral Chinese-community challenger to Cursor. Default model: MiniMax.
- Cherry Studio - Most-installed open-source desktop client for LLMs in Chinese dev circles — multi-provider chat with knowledge base.
- ScienceOne 100 / 磐石100 - 🆕 Chinese Academy of Sciences scientific reasoning agent system, 50+ CAS institutes, 2,000+ research tools.
Quick decision matrices for the most common "which one do I pick?" questions in 2026.
| Framework | Language | Multi-Agent | State / Graph | Streaming | License | Best For |
|---|---|---|---|---|---|---|
| LangGraph | Python / JS | ✅ native | ✅ first-class | ✅ | MIT | Production stateful workflows |
| CrewAI | Python | ✅ role-based | ✅ | MIT | Role-playing agent teams | |
| AutoGen / Microsoft Agent Framework | Python / .NET | ✅ conversational | ✅ | CC-BY-4.0 / MIT | Enterprise multi-agent chat | |
| OpenAI Agents SDK | Python | ✅ handoffs | ❌ | ✅ | MIT | OpenAI-native production |
| Mastra | TypeScript | ✅ | ✅ workflows | ✅ | Elastic-2.0 | TypeScript-first stack |
| Google ADK | Python / Java | ✅ hierarchical | ✅ | Apache-2.0 | Gemini + Vertex AI | |
| DSPy | Python | ✅ | MIT | Programmatic prompt optimization | ||
| Phidata / Agno | Python | ✅ teams | ❌ | ✅ | MPL-2.0 | Multi-modal agents w/ memory |
| Sandbox | Hosting | Cold Start | Languages | Persistence | License | Best For |
|---|---|---|---|---|---|---|
| E2B | Cloud (managed) | ~150ms | Python / Node / shell | per-session | Apache-2.0 | OpenAI Agents SDK / production |
| Daytona | Cloud / self-host | ~500ms | Polyglot | persistent workspaces | AGPL-3.0 | Long-running dev tasks |
| Modal | Cloud (managed) | ~200ms | Python | function-scoped | proprietary | GPU + serverless agents |
| Microsandbox | Local microVM | ~100ms | Polyglot | per-session | Apache-2.0 | Privacy-first local dev |
| SandboxFusion | Self-host | ~300ms | 20+ languages | ephemeral | Apache-2.0 | Eval / benchmark pipelines |
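
The sandbox matrix above can also be read as data. The sketch below encodes its hosting, cold-start, and license columns and filters them by a few constraints. The `shortlist` helper is hypothetical, and treating "Local microVM" as self-hostable is an assumption made for the example. Cold-start values are the approximate figures from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sandbox:
    name: str
    hosting: str
    cold_start_ms: int  # approximate, copied from the table
    license: str


SANDBOXES = [
    Sandbox("E2B", "Cloud (managed)", 150, "Apache-2.0"),
    Sandbox("Daytona", "Cloud / self-host", 500, "AGPL-3.0"),
    Sandbox("Modal", "Cloud (managed)", 200, "proprietary"),
    Sandbox("Microsandbox", "Local microVM", 100, "Apache-2.0"),
    Sandbox("SandboxFusion", "Self-host", 300, "Apache-2.0"),
]


def runs_on_own_hardware(sandbox: Sandbox) -> bool:
    """Assumption: 'self-host' or 'local' in the hosting column means you can run it yourself."""
    hosting = sandbox.hosting.lower()
    return "self-host" in hosting or "local" in hosting


def shortlist(max_cold_start_ms, allowed_licenses, needs_self_host=False):
    """Return matching sandboxes, fastest cold start first."""
    picks = [
        s for s in SANDBOXES
        if s.cold_start_ms <= max_cold_start_ms
        and s.license in allowed_licenses
        and (not needs_self_host or runs_on_own_hardware(s))
    ]
    return sorted(picks, key=lambda s: s.cold_start_ms)


for s in shortlist(300, {"Apache-2.0"}, needs_self_host=True):
    print(s.name, s.cold_start_ms)
```

Languages and persistence are left out to keep the sketch short. They would slot in as further fields and filter clauses.
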
| Stack | Approach | Hosting | Strengths | License |
|---|---|---|---|---|
| Browser Use | Vision + DOM, Playwright | Self-host | Largest community, MIT, 92K stars | MIT |
| Stagehand | Typed `act/extract/observe` | Browserbase or self | Strong typing, structured output | MIT |
| Steel Browser | Headless API | Self-host or cloud | Sessions + proxy + captcha | Apache-2.0 |
| Skyvern | Vision-first | Self-host | Robust to dynamic pages | AGPL-3.0 |
| AgentQL | Query language | SDK + self-host | Semantic selectors | MIT |
| Playwright MCP | MCP-native | Self-host | Drop-in MCP tool for any client | Apache-2.0 |
| Tool | Self-host | OpenTelemetry | Eval Suite | Prompt Mgmt | License |
|---|---|---|---|---|---|
| Langfuse | ✅ | ✅ | ✅ | ✅ | MIT |
| Helicone | ✅ | ✅ | ✅ | Apache-2.0 | |
| Arize Phoenix | ✅ | ✅ | ✅ | Elastic-2.0 | |
| LangSmith | ❌ (cloud only) | ✅ | ✅ | ✅ | proprietary |
| Braintrust | ❌ (cloud only) | ✅ | ✅ | ✅ | proprietary |
| DeepEval | ✅ (library) | ✅ | ❌ | Apache-2.0 | |
| Agenta | ✅ | ✅ | ✅ | ✅ | Apache-2.0 |
| OpenLLMetry | ✅ (instrumentation) | ✅ native | ❌ | ❌ | Apache-2.0 |
| Tool | Surface | Open Source | Free Tier | SWE-bench | Best For |
|---|---|---|---|---|---|
| Claude Code | CLI / IDE | ❌ | 80.9% | Long-horizon engineering | |
| Codex CLI | CLI | ✅ | ✅ | n/a (Terminal-Bench 77.3%) | OpenAI-native shells |
| Cursor | IDE | ❌ | ✅ (limited) | n/a | Pair-programming UX |
| Cline | VS Code ext | ✅ | ✅ (BYO key) | n/a | OSS IDE alternative |
| Aider | CLI | ✅ | ✅ (BYO key) | strong on Polyglot | Git-aware refactors |
| Devin 3.0 | Cloud | ❌ | ❌ | leading | Hands-off long tasks |
| OpenHands | Self-host | ✅ | ✅ | competitive | Self-hosted SWE agent |
Tables verified 2026-05-05. Send PRs with sources when figures change.
Prices in USD per 1M tokens. Data: 2026-05-20.
| Model | Provider | Context Window | Input $/1M | Output $/1M | Best For |
|---|---|---|---|---|---|
| GPT-4o | OpenAI | 128K | $2.50 | $10.00 | Tool use, vision, broad ecosystem |
| GPT-4o-mini | OpenAI | 128K | $0.15 | $0.60 | High-volume simple tasks |
| Claude Sonnet 4.6 | Anthropic | 200K | $3.00 | $15.00 | Coding agents, complex reasoning |
| Claude Opus 4.7 | Anthropic | 200K | $5.00 | $25.00 | Hardest reasoning tasks |
| Claude Haiku 4.5 | Anthropic | 200K | $1.00 | $5.00 | Fast Anthropic-ecosystem tasks |
| Gemini 2.5 Flash | Google | 1M | $0.30 | $2.50 | Cost-effective multimodal |
| Gemini 2.5 Pro | Google | 2M | $1.25 | $10.00 | Long-context, multimodal |
| Gemini 2.5 Flash-Lite | Google | 1M | $0.10 | $0.40 | Ultra-cheap high-volume |
| DeepSeek V3.2 | DeepSeek | 128K | $0.14 | $0.28 | Budget-friendly coding + reasoning |
| Qwen3 235B A22B | Alibaba | 131K | ~$0.29 | ~$1.15 | Best Chinese + coding, MoE |
| Kimi K2.6 | Moonshot AI | 262K | ~$0.60 | ~$2.50 | Chinese + long-context tasks |
| Grok 4 | xAI | 256K | $3.00 | $15.00 | X/Twitter integration, reasoning |
| Grok 4.20 | xAI | 2M | $2.00 | $6.00 | Very long context, agentic tasks |
Sources: Anthropic, OpenAI, Google, DeepSeek, Alibaba, Moonshot, xAI official pricing pages — May 2026.
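
To turn the per-1M-token prices above into a per-run figure, multiply each token count by its price and divide by one million. The sketch below does that for a few rows of the table. The 400K-input / 50K-output workload is a hypothetical example, and the calculation ignores caching, batch discounts, and tiered long-context pricing.

```python
# (input $/1M, output $/1M), copied from the pricing table (data dated 2026-05-20).
PRICES = {
    "GPT-4o-mini": (0.15, 0.60),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "Gemini 2.5 Flash": (0.30, 2.50),
    "DeepSeek V3.2": (0.14, 0.28),
}


def run_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of one run at list price."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


# Hypothetical agent run: 400K tokens in, 50K tokens out.
for name in PRICES:
    print(f"{name}: ${run_cost(name, 400_000, 50_000):.3f}")
```

Agent workloads are usually input-heavy because context is re-sent every turn, so the input price often dominates even though output tokens cost more per token.
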
Estimated VRAM at Q4_K_M quantization. Speed varies by hardware.
| Model | Params | Min VRAM (Q4) | Speed (tok/s)* | Best Quantization | Chinese Support | Best For |
|---|---|---|---|---|---|---|
| Qwen3.6-27B | 27B dense | ~17 GB | ~23 (M5 Max) | Q4_K_M / FP8 | ⭐⭐⭐⭐⭐ | Coding, Chinese, agentic tasks |
| Qwen3 235B A22B | 235B MoE | ~40 GB (active) | ~15–20 | Q2_K / Q4_K_M | ⭐⭐⭐⭐⭐ | Best local quality, huge context |
| Llama 3.3 70B | 70B dense | ~42 GB | ~12–18 | Q4_K_M | ⭐⭐☆☆☆ | Best English open-weight |
| DeepSeek V3-671B | 671B MoE | ~40 GB (active) | ~10–15 | Q2_K | ⭐⭐⭐⭐☆ | Open-weight coding champion |
| Gemma 4 27B | 27B dense | ~17 GB | ~20–25 | Q4_K_M | ⭐⭐⭐☆☆ | Multilingual reasoning, Apache-2.0 |
| Phi-4 14B | 14B dense | ~9 GB | ~35–45 | Q4_K_M | ⭐⭐☆☆☆ | Best 8–16GB VRAM coding model |
| Mistral Small 4 24B | 24B dense | ~14 GB | ~25–30 | Q4_K_M | ⭐⭐⭐☆☆ | Multilingual, function calling |
* tok/s measured at typical decode context; varies with hardware, context length, and batch size.
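
The dense-model VRAM figures above are consistent with a simple weights-only estimate: parameter count times bits per weight, divided by eight. The sketch below assumes about 4.85 bits per weight for Q4_K_M. That average is an assumption, not a figure from this list, and the result excludes KV cache and runtime overhead. The MoE rows use a different "active" convention and are not covered.

```python
def q4_weight_gb(params_billion: float, bits_per_weight: float = 4.85) -> float:
    """Approximate weight memory in GB (10^9 bytes) for a quantized dense model.

    bits_per_weight=4.85 is an assumed Q4_K_M average; KV cache is not included.
    """
    return params_billion * bits_per_weight / 8


# Dense rows from the table: (name, parameters in billions).
dense_models = [
    ("Phi-4 14B", 14),
    ("Mistral Small 4 24B", 24),
    ("Qwen3.6-27B", 27),
    ("Llama 3.3 70B", 70),
]

for name, params in dense_models:
    print(f"{name}: ~{q4_weight_gb(params):.1f} GB weights")
```

Under that assumption each estimate lands within about 1 GB of the table's dense rows. Budget extra headroom for context, since KV-cache memory grows with sequence length.
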
| System | Storage | Retrieval | Local | Self-host | Temporal | License | Best For |
|---|---|---|---|---|---|---|---|
| Mem0 | Vector + Graph | Semantic | ✅ | ✅ | ✅ | Apache-2.0 | Drop-in memory for any LLM app |
| Basic Memory | Markdown files | Keyword + embedding | ✅ | ✅ | | MIT | Human-readable, Obsidian-compatible |
| Graphiti | Temporal knowledge graph | Graph traversal | ✅ | ✅ | ⭐ native | Apache-2.0 | Time-aware agent memory |
| Zep | Vector + summary | Semantic | ✅ | ✅ | ✅ | Apache-2.0 | Production memory for chat agents |
| Memary | Knowledge graph | Graph + semantic | ✅ | ✅ | | MIT | Open-source agent memory layer |
| CORE | Episodic + semantic | Hybrid | ✅ | ✅ | ✅ | Apache-2.0 | Structured episodic + semantic memory |
| Letta (fka MemGPT) | Tiered (core/archival) | Paged retrieval | ✅ | ✅ | ✅ | Apache-2.0 | Long-term memory with infinite context illusion |
| Model / Service | STT | TTS | Realtime | Local | Latency | Languages | License |
|---|---|---|---|---|---|---|---|
| ElevenLabs v3 | ❌ | ⭐⭐⭐⭐⭐ | ✅ | ❌ | ~200ms | 32+ | Proprietary |
| Whisper v3 (local) | ⭐⭐⭐⭐☆ | ❌ | ❌ | ✅ | ~1s (large) | 99 | MIT |
| Deepgram Nova-3 | ⭐⭐⭐⭐⭐ | ✅ | ✅ | ❌ | <100ms | 30+ | Proprietary |
| Gemini Live API | ✅ | ✅ | ⭐ native | ❌ | <300ms | 30+ | Proprietary |
| OpenAI Realtime API | ✅ | ✅ | ⭐ native | ❌ | ~300ms | 57 | Proprietary |
| MiniMax TTS | ❌ | ⭐⭐⭐⭐☆ | ✅ | ❌ | ~200ms | 20+ | Proprietary |
| Kokoro | ❌ | ⭐⭐⭐⭐☆ | ❌ | ✅ | ~100ms | 8 | Apache-2.0 |
| Voxtral | ⭐⭐⭐⭐☆ | ❌ | ❌ | ✅ | batch | 20+ | Apache-2.0 |
| Model | Max Resolution | API / Local | Photorealism | Best For | Pricing (approx) |
|---|---|---|---|---|---|
| DALL-E 3 | 1024×1024 | API | High | Instruction-following, broad | $0.04/image (std) |
| gpt-image-2 | 2048×2048 | API | Very high | API workflows, 4K output | $0.04–$0.17/image |
| Flux 2 Pro | 2K+ | API | ⭐ high | Photorealistic, fast generation | ~$0.05/image |
| Midjourney V8 | 2K+ | Web only | Artistic | Best artistic quality | $10–$120/mo plan |
| Stable Diffusion 3.5 | 2K | Local + API | Good | Open-weight, self-hostable | Open weights (Stability AI Community License) |
| Ideogram 3 | 2K | API + Web | Good | Typography + text in images | Freemium |
| Gemini 3 Pro Image | 1K | API | High | Native multimodal edit | Vertex AI pricing |
| Model | Max Length | Resolution | API / Local | Best For | Status (2026-05) |
|---|---|---|---|---|---|
| Veo 3.1 | 2 min | 4K | API (Vertex) | Highest fidelity, physics-aware | GA (Google) |
| Kling VIDEO 3.0 | 3 min | 1080p | API + Web | Cinematic style, leading post-Sora | GA (Kuaishou) |
| Runway Gen-4 | 10s/clip | 1080p | API + Web | Precise motion control, professional | GA |
| Pika 2.0 | 10s | 1080p | Web | Creative / social media | GA |
| Seedance 2.0 | 60s | 1080p | API | Fast, cost-effective, social media | GA (ByteDance) |
| Hailuo 02 | 60s | 1080p | Web + API | Smooth motion, accessible | GA (MiniMax) |
| Sora | ❌ | ❌ | ❌ | — | Discontinued Apr 2026 |
| Framework | Language | Vector DB | Hybrid Search | Streaming | License | Best For |
|---|---|---|---|---|---|---|
| LlamaIndex | Python | Any | ✅ | ✅ | MIT | Production RAG, document pipelines |
| Haystack | Python | Any | ✅ | ✅ | Apache-2.0 | Pipelines, search-heavy RAG |
| LangChain LCEL | Python / JS | Any | ✅ | ✅ | MIT | Flexible chaining, large ecosystem |
| RAGFlow | Python | Built-in | ✅ | ✅ | Apache-2.0 | Deep document parsing, OCR-aware |
| Cognee | Python | Vector + Graph | ✅ | | Apache-2.0 | Knowledge graph + RAG hybrid |
| txtai | Python | Built-in | ✅ | ❌ | Apache-2.0 | Lightweight, embeddings-first |
| Verba | Python | Weaviate | ❌ | | BSD-3 | Weaviate-native RAG chatbot |
| Database | Self-host | Cloud | Scale | Hybrid Search | License | Best For |
|---|---|---|---|---|---|---|
| Qdrant | ✅ | ✅ | Very large | ✅ | Apache-2.0 | Best all-round OSS vector DB |
| Weaviate | ✅ | ✅ | Large | ✅ | BSD-3 | Multi-modal, GraphQL API |
| Pinecone | ❌ | ✅ | Very large | ✅ | Proprietary | Managed, easiest setup |
| Chroma | ✅ | | Medium | ❌ | Apache-2.0 | Fast prototyping, Python-native |
| Milvus | ✅ | ✅ | Very large | ✅ | Apache-2.0 | Billion-scale production |
| pgvector | ✅ | ✅ | Medium | | PostgreSQL | Existing Postgres stack |
| FAISS | ✅ | ❌ | Large | ❌ | MIT | In-memory, GPU-accelerated search |
| Tool | Open Source | Local LLM | Memory | Multi-channel | Self-host | Best For |
|---|---|---|---|---|---|---|
| OpenClaw | ❌ | ✅ | ✅ native | ✅ (TG/Discord/WA) | ✅ | All-in-one personal agent platform |
| Khoj | ✅ | ✅ | ✅ | | ✅ | Research, notes, calendar integration |
| Jan.ai | ✅ | ✅ | ❌ | ❌ | ✅ | Offline ChatGPT replacement, GUI |
| LM Studio | ❌ | ✅ | ❌ | ❌ | ✅ | Easy local model runner, non-technical |
| Perplexity | ❌ | ❌ | ❌ | ❌ | | Search-first, cited answers |
| Claude.ai Pro | ❌ | ❌ | ✅ Projects | ❌ | ❌ | Best reasoning, MCP tools |
| Zo Computer | ❌ | ❌ | ✅ | ❌ | ❌ | Autonomous computer use assistant |
Stars data approximate, 2026-05.
| MCP Server | Category | Stars | Auth | Security Audit | License |
|---|---|---|---|---|---|
| GitHub MCP | Dev / Code | 🔥 High | OAuth | ✅ (GitHub) | MIT |
| Playwright MCP | Browser | 🔥 High | None (local) | | Apache-2.0 |
| Filesystem MCP | Files | 🔥 High | None (local) | | MIT |
| Brave Search MCP | Search | High | API key | ❌ | MIT |
| Slack MCP | Comms | Medium | OAuth | ❌ | MIT |
| Notion MCP | Notes | Medium | OAuth | ❌ | MIT |
| PostgreSQL MCP | Database | Medium | Conn string | | MIT |
| Google Maps MCP | Location | Medium | API key | ❌ | MIT |
Use mcp-scan (Invariant Labs) to audit any MCP server before production deployment.
| Platform | Open Source | MCP Support | A2A Support | Self-host | Compliance | Best For |
|---|---|---|---|---|---|---|
| Microsoft Agent Framework | ✅ | ✅ | | | SOC2, ISO 27001 | Azure-native enterprise |
| Salesforce Agentforce | ❌ | ❌ | ❌ | | SOC2, GDPR | Salesforce CRM orgs |
| SAP Joule | ❌ | ❌ | ❌ | | SOC2, ISO | SAP ERP environments |
| Google Gemini Enterprise | ❌ | ✅ | ✅ | ❌ (cloud) | SOC2, FedRAMP | Google Workspace orgs |
| IBM watsonx | ✅ | | | ✅ (on-prem) | FedRAMP, HIPAA | Regulated / on-prem enterprise |
| ServiceNow AI Agents | ❌ | ✅ | ❌ | | SOC2 | IT service management |
| Dify Enterprise | ✅ (CE) | ✅ | ✅ | ✅ | SOC2 (cloud) | Multi-model, low-code agent platform |
MTEB = Massive Text Embedding Benchmark leaderboard score (EN, 2026-05 approx).
| Model | Dims | Context | Local | API | Languages | License | MTEB ≈ |
|---|---|---|---|---|---|---|---|
| OpenAI text-embedding-3-large | 3072 | 8K | ❌ | ✅ | Multi | Proprietary | ~64 |
| Cohere embed-v4 | 1536 | 128K | ❌ | ✅ | Multi | Proprietary | ~66 |
| Gemini text-embedding-004 | 768 | 2K | ❌ | ✅ | Multi | Proprietary | ~63 |
| BGE-M3 | 1024 | 8K | ✅ | ❌ | Multi | MIT | ~65 |
| Jina-embeddings-v3 | 1024 | 8K | ✅ | ✅ | Multi | CC-BY-NC | ~65 |
| Nomic-embed-text-v2 | 768 | 8K | ✅ | ✅ | Multi | Apache-2.0 | ~62 |
| Voyage-3 | 1024 | 32K | ❌ | ✅ | Multi | Proprietary | ~67 |
| Tool | MCP Scan | Prompt Injection Defense | Audit Logs | Self-host | License |
|---|---|---|---|---|---|
| mcp-scan | ⭐ native | ✅ | ❌ | ✅ | MIT |
| Lakera Guard | ❌ | ⭐⭐⭐⭐⭐ | ✅ | ❌ | Proprietary |
| Zenity | ✅ | ✅ | ✅ | ❌ | Proprietary |
| Prompt Armor | ❌ | ⭐⭐⭐⭐☆ | ✅ | ❌ | Proprietary |
| Azure AI Content Safety | ❌ | ✅ | ✅ | ❌ (Azure) | Proprietary |
| Rebuff | ❌ | ⭐⭐⭐⭐☆ | ❌ | ✅ | MIT |
| Tool | OS | Vision | Local | API | Open Source | Best For |
|---|---|---|---|---|---|---|
| Claude Desktop Intelligence | Mac / Linux | ✅ | ❌ | ✅ | ❌ | Best all-round screen agent |
| UFO | Windows | ✅ | ✅ | Optional | ✅ | Windows native automation |
| OSWorld | Mac/Win/Linux | ✅ | ✅ | Optional | ✅ | Cross-platform benchmark + agent |
| Nemo Agent | Linux | ✅ | ✅ | Optional | ✅ | Open desktop control |
| Screenpipe | Mac / Linux | ✅ | ✅ | ❌ | ✅ | Screen + audio memory, privacy-first |
| Claude Computer Use (API) | Any (via API) | ✅ | ❌ | ✅ | ❌ | API-driven desktop control |
| Platform | Type | Open Source | SDK | Simulation | Best For |
|---|---|---|---|---|---|
| NVIDIA Isaac GR00T N1.5 | Humanoid foundation | ✅ | | ✅ (Isaac Sim) | Universal humanoid robot foundation model |
| ROS 2 Jazzy | Robot OS | ✅ | ✅ | ✅ (Gazebo) | Standard robot middleware |
| Gemini Robotics | Manipulation | ❌ | ✅ | | Vision + language + dexterous manipulation |
| Unitree SDK2 | Quadruped / Humanoid | ✅ | ✅ | | Go2, H1, G1 robot dev |
| Boston Dynamics API | Quadruped | ❌ | ✅ | ❌ | Spot industrial deployment |
| Genesis Sim | Simulation | ✅ | ✅ | ⭐ native | Ultra-fast physics sim for embodied AI |
Chinese language capability benchmarks are approximate. API prices in USD/1M tokens, May 2026.
| Model | Provider | Context | Chinese Bench≈ | Coding | Open Weight | Input $/1M |
|---|---|---|---|---|---|---|
| Qwen3 235B A22B | Alibaba | 131K | Top | ⭐⭐⭐⭐⭐ | ✅ Apache-2.0 | ~$0.29 |
| DeepSeek V3.2 | DeepSeek | 128K | Very high | ⭐⭐⭐⭐⭐ | ✅ MIT | $0.14 |
| Kimi K2.6 | Moonshot AI | 262K | High | ⭐⭐⭐⭐☆ | ❌ | ~$0.60 |
| GLM-5.1 | Zhipu AI | 128K | High | ⭐⭐⭐⭐☆ | | ~$0.50 |
| Hunyuan Pro | Tencent | 256K | High | ⭐⭐⭐⭐☆ | ❌ | ~$0.45 |
| Doubao Pro 256K | ByteDance | 256K | High | ⭐⭐⭐☆☆ | ❌ | ~$0.80 |
| ERNIE 5 | Baidu | 128K | High | ⭐⭐⭐☆☆ | ❌ | ~$0.70 |
| Framework | Multi-Agent | Streaming | MCP | A2A | Stars≈ | License |
|---|---|---|---|---|---|---|
| Mastra | ✅ | ✅ | ✅ | ✅ | ~12K | Elastic-2.0 |
| Vercel AI SDK | ✅ | ✅ | ❌ | | ~12K | Apache-2.0 |
| LangChain.js | ✅ | ✅ | ✅ | ❌ | ~14K | MIT |
| Genkit | ✅ | ✅ | ✅ | ❌ | ~3K | Apache-2.0 |
| OpenAI Agents SDK (Node) | ✅ | ✅ | ✅ | ❌ | ~2K | MIT |
| Rivet | ✅ | ✅ | ❌ | | ~4K | MIT |
| Flowise | ✅ | ✅ | ✅ | ❌ | ~35K | Apache-2.0 |
| Category | Example Tools | Best For | Abstraction Level | Flexibility |
|---|---|---|---|---|
| Orchestration Platform | Dify, n8n, Flowise, Langflow | Non-engineers, fast deployment | Very high | Low-medium |
| Agent Framework | LangGraph, CrewAI, Mastra, OpenAI Agents SDK | Engineers building custom agents | Medium | High |
| Agent IDE / Coding Agent | Claude Code, Cursor, Cline, Devin | Developers pair-programming | Low | Very high |
| Low-code Builder | Voiceflow, Botpress, Microsoft Copilot Studio | Business / product teams | Very high | Low |
| AI-native App Platform | Vertex AI Agent Builder, Azure AI Foundry | Enterprise with managed infra | High | Medium |
| Framework | iOS | Android | Local LLM | On-device Inference | License | Best For |
|---|---|---|---|---|---|---|
| MLX | ✅ | ❌ | ✅ | ⭐ Apple Silicon | MIT | Apple-native, fast LLM on Mac/iPhone |
| llama.cpp (mobile) | ✅ | ✅ | ✅ | ✅ (arm/x86) | MIT | Universal local LLM, all platforms |
| MediaPipe | ✅ | ✅ | ✅ | ✅ | Apache-2.0 | On-device ML tasks (vision, NLP) |
| Core ML | ✅ | ❌ | ✅ | ✅ (ANE) | Apple SDK | iOS/macOS native model inference |
| Google AI Edge | ✅ | ✅ | ✅ | ✅ | Apache-2.0 | LiteRT + Gemma Nano on-device |
| Ollama (mobile proxy) | ✅ | | | ❌ (server-side) | MIT | Run Ollama server, hit from mobile |
| Qualcomm AI Hub | ❌ | ✅ | ✅ | ✅ (Snapdragon NPU) | SDK | Snapdragon-optimized model deployment |
All Compare table data as of 2026-05-20. Send PRs with sources when figures change.
50+ curated scenarios matching your goal to the right tool or stack. Updated weekly.
I want to build a coding agent for my startup (lowest cost, high quality) → Claude Code (CLI) + E2B sandbox + Langfuse observability. SWE-bench 80.9%. ~$200/mo at moderate usage.
I want an enterprise coding agent with security controls
- GitHub Copilot Enterprise — Deep GitHub integration, IP indemnity, SSO/SAML, SOC 2. → best if already on GitHub Enterprise
- Cursor Business — Privacy mode, code never leaves your infra, admin dashboard. → best for teams needing IDE-first UX
- Devin 3.0 (Cognition) — Fully autonomous PR-to-merge with re-planning. → best for hands-off long-horizon tasks
I want an open-source self-hosted coding agent (no vendor lock-in)
- OpenHands (All-Hands-AI) — MIT, competitive SWE-bench, BYO model. → best if you need full control
- Cline (VS Code ext) — BYO key, large community, free. → best for VS Code users
- Aider — Git-aware CLI refactoring, excellent polyglot support. → best for terminal-based git workflows
I want a browser automation / web scraping agent
- Browser Use — 92K stars, vision + DOM, MIT. → best for general web automation
- Stagehand (Browserbase) — Typed `act`/`extract`/`observe` API, structured output. → best for reliability-critical scraping
- Skyvern — Vision-first, handles dynamic pages without CSS selectors. → best for changing / heavily JS-rendered sites
I want a document processing / PDF analysis agent → LlamaIndex (document pipeline) + Gemini 2.5 Pro (2M context) or Claude Opus 4.7 (200K, best reasoning) + Unstructured.io for ingestion. For local: Ollama + Qwen3.6-27B.
I want a customer service / support agent
- Dify — No-code LLM workflow builder, self-hostable, RAG built-in. → best for non-technical teams
- LangGraph + Zendesk MCP — Stateful workflows, ticket resolution loop. → best for engineering-led teams
- Salesforce Agentforce — CRM-native, works within existing Salesforce data. → best for Salesforce-first orgs
I want a research / deep-research agent → Perplexity Deep Research (managed) or OpenHands (formerly OpenDevin) + Tavily Search + Claude Opus 4.7. For local: Khoj (self-hosted). Expect multi-minute runs and $1–5 per deep report at cloud rates.
I want a data analysis / BI agent
- Julius AI / Code Interpreter (ChatGPT) — Managed, no setup. → best for analysts without eng support
- LangChain + Pandas Agent + Langfuse — Fully custom, code-gen for queries. → best for eng teams with custom data
- Metabase AI / Tableau Pulse — Embedded BI copilot. → best inside existing BI stack
I want a computer use / desktop automation agent
- Claude Desktop Intelligence (Anthropic) — Screen-aware, controls any GUI app. → best all-around for macOS/Linux
- UFO (Microsoft, open-source) — Windows-native, Win32 + UI Automation APIs. → best for Windows automation
- Screenpipe — Continuous screen + audio recording + local LLM inference. → best for local privacy-first
I want a voice / conversational agent
- Gemini Live API — Real-time voice, <300ms latency, Google cloud. → best for Google ecosystem
- OpenAI Realtime API (GPT-4o Audio) — Native voice with tool calling. → best for OpenAI ecosystem
- LiveKit + Whisper + ElevenLabs v3 — Self-hostable voice pipeline. → best for custom, brand-specific voice
I want a multi-agent orchestration system
- LangGraph — Stateful graph workflows, best Python production option. → best for complex state machines
- OpenAI Swarm / Agents SDK — Lightweight handoffs, OpenAI-native. → best for simple OpenAI agent networks
- Google ADK — Hierarchical agent coordination, Gemini-native. → best for Google/Vertex stack
- Mastra (TypeScript) — Type-safe workflows, TS-first teams. → best for TypeScript stacks
I want a personal AI assistant (self-hosted)
- OpenClaw — Multi-channel (Telegram/Discord/WhatsApp), memory, cron, MCP support, full local LLM option. → best all-in-one self-hosted
- Khoj — Search + research + calendar, open-source, self-host. → best for knowledge workers
- Jan.ai / LM Studio — GUI-first local model runners. → best for non-technical local LLM
I want a personal AI assistant (managed / easy setup)
- Claude.ai (Pro) — Projects, memory, MCP tools, best reasoning. → best for power users
- Perplexity Pro — Search-first, cites sources. → best for research-heavy use
- ChatGPT Plus — Code interpreter, image gen, broad tools. → best for general-purpose
I want to build a RAG application → LlamaIndex (orchestration) + Qdrant (vector DB) + Cohere embed-v4 (embeddings) + BGE reranker (reranking). Managed alternative: Ragie or Cognee. Production telemetry: Langfuse.
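The recipe above is a two-stage pipeline: vector search pulls candidates, then a reranker reorders them. A minimal sketch with toy stand-ins follows; the hand-written 2-d embeddings and the keyword-overlap `rerank_score` are hypothetical placeholders for a real embedding model and a BGE or Cohere reranker, not their APIs.

```python
import math
from typing import Callable

# Toy corpus of (text, embedding) pairs. The embeddings are made up for
# illustration; a real pipeline would get them from an embedding model.
DOCS = [
    ("Qdrant stores vectors and payloads", [0.9, 0.1]),
    ("Rerankers reorder retrieved chunks", [0.8, 0.3]),
    ("Bananas are rich in potassium", [0.1, 0.9]),
]


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def retrieve(query_vec: list[float], k: int) -> list[str]:
    """Stage 1: top-k candidates by cosine similarity."""
    ranked = sorted(DOCS, key=lambda d: cosine(query_vec, d[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def rerank_score(query: str, text: str) -> float:
    """Hypothetical reranker: fraction of query words found in the text."""
    q = set(query.lower().split())
    if not q:
        return 0.0
    return len(q & set(text.lower().split())) / len(q)


def retrieve_and_rerank(query: str, query_vec: list[float], k: int = 2,
                        scorer: Callable[[str, str], float] = rerank_score) -> list[str]:
    """Stage 2: reorder the stage-1 candidates by the reranker score."""
    candidates = retrieve(query_vec, k)
    return sorted(candidates, key=lambda t: scorer(query, t), reverse=True)
```

Keeping the scorer as a parameter means the toy function can be swapped for a real reranker call without touching the retrieval stage.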
I want a financial analysis agent → LangChain + yfinance / Alpha Vantage MCP + Claude Sonnet 4.6 (Excel/table reasoning) + Langfuse. Avoid: trusting model-generated numbers directly; always validate them with structured output + code execution.
I want a legal document agent → Claude Opus 4.7 (200K context, best contract analysis) + LlamaIndex (ingestion) + pgvector (self-hosted vector). Important: always have a human-in-the-loop for final legal decisions.
I want an education / tutoring agent
- Khanmigo (Khan Academy) — Purpose-built for K–12, COPPA-compliant. → best for K–12 safe deployment
- Custom build — GPT-4o + LangGraph state machine + spaced repetition logic. → best for higher-ed or corporate training
I want a creative writing assistant → Claude Opus 4.7 (best prose quality) or Gemini 2.5 Pro (long-form, 2M context) + Notion / Obsidian MCP for knowledge base. For structured fiction: Sudowrite (managed).
I want an IoT / physical AI agent → ROS 2 (robot OS) + NVIDIA Isaac GR00T (humanoid foundation model) + Genesis Sim (simulation). For home automation: Home Assistant + custom LLM backend.
I want a game playing / simulation agent → PettingZoo (multi-agent RL env) + Gymnasium + GPT-4o Vision for game-state parsing. For LLM-in-the-loop games: Concordia (Google DeepMind).
I want a security scanning / vulnerability agent → Semgrep (static analysis) + Claude Sonnet 4.6 (explain + triage findings) + mcp-scan (MCP server audit). See also: Agent Security table.
I want a healthcare AI tool (non-clinical / administrative) → Claude Opus 4.7 + RAG on medical knowledge base + strict output validation. Always: disclose AI, human oversight for any clinical decision, check HIPAA/GDPR compliance. Never automate clinical diagnosis.
I want a code review / PR security agent → CodeRabbit (managed, instant PR reviews) or Claude Code in CI + Semgrep + custom rules. For enterprise: Copilot Code Review (GitHub).
I want a social media / content creation agent → n8n (workflow automation) + Claude Sonnet 4.6 (drafting) + gpt-image-2 (images) + Buffer/Later MCP (scheduling). Self-hosted option: all via n8n + Ollama.
I want a translation / localization agent → DeepL API (best quality for EU languages) or Claude Sonnet 4.6 (nuanced context-aware) + Weblate (open-source TMS). For Chinese: Qwen3 235B + human review loop.
I need the smartest model for complex multi-step reasoning
- Claude Opus 4.7 (/think xhigh) — Best in class for math, logic, long-horizon reasoning. $5/$25 per M tokens.
- Gemini 2.5 Pro — 2M context, strong multimodal, competitive pricing. $1.25/$10 per M tokens.
- GPT-4o — Broadly capable, strong tool-use ecosystem. $2.50/$10 per M tokens.
I need the fastest + cheapest model for simple, high-volume tasks
- Gemini 2.5 Flash-Lite — $0.10/$0.40 per M tokens, 1M context.
- DeepSeek V3.2 — $0.14/$0.28 per M tokens, surprisingly strong quality.
- Claude Haiku 4.5 — $1/$5 per M tokens, Anthropic ecosystem integration.
- GPT-4o-mini — $0.15/$0.60 per M tokens, broad OpenAI tooling.
I need the best Chinese language support
- Qwen3 235B A22B (Alibaba) — Strongest Chinese benchmark, MoE architecture, $0.29/$1.15 per M. → cloud API
- Kimi K2.6 (Moonshot) — 262K context, great Chinese instruction-following. → both API + local
- DeepSeek V3.2 — Open weights, excellent Chinese coding. → self-host or API
- GLM-5.1 (Zhipu AI) — Strong long-context, Chinese-first. → API or local
I need the best local/offline model with ~16GB VRAM
- Qwen3.6-27B Q4_K_M — ~17GB VRAM (just over a 16GB card, so expect some CPU offload or a smaller quant), ~23 tok/s, excellent coding + Chinese. Best overall pick in this class.
- Gemma 4 27B (Google) — Strong reasoning, multilingual, Apache-2.0.
- Phi-4 14B (Microsoft) — ~9GB VRAM (Q4), punches above its weight on coding.
- Mistral Small 4 24B — Fast, multilingual, well-rounded.
I need the best local/offline model with 40GB+ VRAM
- Llama 3.3 70B Q4_K_M — ~42GB VRAM, strong English + coding, Llama 3.3 Community License (Meta).
- DeepSeek V3-671B Q2 — MoE, ~40GB of active weights at Q2, but still requires a 2×A100 setup.
- Qwen3 235B A22B Q2 — MoE flagship, 40-48GB VRAM at Q2, best local quality.
I need the best coding capability → Claude Sonnet 4.6 (SWE-bench 80.9% via Claude Code) for agentic coding. GPT-4o for code generation + explanation. DeepSeek V3.2 for open-weight coding. For IDE use: Cursor (Claude backend) or Cline.
I need multimodal understanding (vision + text)
- Gemini 2.5 Pro — Native vision, PDF, audio, video understanding. 2M context.
- GPT-4o — Mature vision API, strong diagram/chart understanding.
- Claude Opus 4.7 — Best for complex document image reasoning.
- Qwen3-VL 72B — Best open-weight multimodal, self-hostable.
I need very long context (500K+ tokens)
- Gemini 2.5 Pro — 2M context window, best for entire codebase or book analysis.
- Gemini 2.5 Flash — 1M context, cheaper option.
- Kimi K2.6 — 262K context, strong Chinese.
- Claude Opus 4.7 — 200K context, best quality within that window.
I need real-time voice / audio model
- Gemini Live API — <300ms latency, native Google cloud.
- OpenAI Realtime API — GPT-4o Audio, native function calling during voice.
- ElevenLabs v3 — Best TTS quality, 32+ languages.
- Voxtral (Mistral) — Open-weight audio model, transcription + understanding.
I need the best image generation
- gpt-image-2 (OpenAI) — Best instruction-following, 2K/4K, $0.04–0.17/image.
- Flux 2 Pro (Black Forest Labs) — Photorealistic, fast, API available.
- Midjourney V8 — Best artistic quality, no API (web only).
- Stable Diffusion 3.5 — Open weights, local deployment, Stability AI Community License.
I need the best video generation
- Veo 3.1 (Google) — High fidelity, physics-aware, best quality 2026.
- Kling VIDEO 3.0 (Kuaishou) — Leading post-Sora, strong cinematic style.
- Runway Gen-4 — Precise motion control, professional use.
- Seedance 2.0 (ByteDance) — Fast, cost-effective, strong for social media.
I need an open-weight model (MIT or Apache license)
- Llama 3.3 70B (Meta, Llama 3.3 Community License) — Best English open-weight; custom license rather than MIT/Apache, so check its terms.
- Qwen3 235B A22B (Alibaba, Apache-2.0) — Best Chinese + coding open-weight.
- Mistral Small 4 (Mistral AI, Apache-2.0) — Fast, multilingual.
- DeepSeek V3.2 (MIT) — Best open-weight coding.
- Gemma 4 27B (Google, Apache-2.0) — Strong multilingual reasoning.
I want to run everything locally (privacy-first, zero cloud) → Ollama (model runner) + Open WebUI (UI) + Qdrant (local vector DB) + Qwen3.6-27B (16GB VRAM) or Llama 3.3 70B (40GB+). Full stack: OpenClaw (local mode) or AnythingLLM. No data leaves your machine.
I want to minimize API costs (budget <$50/month) → Use DeepSeek V3.2 ($0.14/$0.28) or Gemini 2.5 Flash ($0.30/$2.50) for high-volume. Reserve Claude Sonnet 4.6 for complex tasks only. Use Anthropic Batch API (50% off) for non-real-time work. Cache aggressively (prompt caching saves ~70% on repeated context).
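To check whether a workload fits the budget, the arithmetic can be sketched as below. Prices are the per-1M-token figures quoted in this list (Claude Haiku 4.5 at $1/$5 is taken from the model-selection entries); the token volumes, the model keys, and the `monthly_cost` helper are hypothetical.

```python
# USD per 1M tokens as (input, output), copied from the figures in this list.
# The dictionary keys are illustrative labels, not official model IDs.
PRICES = {
    "deepseek-v3.2": (0.14, 0.28),
    "gemini-2.5-flash": (0.30, 2.50),
    "claude-haiku-4.5": (1.00, 5.00),
}


def monthly_cost(model: str, input_tokens: int, output_tokens: int,
                 batch_discount: float = 0.0) -> float:
    """Estimated USD cost; batch_discount is a fraction (0.5 means 50% off)."""
    in_price, out_price = PRICES[model]
    cost = (input_tokens / 1e6) * in_price + (output_tokens / 1e6) * out_price
    return cost * (1.0 - batch_discount)


if __name__ == "__main__":
    # Hypothetical workload: 100M input tokens and 20M output tokens per month.
    for model in PRICES:
        print(model, round(monthly_cost(model, 100_000_000, 20_000_000), 2))
```

For this made-up workload only DeepSeek V3.2 stays under $50; prompt caching is not modeled here and would lower the input side further.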
I want to scale to enterprise (millions of requests/month) → Google Vertex AI (managed Gemini, auto-scale, SLAs) or Azure OpenAI (GPT-4o, compliance, dedicated capacity). Add Langfuse for observability. For routing: Portkey or LiteLLM as unified gateway.
I want to deploy in an air-gapped / regulated environment → Ollama (local inference) + Qwen3 235B A22B / Llama 3.3 70B (open weights) + Qdrant (local vector DB). For enterprise needs: IBM watsonx (on-prem) or Azure Government (FedRAMP). Compliance certifications matter more than model quality here.
I want to build for edge / mobile deployment → Core ML (Apple, iOS/macOS) + Phi-4 14B or Gemma 4 2B (quantized). For Android: MediaPipe + Gemma 4. For cross-platform: llama.cpp + GGUF models. Check Qualcomm AI Hub for Snapdragon-optimized models.
I need multi-cloud / want to avoid vendor lock-in → LiteLLM (unified API proxy for 100+ providers) + LangGraph (framework-agnostic) + provider-agnostic embeddings (BGE-M3 open-weight). Store all state in self-hosted Qdrant or Postgres+pgvector.
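The core idea behind a provider-agnostic gateway is ordered fallback: try one provider, move to the next on failure. The sketch below illustrates that pattern in plain Python; the `Provider` callables are hypothetical stand-ins and this is not LiteLLM's API.

```python
from typing import Callable, Sequence

# A provider is any callable that takes a prompt and returns text.
# These are hypothetical stand-ins, not a real SDK.
Provider = Callable[[str], str]


class AllProvidersFailed(RuntimeError):
    """Raised when every provider in the chain has errored."""


def complete_with_fallback(
    prompt: str, providers: Sequence[tuple[str, Provider]]
) -> tuple[str, str]:
    """Try each (name, provider) in order; return (name, reply) from the first success."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # a real gateway would catch narrower errors
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


def flaky_primary(prompt: str) -> str:
    raise TimeoutError("simulated outage")


def local_backup(prompt: str) -> str:
    return f"echo: {prompt}"
```

Because callers only see `(name, reply)`, swapping or reordering providers does not touch application code, which is the property that limits lock-in.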
I want to self-host everything (no managed services at all) → Ollama (models) + Qdrant (vectors) + Langfuse (observability, self-host Docker) + n8n (workflows) + OpenClaw (agent runtime). GPU recommendations: 2× RTX 4090 (24GB each = 48GB) for 70B models; single RTX 4090 for 27B.
I want to evaluate agent output quality → DeepEval (rich metric suite: faithfulness, relevancy, hallucination) + Langfuse (traces + evals). For custom: LangSmith (tight LangChain integration). Open-source: Agenta (self-host).
I want to debug why my agent is failing → Langfuse (trace each tool call + LLM call with timing) + Arize Phoenix (root cause analysis). Enable verbose logging in your framework (LangGraph and CrewAI both support it). Use LLM-as-judge to flag low-confidence steps.
I want to monitor production agents in real-time → Langfuse (OpenTelemetry-native, self-host) or Helicone (zero-latency proxy logging). Set up cost + latency + error-rate dashboards. Alert on error spikes via Grafana or Datadog integration.
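Alerting on error spikes usually comes down to a sliding-window error rate compared against a threshold. A minimal sketch of that check follows; the window size, threshold, and `ErrorRateMonitor` class are hypothetical and independent of any dashboard product.

```python
from collections import deque


class ErrorRateMonitor:
    """Tracks the error rate over the last `window` requests."""

    def __init__(self, window: int = 100, threshold: float = 0.2):
        self.results = deque(maxlen=window)  # True means the request failed
        self.threshold = threshold

    def record(self, failed: bool) -> None:
        self.results.append(failed)

    @property
    def error_rate(self) -> float:
        if not self.results:
            return 0.0
        return sum(self.results) / len(self.results)

    def should_alert(self) -> bool:
        # Require a full window so one early failure does not trigger an alert.
        full = len(self.results) == self.results.maxlen
        return full and self.error_rate > self.threshold
```

In practice `record` would be called from the tracing hook of whichever framework is in use, and `should_alert` would feed the alerting integration.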
I want to A/B test different models or prompts → Braintrust (experiment tracking, online/offline eval) or LangSmith (prompt playground + evals). For open-source: Agenta + Langfuse experiments feature.
I want to benchmark models on my specific tasks → LMSYS Chatbot Arena custom eval + OpenAI Evals (open framework) + DeepEval (custom metrics). Run your own eval harness: prepare 50–200 golden examples and measure precision/recall on your actual task.
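For a task with a yes/no outcome per example, the precision/recall step of such a harness can be sketched as follows. The binary labels (for instance "should this ticket be escalated?") and the `precision_recall` helper are assumptions for illustration.

```python
def precision_recall(predicted: list[bool], expected: list[bool]) -> tuple[float, float]:
    """Precision and recall for binary labels over a golden set."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    pairs = list(zip(predicted, expected))
    tp = sum(1 for p, e in pairs if p and e)        # true positives
    fp = sum(1 for p, e in pairs if p and not e)    # false positives
    fn = sum(1 for p, e in pairs if e and not p)    # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    golden = [True, True, True, False, False]       # hand-labeled answers
    model_output = [True, True, False, True, False]  # what the model said
    print(precision_recall(model_output, golden))
```

Running the same golden set against each candidate model gives directly comparable numbers on the actual task.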
I want to evaluate MCP server security → mcp-scan (Invariant Labs) — Detects prompt injection, tool poisoning, shadow tools in MCP servers. Run before production deployment. See also: Agent Security.
I want to build within the OpenAI ecosystem → OpenAI Agents SDK + GPT-4o + E2B sandbox + LangSmith eval. Benefits: widest third-party tooling, most community examples. Cost: premium pricing.
I want to build within the Anthropic Claude ecosystem → Claude Code (agentic IDE/CLI) + Claude Sonnet 4.6 / Opus 4.7 + MCP protocol (Claude Desktop) + Langfuse (observability). Benefits: best coding quality, MCP is Claude-native. Cost: ~mid-tier.
I want to build within the Google Gemini ecosystem → Google ADK (Agent Development Kit) + Gemini 2.5 Pro/Flash + Vertex AI (deployment) + Vertex AI Eval + AlloyDB / BigQuery (data). Benefits: 2M context, multimodal, cheap Flash tier. Cost: scales well.
I want to build for the Chinese market (domestic cloud / regulation) → Qwen3 235B (Alibaba Cloud DashScope) + Baidu ERNIE 5 or Kimi K2.6 (Moonshot) as fallback + Alibaba Cloud PAI (deployment). All data stays within China's borders. ICP-compliant.
I want a TypeScript-first stack → Mastra (TS workflows, MCP, A2A, Elastic-2.0) + Vercel AI SDK (streaming, RSC-friendly) + Qdrant JS client + Langfuse JS SDK. Alternative: LangChain.js + LangGraph.js.
I want an open-source-only stack (zero proprietary) → Ollama + Llama 3.3 70B or DeepSeek V3.2 (model) + LangGraph (MIT, framework) + Qdrant (Apache-2.0, vector DB) + Langfuse (MIT, observability) + E2B (Apache-2.0, sandbox). Fully self-hosted, no vendor dependencies.
8 battle-tested multi-tool setups for common use cases. Copy and adapt.
| # | Recipe Name | Stack | Best For |
|---|---|---|---|
| 1 | Lean Coding Agent | Claude Code + E2B + Langfuse | Solo dev / startup, best quality per dollar |
| 2 | Open-Source SWE Agent | OpenHands + Ollama + Qwen3.6-27B + Qdrant | Full local, privacy-first coding |
| 3 | Enterprise RAG | LlamaIndex + Qdrant + Cohere embed-v4 + Langfuse + Claude Sonnet 4.6 | Production Q&A on internal docs |
| 4 | Voice Assistant Pipeline | LiveKit + Whisper (STT) + Claude Sonnet 4.6 + ElevenLabs v3 (TTS) | Custom branded voice AI |
| 5 | Browser Automation | Browser Use + Stagehand + Claude Sonnet 4.6 + Langfuse | Reliable web scraping + form filling |
| 6 | Local-Only Privacy Stack | Ollama + Qwen3.6-27B + Open WebUI + Qdrant + n8n | Zero cloud, air-gapped use |
| 7 | TypeScript Agent | Mastra + Vercel AI SDK + Gemini 2.5 Flash + Qdrant + Langfuse | TS-first production SaaS |
| 8 | Chinese Market Stack | Qwen3 235B API + RAGFlow + Milvus + Langfuse | Domestic China deployment, ICP-compliant |
Avoid common mistakes. These recommendations are based on observed production failures in 2026.
| ❌ Don’t Use | ❌ For This | ✅ Use Instead | Why |
|---|---|---|---|
| LangChain v0.x | New production agents | LangGraph | LangChain chains are deprecated; LangGraph has proper state management |
| AutoGPT (legacy) | Production workloads | OpenHands or LangGraph | AutoGPT’s 2023 architecture has poor reliability at scale |
| GPT-3.5-Turbo | Complex reasoning | Gemini 2.5 Flash or Claude Haiku 4.5 | GPT-3.5 deprecated, same cost range as modern models |
| Pinecone Starter | Self-hosted / cost-sensitive | Qdrant or pgvector | Pinecone Starter tier removed 2025; OSS alternatives are cheaper |
| LLM for real-time stock trading | Financial execution | Deterministic rule engine | LLMs hallucinate numbers; catastrophic for live trading |
| ChatGPT Plus | Production API workflows | OpenAI API direct | ChatGPT Plus is consumer; no SLA, no rate control, no observability |
| Hugging Face Inference API (free) | Production load | Modal or self-hosted Ollama | Free tier has extreme rate limits, cold starts >30s |
| Autonomous agents without human-in-loop | Medical / legal decisions | Any model + mandatory human review step | No current model is reliable enough for high-stakes autonomous decisions |
| PDF viewer MCP for sensitive docs | Compliance environments | Local LlamaIndex + on-prem Qdrant | Sending sensitive PDFs to cloud MCP servers violates data residency rules |
| CrewAI for single-agent tasks | Simple one-shot tasks | Direct API call | CrewAI’s multi-agent overhead adds latency and cost when only one agent is needed |
| Midjourney | Programmatic / API image gen | gpt-image-2 or Flux 2 Pro API | Midjourney has no public API; requires Discord bot workaround |
| GPT-4o Vision for OCR | High-accuracy document OCR | Tesseract 5 + Azure Document Intelligence | LLM OCR has ~2-5% error rate; dedicated OCR is 10x cheaper and more accurate |
| Sora | Any video generation (2026) | Kling VIDEO 3.0 or Veo 3.1 | Sora discontinued by OpenAI, April 2026 |
| Vector DB without reranking | High-precision RAG | Vector DB + BGE reranker or Cohere Rerank | Raw vector search recall is ~70%; reranking brings it to ~90%+ |
| Gemini 2.5 Flash-Lite | Complex legal/medical reasoning | Claude Opus 4.7 or Gemini 2.5 Pro | Flash-Lite optimized for speed, not accuracy on high-stakes tasks |
Standout projects and developments that shaped the AI agent landscape in 2026.
- Model Context Protocol (MCP) - Became the universal standard for agent-tool interoperability. Donated to Linux Foundation.
- A2A Protocol - 🆕 Google's Agent-to-Agent protocol enabled cross-framework agent collaboration with 150+ partners.
- Claude Code - Anthropic's agentic coding tool became the go-to terminal-based coding agent with 80.9% SWE-bench.
- Kiro - 🆕 AWS launched an autonomous coding agent capable of managing 10 simultaneous development tasks.
- Devin 3.0 - 🆕 Evolved to include dynamic re-planning, self-healing code, and legacy codebase migration.
- Microsoft Agent Framework - 🆕 AutoGen + Semantic Kernel merged into unified enterprise agent platform.
- OpenAI Codex CLI - OpenAI entered the agentic coding space with an open-source terminal agent.
- Browser Use - Breakthrough in making AI agents interact with the web naturally.
- Claude Computer Use - 🆕 Desktop Intelligence let Claude control any software by seeing the screen.
- Manus AI - 🆕 General-purpose autonomous agent that can handle research, coding, and complex workflows.
- OpenHands - Open-source AI software engineering platform gained massive adoption.
- Dify - Low-code LLM agent platform reached mainstream adoption.
- Cline - VS Code autonomous coding agent with rapid community growth.
- Mem0 - Memory layer for AI became essential component of agent architectures.
- Sora Discontinuation - 🆕 OpenAI shut down Sora (April 2026), signaling strategic pivot to enterprise AI and reasoning.
- Kling VIDEO 3.0 - 🆕 Kuaishou's video generation became the leading AI video platform post-Sora.
- Cohere + Aleph Alpha Merger - 🆕 April 24, 2026. Canadian AI firm Cohere merged with Germany's Aleph Alpha at ~$20B valuation. $600M Series E from Schwarz Group. Creates transatlantic "sovereign AI" powerhouse with dual HQ in Toronto and Germany.
- ScienceOne 100 / 磐石100 - 🆕 April 28-29, 2026. Chinese Academy of Sciences launches specialized scientific AI system. 2,000+ research tools, 50+ CAS institutes. Flagship-level scientific reasoning and agent capabilities.
- Google Invests $40B in Anthropic - 🆕 April 2026. $10B initial + up to $30B contingent on performance milestones. Includes 5GW compute capacity over 5 years. Largest AI partnership investment to date.
- OpenAI Deployment Company (DeployCo) - 🆕 May 11, 2026. OpenAI spins out a $4B+ enterprise-deployment services unit (TPG / Bain Capital / Brookfield / Advent / Goldman Sachs / SoftBank + Bain & Company / Capgemini / McKinsey) and absorbs the Tomoro consulting acquisition. Signals the AI vendor race shifting toward services + Forward Deployed Engineers.
- Anthropic ↔ SpaceX Colossus 1 - 🆕 May 6, 2026. Anthropic takes all available capacity on the 300+ MW / 220K-GPU Colossus 1 Memphis cluster. SpaceX repositions itself as an AI infrastructure provider after its xAI acquisition; Anthropic doubles Claude Code rate limits for paid plans.
- DeepSeek $4B state-backed round - 🆕 May 16, 2026. China's National AI Industry Investment Fund + Big Fund III + Tencent close in on a ~$4B first external round for DeepSeek at a ~$50B valuation — first known LLM investment from Big Fund III, signalling Beijing's bet on efficient open-weight frontier models and domestic silicon.
- Pope Leo XIV → Vatican AI Commission - 🆕 May 16, 2026. Pope Leo XIV publishes the rescriptum establishing an inter-dicasterial Vatican commission on artificial intelligence (Dicastery for Integral Human Development coordinating, with Doctrine of the Faith, Culture & Education, Communication, Pontifical Academies for Life / Sciences / Social Sciences). One-year renewable mandate. First AI-focused encyclical expected to follow.
- Google I/O 2026 — Gemini 3.5 + Omni + Spark + AI Ultra - 🆕 May 19, 2026. Google's biggest agent-and-AGI keynote of the year: Gemini 3.5 Flash GA (default model), Gemini Omni world-model family, Gemini Spark 24/7 personal agent with ~30+ MCP-based tool integrations, and a new Google AI Ultra $100/mo tier. Pichai confirms Google now processes 3.2 quadrillion tokens / month.
- Alibaba Cloud Summit Hangzhou — Qwen 3.7-Max + Zhenwu M890 - 🆕 May 20, 2026. Alibaba unveils Qwen 3.7-Max (agentic-coding flagship for long-horizon missions), the T-Head Zhenwu M890 AI accelerator, and a full-stack AI infrastructure upgrade. It is Alibaba's most aggressive bid yet to position itself as China's "AI factory."
- OpenAI Guaranteed Capacity (Compute Annual Pass) - 🆕 May 19, 2026. Long-term enterprise compute reservations (1/2/3-year terms) sold as a structured product — OpenAI's structural answer to Anthropic's Priority Tier and the wider supply crunch for frontier-model inference.
- Google Antigravity 2.0 + Microsoft RAMPART + xAI Grok Build - 🆕 May 14–22, 2026. Three structural agent-stack shifts in one week: Google's standalone multi-agent desktop + SDK at I/O 2026, Microsoft open-sourcing agentic-AI safety testing (RAMPART + Clarity), and xAI entering the CLI-agent race with Grok Build on grok-code-fast-1. The major players (Anthropic, Google, Microsoft, xAI) all show up with agent platforms within the same 8-day window.
Key milestones and events in the AI landscape of 2026.
| Date | Event | Category |
|---|---|---|
| Jan 2026 | AMD Ryzen AI 400 Series unveiled at CES — mainstream AI PCs with 60 TOPS NPU | Hardware |
| Feb 2026 | Claude Opus 4.6 released — agent team capabilities | Models |
| Feb 2026 | Claude Sonnet 4.6 released — 1M token context, agentic search | Models |
| Feb 2026 | Gemini 3.1 Pro released | Models |
| Feb 2026 | Qwen3.5 Series launched — native multimodal, agentic coding | Models |
| Feb 2026 | Qwen3-Coder-Next released — 80B MoE coding agent model | Models |
| Feb 2026 | Cursor updated with 8 parallel agents | Tools |
| Feb 2026 | GitHub Copilot expanded agent mode and model access | Tools |
| Mar 2026 | Gemini 3.1 Flash Lite released to developers | Models |
| Mar 2026 | Mistral Forge launched — custom LLM training platform | Platforms |
| Mar 2026 | Microsoft Agent Framework (AutoGen + Semantic Kernel) targets GA | Frameworks |
| Mar 2026 | DeepSeek announces new model trained on latest Nvidia chips | Models |
| Mar 2026 | MCP 2026 roadmap published — focus on production scaling and governance | Protocols |
| Mar 2026 | Sora shutdown announced (app closes April 26) | Events |
| Apr 2, 2026 | Qwen3.6-Plus proprietary flagship launched by Alibaba | Models |
| Apr 3, 2026 | Microsoft AI Agent Governance Toolkit released (open-source) | Tools |
| Apr 6, 2026 | Microsoft Agent Framework officially announced (AutoGen + Semantic Kernel unified) | Frameworks |
| Apr 7, 2026 | GLM-5.1 open-sourced by Zhipu AI — 744B MoE, trained on Huawei Ascend | Models |
| Apr 8-9, 2026 | Meta Muse Spark released — first model from Meta Superintelligence Labs | Models |
| Apr 2026 | Claude Mythos Preview — gated cybersecurity research model (BenchLM 99, SWE-bench 93.9%) | Models |
| Apr 2026 | Sora app officially shuts down | Events |
| Apr 14, 2026 | Gemini Robotics ER-1.6 upgraded robotics AI with enhanced spatial reasoning | Robotics |
| Apr 15, 2026 | Qwen3.6-35B-A3B open-sourced (Apache 2.0) by Alibaba | Models |
| Apr 16, 2026 | Claude Opus 4.7 released — SWE-bench Verified 87.6%, /think xhigh reasoning | Models |
| Apr 18, 2026 | Qwen3.6-Max-Preview launched — top Chinese model on coding benchmarks | Models |
| Apr 20-21, 2026 | Kimi K2.6 released by Moonshot AI — 1T MoE, 1,000-agent swarm | Models |
| Apr 22, 2026 | Qwen3.6-27B open-sourced by Alibaba — dense 27B multimodal | Models |
| Apr 23, 2026 | Tencent open-sources Hunyuan Hy3 Preview — 295B/21B MoE, 256K context | Models |
| Apr 23, 2026 | Claude Managed Agents Memory public beta — persistent cross-session agent memory | Tools |
| Apr 23, 2026 | GPT-5.5 released by OpenAI — major agentic coding and reasoning upgrade | Models |
| Apr 24, 2026 | DeepSeek V4 Pro & Flash released — 1.6T MoE, 1M context, MIT license | Models |
| Apr 24, 2026 | Cohere merges with Germany's Aleph Alpha at ~$20B valuation + $600M funding | Industry |
| Apr 27, 2026 | Alibaba Tianma AI image-to-video model enters beta | Models |
| Apr 27, 2026 | LangGraph v0.3.19 released; LangGraph Swarm prebuilt agents | Frameworks |
| Apr 28, 2026 | NVIDIA Nemotron 3 Nano Omni released — 30B multimodal (text/image/audio/video) | Models |
| Apr 28-29, 2026 | CAS ScienceOne 100 / 磐石100 launched — scientific AI for 50+ research institutes | Models |
| Apr 30, 2026 | OpenAI begins rollout of GPT-5.5-Cyber via the Trusted Access for Cyber (TAC) program | Models |
| Apr 30, 2026 | OpenAI publishes "A practical guide to building agents" | Resources |
| May 1, 2026 | Anthropic launches Claude Security in public beta — Opus 4.7-powered codebase vulnerability scanner with auto-patches | Tools |
| May 2026 | Macquarie Bank reports 130,000 hours saved in 7 months using Gemini Enterprise | Industry |
| May 2026 | Google starts rolling Gemini into eligible vehicles, replacing Google Assistant (English-first, U.S. rollout) | Industry |
| May 4, 2026 | Google retires Project Mariner; browser-agent tech folded into Gemini Agent | Tools |
| May 4, 2026 | Anthropic + Goldman Sachs + Blackstone announce $1.5B Claude deployment JV to embed Anthropic engineers in mid-market Wall Street firms | Industry |
| May 5, 2026 | OpenAI rolls out GPT-5.5 Instant as the new default ChatGPT model — efficiency-first upgrade, hallucination rate down ~50% | Models |
| May 5, 2026 | Anthropic launches Claude Finance Agents — 10 specialised agents for pitchbooks, KYC, month-end close, available as Claude Cowork plugins / Claude Code skills / Managed-Agents cookbooks | Tools |
| May 5, 2026 | OpenAI ↔ PwC partnership announced for financial-services agents (forecasting, payments) | Industry |
| May 7, 2026 | Google preparing Agent Mode for Flow (Veo-based AI filmmaking) — automated video production pipeline | Tools |
| May 8, 2026 | OpenAI launches GPT-Realtime-2 / Realtime-Translate / Realtime-Whisper — voice agents, live translation, real-time transcription | Models |
| May 9, 2026 | OpenAI rolls out Workspace Agents in ChatGPT Enterprise — repeatable workflow automation across connected apps | Tools |
| May 11–13, 2026 | Cursor 3.4 + SDK — Microsoft Teams integration, parallel-agent plan execution, multi-repo / Dockerfile dev environments, async sub-agents (/multitask), Vulnerability Scanner, granular model controls; Cursor SDK ships v2.5 security patch | Tools |
| May 12, 2026 | OpenAI Daybreak — cyber-defense platform bundling GPT-5.5 + GPT-5.5-Cyber + Trusted Access for Cyber for AI-powered vuln detection / patch validation; EU preview to governments and security vendors | Tools |
| May 12, 2026 | Gemini Intelligence revealed at Android Show: I/O Edition — proactive agentic AI across Googlebooks, Wear OS, Android Auto, Android XR; first on Samsung Galaxy + Pixel | Industry |
| May 12, 2026 | Vapi raises $50M Series B after crossing 1B platform calls; Squads v2 + Composer + Simulations + Soniox transcriber GA | Industry |
| May 13, 2026 | Figure 04 design finalized; component deliveries underway. Helix VLA-powered, follows F.03 home-focused build | Robotics |
| May 14, 2026 | Claude Code v2.1.141 — /goal cross-turn completion conditions, agent view, plugin loading from .zip / URL, Ctrl+R global history search, enterprise feedback surveys | Tools |
| May 14, 2026 | Codex on Mobile (preview) — ChatGPT iOS/Android can remote-control the macOS Codex app; OpenAI also issues TanStack supply-chain security patch | Tools |
| May 14, 2026 | Gemini Spark pre-I/O leak — upcoming branded agent capability inside the Gemini app for autonomous multi-step processes | Tools |
| May 14, 2026 | OpenClaw v2026.5.12 shipped — native model identity injected into system prompt, isolated Telegram polling worker, MEMORY.md auto-compaction, protected config paths for owner/exec approvals | Tools |
| May 11, 2026 | OpenAI Deployment Company launched — $4B+ enterprise services unit with TPG / Bain Capital / Brookfield + Bain & Company / Capgemini / McKinsey; Tomoro consulting acquisition folded in | Industry |
| May 11-13, 2026 | SAP Sapphire 2026 Orlando — SAP Business AI Platform, Joule Studio 2.0, Autonomous Suite with 50+ Joule Assistants and 200+ agents; Joule Studio 2.0 GA from June 2026 | Industry |
| May 12, 2026 | Claude for Legal — 20+ MCP connectors (iManage, NetDocuments, DocuSign, LexisNexis, Westlaw, Harvey, Everlaw, Relativity…) + 12 practice-area plugins on Claude Cowork | Tools |
| May 12-15, 2026 | Visual Studio 2026 Insiders — Copilot Chat "Agent Mode" with guided Agent Skills authoring inside the IDE | Tools |
| May 13, 2026 | Claude for Small Business — 15 pre-built agentic workflows + connectors for QuickBooks / PayPal / HubSpot / Canva / DocuSign / Google Workspace / Microsoft 365; 10-city US workshop tour | Tools |
| May 13, 2026 | Cursor 3.4 cloud agent environments — multi-repo, Dockerfile-based config with build secrets, 70% faster cached layers, env version history, audit logs, scoped egress / secrets | Tools |
| May 13-16, 2026 | Figure Helix 02 live-stream — F.03 + Helix 02 stress-test on a package-sort line, ~22K in 8h, ~30K in 24h, ~88K over ~72h until mechanical failure | Robotics |
| May 14, 2026 | Anthropic ↔ Gates Foundation $200M partnership — 4-year grants + Claude credits + Anthropic engineering on global health, life sciences, education, agriculture | Industry |
| May 14, 2026 | Anthropic ↔ PwC alliance expansion — global Claude Code + Cowork rollout, 30,000 PwC professionals certified, joint Agentic Enterprise Center of Excellence | Industry |
| May 14, 2026 | Genkit Middleware — Google releases composable middleware for the open-source Genkit agent framework (TS / Go / Dart) | Frameworks |
| May 14, 2026 | Zyphra ZAYA1-8B-Diffusion-Preview — first MoE diffusion LM converted from an autoregressive LLM; first diffusion LM trained on AMD GPUs; up to 7.7× inference speedup | Models |
| May 16, 2026 | Pope Leo XIV establishes Vatican AI Commission — inter-dicasterial body to coordinate the Church's response to AI; first AI-focused encyclical expected next | Industry |
| May 16, 2026 | OpenAI ↔ Malta partnership — every Maltese resident 14+ gets free 1-year ChatGPT Plus after a 2-hour AI literacy course ("OpenAI for Countries") | Industry |
| May 16, 2026 | DeepSeek state-backed $4B raise at ~$50B valuation — National AI Industry Investment Fund + Big Fund III + Tencent close in on first external round | Industry |
| May 2026 | LangGraph v1.2 — per-node timeouts/error-recovery/graceful shutdown, DeltaChannel checkpoint optimisation, content-block streaming API v3 | Frameworks |
| May 2026 | Grok 4.3 GA on Microsoft Foundry + Oracle OCI Generative AI; xAI flagship for agentic workloads | Models |
| May 1, 2026 | Microsoft Agent 365 GA — enterprise observability + governance + security for AI agents across environments; May adds SASE for agents + threat detection | Industry |
| May 8, 2026 | Code with Claude 2026 — Anthropic introduces Add-ins, Dreaming (scheduled memory review), Outcomes (rubric-driven generation), lead+sub-agent orchestration with shared filesystem audit | Tools |
| May 18, 2026 | OpenAI ↔ Dell Codex partnership — Codex extended to hybrid/on-prem enterprise environments via Dell Technologies; first major non-cloud Codex distribution | Industry |
| May 18, 2026 | Alibaba Qwen 3.7-Max-Preview / Plus-Preview — highest-ranked Chinese models on LM Arena in text + vision | Models |
| May 18, 2026 | Boston Dynamics Atlas 100-lb manipulation + Hyundai commits to 25K+ Atlas units across Hyundai/Kia plants starting 2028 (GA) | Robotics |
| May 18, 2026 | Figure F.03 vs human 8h sort challenge — human wins narrowly 12,924 vs 12,732 packages (2.79 vs 2.83 s/item) | Robotics |
| May 18, 2026 | Anthropic briefs FSB on Claude Mythos — first frontier-lab briefing to a G20 financial-stability regulator on offensive-cyber model capabilities | Industry |
| May 18, 2026 | ChatGPT safety systems update — OpenAI adds cross-session risk tracking for suicide / self-harm / harm-to-others escalation cues | Industry |
| May 19, 2026 | Google I/O 2026 — Gemini 3.5 Flash launches as the new default Gemini app + Search AI Mode model (~4× faster than peers); Gemini 3.5 Pro slated for June | Models |
| May 19, 2026 | Google I/O 2026 — Gemini Omni / Omni Flash, Google DeepMind's new multimodal world-model line aimed at AGI (any input, any output, video first) | Models |
| May 19, 2026 | Google I/O 2026 — Gemini Spark, a 24/7 personal AI agent integrating ~30+ third-party tools via MCP, gated behind the new Google AI Ultra ($100/mo) tier | Tools |
| May 19, 2026 | OpenAI Guaranteed Capacity / Compute Annual Pass launches — 1/2/3-year long-term compute reservations for enterprise AI products & agents | Industry |
| May 19, 2026 | OpenAI ↔ Google SynthID + C2PA content provenance — first major frontier-lab interop on durable cross-platform AI image watermarking and a public verifier preview | Industry |
| May 19, 2026 | Anthropic: Widening the conversation on frontier AI — framework for engaging wisdom traditions in frontier-AI safety dialogue | Industry |
| May 19, 2026 | DeepSeek hires former Jane Street engineer to build AI harness team — DeepSeek pivoting from model R&D toward autonomous, revenue-generating agents | Industry |
| May 13, 2026 | Runway Agent launches — conversational agent that takes a written brief and ships a multi-shot finished video end-to-end on Gen-4 / Aleph | Tools |
| May 20, 2026 | Alibaba Cloud Summit Hangzhou — Qwen 3.7-Max GA, agentic-coding flagship for long-horizon multi-step missions; new T-Head Zhenwu M890 AI chip + full-stack AI infrastructure upgrade | Models |
| May 20, 2026 | Bristol Myers Squibb ↔ Anthropic Claude Enterprise — 30K+ employees standardise on Claude Enterprise for drug discovery / development / delivery; first top-5 pharma full Claude deployment | Industry |
| May 20, 2026 | LlamaIndex ↔ Google Agents API — LlamaParse / LiteParse exposed inside the new Google Agents API sandbox; Sandboxed-Lit runtime + ParseBench (first OCR benchmark for agents) ship in the same wave | Frameworks |
| May 20, 2026 | Microsoft RAMPART + Clarity open-sourced — pytest-native white-box safety/security testing framework for agentic AI + structured design-review companion; CI/CD-friendly successor to PyRIT | Tools |
| May 6, 2026 | AWS MCP Server GA — AWS-managed MCP endpoint exposes every AWS API with sandboxed Python and agent skills; first hyperscaler-first-party MCP server | Protocols |
| May 1, 2026 | Google Workspace MCP Server rolls out — Workspace-native MCP for Gmail / Drive / Calendar / Docs / Sheets with admin-scoped OAuth | Protocols |
| May 14, 2026 | Grok Build (early beta) — xAI's agentic CLI coding agent powered by grok-code-fast-1; parallel sub-agents in isolated envs, SuperGrok Heavy gating | Tools |
| May 14, 2026 | iManage MCP Server launched — first major legal/professional-services SaaS to ship a public MCP endpoint | Tools |
| May 19, 2026 | Google Antigravity 2.0 at I/O 2026 — standalone desktop app for multi-agent orchestration, scheduled / async runs, dynamic sub-agents, Antigravity CLI + SDK, enterprise edition inside Gemini Enterprise Agent Platform | Tools |
| May 22, 2026 | Kore.ai Artemis Agent Platform launched on Azure — AI-native enterprise platform with Agent Blueprint Language (ABL) for declarative multi-agent workflows | Industry |
| May 22, 2026 | FPT Flezi Foundry™ launched — AI-augmented delivery platform with Agentic Development Lifecycle (ADLC) and Agentic Managed Services (AMS) modes under "Service-as-a-Software" governance | Industry |
| May 22, 2026 | JetBrains Rider AI test-writing skill — surfaces .NET coverage data to Claude Code / Codex so agents focus tests on untested branches | Tools |
| May 28, 2026 | Claude Opus 4.8 released by Anthropic — codebase-scale migrations, dynamic-workflows research preview (hundreds of parallel sub-agents), effort-control panel, 3× cheaper Fast mode; teases upcoming Mythos-class models | Models |
| May 28, 2026 | Koog 1.0 released at KotlinConf 2026 — JetBrains' open-source Kotlin/Java AI-agent framework hits stable, Kotlin Multiplatform deployment, OpenTelemetry across targets | Frameworks |
| May 28, 2026 | Gemini Omni Flash conversational video editing starts rolling out via Gemini app / Google Flow / YouTube Shorts — voice-and-text-driven cinematic edits replace NLEs | Tools |
| May 29, 2026 | MCP 2026-07 Release Candidate published — stateless protocol core, extensions framework, MCP Apps server-rendered UI, hardened OAuth/OIDC alignment; final spec target July 28, 2026 | Protocols |
| Apr 2026 | Gartner predicts 40% of enterprise apps will embed AI agents by end of 2026 | Industry |
| Apr 2026 | Google commits up to $40B investment in Anthropic (initial $10B) | Industry |
| 2026 (ongoing) | A2A Protocol grows to 150+ partner organizations | Protocols |
| 2026 (ongoing) | 85% of developers regularly use AI coding tools | Industry |
| 2026 (ongoing) | Enterprise agentic AI adoption accelerates — "Agents as a Service" emerges | Industry |
Contributions welcome! Please read the contributing guidelines first.
This list is released under the MIT License.
⭐ If you find this list useful, please give it a star! ⭐
440+ resources across 25 categories — from foundation models to agent protocols to generative AI.
Made with ❤️ by Zijian Ni
Last updated: May 30, 2026
