🤖 Awesome AI Agents 2026

The definitive curated list of AI models, agent frameworks, tools, protocols, and resources for 2026 — the year agents went mainstream and AI became infrastructure.

Covering foundation models, multimodal AI, agent protocols (MCP/A2A), coding agents, computer use, generative AI, and more.

🏷️ Status Legend

Entries may carry one or more status tags so readers can judge maturity at a glance:

🆕 New — Added in the last 60 days, still settling.
📦 Archived — Repository archived by its owner; preserved for historical reference, no further updates expected.
💤 Stale — No commits in 6+ months; project may still work but is no longer actively maintained.
⚠️ Unverified — Recent submission with limited independent traction (low stars / no third-party adoption / sole-maintainer / submitted to many awesome lists in parallel). Listed for completeness, not endorsed — vet before using.
🇨🇳 Chinese ecosystem — Project from a mainland-China team, or primarily targeting the Chinese market (e.g. optimized for Chinese language, regulation, or infra stack).
🔥 Hot — GitHub stars grew >20% in the last 30 days; community momentum.
⚡ Updated — Received a notable release or major feature in the last 14 days.
🧪 Experimental — Promising but not production-ready; use for R&D only.
💰 Freemium — Core functionality free; paid tiers for scale/advanced features.
🔐 Audited — Has undergone independent security audit or formal verification.

Foundation Models · Multimodal AI · Protocols · Frameworks · IDEs & Builders · Memory · Tools · Sandboxing · Security · RAG · Coding · Physical AI · Simulation · Benchmarks · Computer Use · Browser & Web · Voice · Personal · Mobile · Enterprise · Evaluation · Research Tools · Learning · Chinese Ecosystem · Compare · Notable 2026 · Timeline

🚀 Start Here

New to AI agents? Follow this path:

📖 Understand — what an agent actually is vs. a chatbot

🗺️ Find your scenario → Scenario Guide

🧩 Copy a proven setup → Stack Recipes

🔍 Pick the right tool → Compare Tables

⚠️ Avoid common mistakes → Anti-Picks

Already building? Jump to:

🆕 Latest additions (May 2026) • 🛡️ Security • 💰 Cost comparison

Quick Navigation

Category	Description	Count
🧠 Foundation Models	Latest LLMs from OpenAI, Anthropic, Google, Meta, and 22+ providers	80+
🎨 Multimodal & Generative AI	Image, video, audio, and music generation	20+
🔗 Agent Protocols	MCP, A2A, and interoperability standards	10+
🏗️ Agent Frameworks	Libraries for building autonomous AI agents	23+
🛠️ Agent IDEs & Visual Builders	Visual / low-code environments for designing agent flows	8+
🧠 Agent Memory	Persistent memory and context management	10+
🔌 Tool & API Integration	Connecting agents to external services	18+
🧪 Sandboxing & Compute Isolation	Secure runtimes for agent-generated code	7+
🛡️ Agent Security	Prompt injection defense and guardrails	16+
🔍 RAG & Knowledge	Retrieval-augmented generation systems	12+
💻 Coding Agents	AI-powered software engineering	27+
🤖 Physical AI	Humanoid robots, embodied AI, industrial automation	22+
🎮 Simulation & World Models	Sim environments for training and stress-testing agents	7+
📊 Benchmarks	Leaderboards tracking frontier capability	11+
🖥️ Computer Use	Desktop automation and OS-level control	10+
🌐 Browser & Web Agents	Agents that drive real browsers	9+
🗣️ Voice & Multimodal Agents	Voice-enabled conversational AI	10+
📱 Personal AI Agents	Productivity and daily life assistants	11+
📱 Mobile Agents	Phone-control agents (Android / iOS)	6+
🏢 Enterprise Platforms	Enterprise-grade agent deployment	18+
📊 Evaluation & Observability	Testing, monitoring, and benchmarking	17+
🔬 AI Research Tools	Tools for AI/ML research and experimentation	10+
📚 Learning Resources	Papers, courses, and tutorials	20+
🇨🇳 Chinese AI Ecosystem	Major projects from China-based teams	18+
📝 Compare	Side-by-side comparison tables	—
🗺️ Scenario Guide	56 curated scenario-to-tool mappings	56
📋 Stack Recipes	Curated multi-tool combinations	8
⚠️ Anti-Picks	What NOT to use and why	15

🧠 Foundation Models 2026
🎨 Multimodal & Generative AI
🔗 Agent Protocols & Standards
🏗️ Agent Frameworks
🛠️ Agent IDEs & Visual Builders
🧠 Agent Memory
🔌 Tool & API Integration
🧪 Agent Sandboxing & Compute Isolation
🛡️ Agent Security
🔍 RAG & Knowledge
💻 Coding Agents
🤖 Physical AI & Embodied Agents
🎮 Agent Simulation & World Models
📊 Benchmarks & Leaderboards
🖥️ Computer Use & Desktop Agents
🌐 Browser & Web Agents
🗣️ Voice & Multimodal Agents
📱 Personal AI Agents
📱 Mobile Agents
🏢 Enterprise Agent Platforms
📊 Agent Evaluation & Observability
🔬 AI Research Tools
📚 Learning Resources
🇨🇳 Chinese AI Ecosystem
📝 Compare — Side-by-Side Tables
🗺️ Scenario Guide — What Should I Use For…
📋 Stack Recipes — Curated Tool Combinations
⚠️ Anti-Picks — What NOT to Use For…
🌟 Notable Agent Projects of 2026
📅 2026 AI Timeline

🧠 Foundation Models 2026

The latest large language models powering the AI ecosystem, organized by company: 80+ models from 25+ providers.

OpenAI

GPT-5.5 - 🆕 Released April 23, 2026 (codename "Spud"). OpenAI's new frontier model for agentic tasks: coding, online research, data analysis, autonomous tool navigation. Significant gains in reasoning, consistency, and long-horizon task handling. Available on ChatGPT Plus / Pro / Business / Enterprise.
GPT-5.5 Pro - 🆕 April 23, 2026. Parallel test-time compute variant for higher-accuracy cognitive tasks. Pro / Business / Enterprise tiers.
GPT-5.5-Cyber - 🆕 April 30, 2026. Cybersecurity-specialized variant of GPT-5.5, rolled out via OpenAI's Trusted Access for Cyber (TAC) program to vetted defenders, government, critical infrastructure operators, and security vendors. Not available to the general public.
OpenAI Daybreak - 🆕 May 12, 2026. Cyber-defense platform bundling GPT-5.5, GPT-5.5-Cyber, and the Trusted Access for Cyber (TAC) program for AI-powered vulnerability detection and patch validation; preview access extended to EU governments and security vendors.
GPT-5.5 Instant - 🆕 May 5, 2026. New ChatGPT default model. Efficiency-first upgrade with a ~50% lower hallucination rate on high-stakes prompts; available on the free tier.
GPT-Realtime-2 - 🆕 May 8, 2026. GPT-5-class reasoning brought to the Realtime API, 128K context, parallel tool calls with audio feedback, adjustable reasoning effort.
GPT-Realtime-Translate - 🆕 May 8, 2026. Live speech-to-speech translation across 70+ input languages and 13 output languages.
GPT-Realtime-Whisper - 🆕 May 8, 2026. Streaming low-latency speech-to-text companion to GPT-Realtime-2.
OpenAI Deployment Company (DeployCo) - 🆕 May 11, 2026. New OpenAI-majority-owned services entity for enterprise AI rollout. Backed by $4B+ from TPG / Advent / Bain Capital / Brookfield / Goldman Sachs / SoftBank and consulting partners Bain & Company, Capgemini, McKinsey. Built around Forward Deployed Engineers; absorbs the Tomoro AI consulting acquisition (~150 engineers).
Codex on Mobile - 🆕 May 14, 2026. ChatGPT iOS/Android can now remote-control the Codex desktop app — review outputs, approve actions, switch models, and kick off new tasks from the phone while the live session runs on Mac (Windows next). Rolling out as preview to Free, Plus and Go users.
OpenAI ↔ Malta partnership - 🆕 May 16, 2026. First country-wide deal: every Maltese citizen / resident aged 14+ gets a free 1-year ChatGPT Plus subscription after completing a 2-hour AI literacy course built by the University of Malta. Part of the "OpenAI for Countries" initiative; phased rollout starting May 2026.
OpenAI ↔ Dell Codex partnership - 🆕 May 18, 2026. Brings Codex to hybrid and on-premises enterprise environments via Dell Technologies infrastructure — first major Codex distribution channel outside the public cloud, targeted at regulated industries needing data-residency control.
ChatGPT Safety Updates — sensitive-conversation tracking - 🆕 May 18, 2026. ChatGPT's safety systems updated to detect and track subtle escalation cues across long sessions for acute risks (suicide / self-harm / harm to others), with cross-session state retention.
OpenAI Guaranteed Capacity (Compute Annual Pass) - 🆕 May 19, 2026. Long-term compute reservation product for enterprise AI products / agents / workflows. 1, 2, or 3-year terms; longer terms unlock larger discounts. OpenAI's structural response to the Anthropic "Priority Tier" model.
OpenAI ↔ Google SynthID + C2PA content provenance - 🆕 May 19, 2026. OpenAI partners with Google to add durable cross-platform SynthID watermarking to ChatGPT/Sora images, joins C2PA, and previews a public "is-this-image-from-OpenAI" verifier. First major frontier-lab interop on watermarking.
GPT-5.4 - Released March 2026. Frontier model with 1M-token context, advanced coding, computer use, and tool search. BenchLM 94, SWE-bench Verified 77.2%, OSWorld 75% (above the 72.4% human baseline).
GPT-5.4 Pro - Higher-accuracy variant of GPT-5.4. BenchLM 92.
GPT-5.3 - Early 2026. Includes GPT-5.3 Instant (conversations) and GPT-5.3-Codex (coding).
GPT-5.2 - Released Dec 2025. State-of-the-art reasoning, long-context understanding, and vision.
GPT-5 - Launched August 2025. The default model in ChatGPT, replacing GPT-4o. Multimodal with variants: gpt-5, gpt-5-mini, gpt-5-nano.
GPT-4o - Omni model with native text, vision, and audio. Retired from ChatGPT Feb 2026 but still available via API.
o3 / o4-mini - Reasoning models with chain-of-thought for complex problem solving. Released April 2025.
Codex CLI - Open-source terminal-based coding agent powered by OpenAI models.

Anthropic

Claude Opus 4.7 - 🆕 Released April 16, 2026. Advanced software engineering (SWE-bench Verified 87.6%), enhanced vision, proactive code verification. Supports /think xhigh reasoning effort. 1M-token context.
Claude Opus 4.6 - Released Feb 2026. 1M-token context, 14.5-hour task horizon. Leads the Arena chat leaderboard.
Claude Sonnet 4.6 - Released Feb 2026. Frontier coding and agentic performance, 1M token context window.
Claude Mythos Preview - 🆕 April 2026 gated research preview. BenchLM 99 (top of leaderboard), SWE-bench Verified 93.9%. Limited to Project Glasswing partners.
Claude Opus 4 - Released May 2025. Advanced reasoning and complex task execution.
Claude Sonnet 4 - Released May 2025. Balanced performance and cost for a wide range of tasks.
Claude Code - Agentic coding tool operating directly in your terminal. Powered by Opus 4.7 with /think xhigh support.
Claude Security - 🆕 May 1, 2026. Public beta. Enterprise security tool powered by Opus 4.7 that scans entire codebases for vulnerabilities and generates targeted patches, each with a confidence rating, severity, reproduction steps, and recommended fixes. Available to Enterprise customers via claude.ai/security.
Claude Finance Agents - 🆕 May 5, 2026. Ten Opus-4.7-powered specialised agents for pitchbook authoring, KYC, month-end close, deal screening, etc. Deployable as Claude Cowork plugins, Claude Code skills, or Managed-Agents cookbooks.
Claude Finance JV - 🆕 May 4, 2026. $1.5B Claude deployment joint venture with Goldman Sachs and Blackstone embedding Anthropic engineers in mid-market Wall Street firms.
Claude Add-ins / Dreaming / Outcomes / Multi-agent orchestration - 🆕 May 8, 2026 (Code with Claude 2026). Anthropic introduces Add-ins, scheduled memory review between sessions ("Dreaming"), rubric-driven "Outcomes", and a lead-agent + sub-agent orchestration model with shared filesystem and auditable trace.
Anthropic ↔ SpaceX Colossus 1 - 🆕 May 6, 2026. Anthropic takes all available capacity at SpaceX's Colossus 1 Memphis datacenter (>220K NVIDIA H100/H200/GB200 GPUs, 300+ MW) for Claude Opus inference. Doubles Claude Code 5-hour rate limits on Pro/Max/Team/Enterprise; also lifts peak-hour limits.
Claude for Legal - 🆕 May 12, 2026. New legal stack on top of Claude Cowork: 20+ MCP connectors (iManage, NetDocuments, DocuSign, Ironclad, LexisNexis, Westlaw, Harvey, Everlaw, Relativity, CourtListener…) + 12 practice-area plugins (commercial, employment, privacy, product, corporate, AI governance, litigation associate, law-student bar-exam). Microsoft Word / Outlook / Excel / PowerPoint orchestration built in.
Claude for Small Business - 🆕 May 13, 2026. Small-business toggle inside Claude Cowork — 15 pre-built agentic workflows across finance / ops / sales / marketing / HR / customer service, native connectors for QuickBooks, PayPal, HubSpot, Canva, DocuSign, Google Workspace, Microsoft 365. Bundled with a free PayPal-backed "AI Fluency for Small Business" course and a 10-city US workshop tour kicking off in Chicago.
Anthropic ↔ Gates Foundation $200M - 🆕 May 14, 2026. 4-year, $200M partnership pairing grants + Claude usage credits + Anthropic engineers on global-health, life-sciences, education, and agriculture programs. All tools produced under the program will be freely available; first focus areas include vaccine R&D for polio / HPV / preeclampsia and agriculture-specific Claude extensions.
Anthropic ↔ PwC strategic alliance expansion - 🆕 May 14, 2026. PwC commits to global rollout of Claude Code + Claude Cowork, certifies 30,000 PwC professionals, and stands up a joint "Agentic Enterprise" Center of Excellence — focused on agentic build, AI-native deals, and finance / supply-chain / HR reinvention.
Anthropic ↔ Financial Stability Board briefing (Claude Mythos) - 🆕 May 18, 2026. Anthropic briefs the global FSB on Claude Mythos cyber-flaw discovery capabilities — first time a frontier lab briefs a G20-level financial-stability regulator on a frontier model's offensive-security implications.
Code with Claude 2026 sessions on YouTube - 🆕 May 18, 2026 (sessions published). Full developer-conference recordings (May 6 event) go public: Claude Code roadmap, Claude Developer Platform updates, Managed Agents dreaming + multi-agent orchestration, and partner deployments.
Widening the conversation on frontier AI - 🆕 May 19, 2026. Anthropic publishes its framework for engaging diverse traditions (religious, philosophical, indigenous) in frontier-AI safety dialogue. Companion to ongoing public-engagement work.
Bristol Myers Squibb ↔ Anthropic Claude Enterprise - 🆕 May 20, 2026. BMS adopts Claude Enterprise as its shared intelligence platform for 30,000+ employees globally, embedding agentic Claude into drug-discovery / development / delivery workflows. First top-5 pharma enterprise-wide Claude deployment.
Claude Opus 4.8 - 🆕 May 28, 2026. Major Opus refresh: codebase-scale migrations, sharper agentic judgment, dynamic workflows research preview with hundreds of parallel sub-agents in a single session, manual effort-control panel, 3× cheaper Fast mode at the same $5 / $25 per million in/out. Available on Anthropic native + Amazon Bedrock + AWS Claude Platform + Google Cloud + Microsoft Foundry. Teases an upcoming Mythos-class model series for limited orgs.
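The $5 / $25 per-million-token pricing quoted for Opus 4.8 turns into a per-request estimate with simple arithmetic. A minimal sketch in Python, assuming $5 applies to input tokens and $25 to output tokens, and ignoring prompt caching, batch discounts, and any Fast-mode pricing; `request_cost_usd` is a hypothetical helper, not part of any Anthropic SDK:

```python
# Per-request cost sketch. Assumption: input and output tokens are billed
# linearly at the listed per-million-token rates; caching, batch discounts,
# and Fast-mode pricing are deliberately ignored.


def request_cost_usd(input_tokens: int, output_tokens: int,
                     usd_per_m_input: float = 5.0,
                     usd_per_m_output: float = 25.0) -> float:
    """Return the estimated USD cost of a single request."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    # Multiply each token count by its rate, then convert from per-million.
    return (input_tokens * usd_per_m_input
            + output_tokens * usd_per_m_output) / 1_000_000


if __name__ == "__main__":
    # A 10K-token prompt with a 2K-token answer.
    print(f"${request_cost_usd(10_000, 2_000):.4f}")
```

With the defaults, a 10K-token prompt with a 2K-token answer comes to about $0.10, split evenly between input and output because output is billed at 5× the input rate.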

Google DeepMind

Gemini 3.1 Pro - Released Feb 2026. BenchLM 94, GPQA Diamond 94.3% (a record at release), ARC-AGI-2 77.1%. Google's flagship Pro model at $2 per 1M input tokens.
Gemini 3.1 Flash Live - 🆕 April 2026. Real-time multimodal streaming for voice assistants and interactive agents. Low latency, long context.
Gemini 3.1 Flash-Lite (GA) - 🆕 May 8, 2026. Generally available on Gemini API / AI Studio / Vertex AI. Fastest and most cost-efficient model in the Gemini 3 family — built for low-latency code completion, real-time UX, and agentic developer tools; matches Gemini 2.5 Flash quality at significantly lower cost.
Gemini 3.5 Flash - 🆕 May 19, 2026 — Google I/O 2026. Default model powering the Gemini app and Google Search AI Mode. Marketed as ~4× faster than other frontier models in output tokens/sec while outperforming Gemini 3.1 Pro on key benchmarks. Gemini 3.5 Pro slated for June 2026.
Gemini Omni / Omni Flash - 🆕 May 19, 2026 — Google I/O 2026. New Google DeepMind multimodal world-model family aimed at AGI. Omni Flash, the first shipped variant, can take any input modality and generate any output (starting with video; image and text generation following). Direct lineage to Gemini Robotics / Genie line of work.
Gemini Omni Flash — voice-controlled video editing rollout - 🆕 May 28, 2026. Omni Flash starts rolling out to consumers via the Gemini app, Google Flow, and YouTube Shorts as the editing engine — conversational cinematic zooms / background swaps / weather edits driven by text, voice, image, or audio prompts; no traditional NLE required.
Gemini Spark (24/7 personal AI agent) - 🆕 May 19, 2026 — Google I/O 2026. Cloud-resident personal AI agent that acts on user intent around the clock. Integrates Gmail and Chat first, then 30+ third-party tools via MCP (Adobe / Dropbox / Uber). Available to Google AI Ultra subscribers in the US during I/O week.
Google AI Ultra ($100/month tier) - 🆕 May 19, 2026 — Google I/O 2026. New top consumer subscription targeted at developers / creators / power users. Gates Gemini Spark, highest Gemini 3.5 quotas, and the upcoming Gemini 3.5 Pro.
Gemini 3.1 Flash / Flash Lite - Fast, cost-efficient models for high-throughput applications.
Gemma 4 - 🆕 Released April 2026. Open model family: 2B / 4B / 26B / 31B variants. Strong science reasoning and document understanding, local deployment ready.
Gemini 2.5 Pro / Flash - GA June 2025. Thinking model with 1M context.
Gemma 4 31B - 🆕 April 2026. GPQA Diamond 84.3%. Strong open-weight alternative for on-device reasoning.
Gemma 3 - Previous open model family for on-device and research use.
Gemini Robotics ER-1.6 - 🆕 April 14, 2026. Upgraded robotics AI with improved spatial and physical reasoning. Partnership with Agile Robotics for real-world deployment.

Sakana AI

Sakana RL Conductor - 🆕 Paper April 27, 2026; Fugu beta late-April / early-May 2026. 7B RL-trained orchestrator (built on Qwen2.5-7B) that routes subtasks between GPT-5, Claude Sonnet 4, Gemini 2.5 Pro, etc. SOTA on LiveCodeBench (83.9%) and GPQA-Diamond (87.5%) at ~1.8K tokens/query — roughly 6× cheaper than other multi-agent ensembles.
Sakana Fugu - 🆕 Beta April 24-25, 2026. Commercial multi-agent orchestration service productising the RL Conductor research. OpenAI-compatible API with two tiers: Fugu Mini (low-latency) and Fugu Ultra (max performance); strong reported results on SWE-Pro, GPQA-D and ALE-Bench.

Zyphra

ZAYA1-8B - 🆕 May 6, 2026. MoE reasoning model (<1B active) trained end-to-end on AMD Instinct MI300X clusters. Apache 2.0 weights on Hugging Face plus a serverless endpoint on Zyphra Cloud; aims to maximize math, code, and reasoning performance per active parameter.
ZAYA1-8B-Diffusion-Preview - 🆕 May 14, 2026. First MoE diffusion language model converted from an autoregressive LLM and the first diffusion LM trained on AMD GPUs. Generates 16 tokens per step, achieving up to 7.7× inference speedup vs the autoregressive base. Built with Zyphra's TiDAR recipe + CCA attention.
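The "16 tokens per step" figure puts a ceiling on the possible speedup. A minimal sketch, assuming each forward pass commits exactly `tokens_per_step` tokens (real diffusion decoders may refine a block over several passes); `decode_steps` is a hypothetical helper:

```python
import math


def decode_steps(num_tokens: int, tokens_per_step: int) -> int:
    """Forward passes needed if every pass commits `tokens_per_step` tokens."""
    if num_tokens < 0 or tokens_per_step < 1:
        raise ValueError("need num_tokens >= 0 and tokens_per_step >= 1")
    # A partially filled final block still costs one full pass.
    return math.ceil(num_tokens / tokens_per_step)


if __name__ == "__main__":
    n = 1024
    ar = decode_steps(n, 1)      # autoregressive: one token per pass
    diff = decode_steps(n, 16)   # block diffusion: 16 tokens per pass
    print(ar, diff, ar / diff)
```

For 1,024 tokens this gives 1,024 autoregressive passes versus 64 diffusion passes, a 16× ceiling; the reported 7.7× sits below it, plausibly because a diffusion step costs more than a single autoregressive step.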

Mistral AI

Mistral Large 3 - 675B total / 41B active parameters, MoE, 256K context. Flagship open-weight multimodal model. Released Dec 2025.
Mistral Medium 3.1 - Frontier-class dense model for enterprise. Multimodal, 128K context, 80+ programming languages. Released Aug 2025.
Mistral Small 4 - 🆕 Released March 2026. 119B total / 6B active. Hybrid model combining reasoning, multimodal, and coding strengths.
Magistral 1.2 - 🆕 2026 reasoning family challenging o3/o4-mini. Transparent and multilingual reasoning.
Devstral 2 - 🆕 2026 agentic coding model, positioned by Mistral as the best open-source model for coding agents.
Codestral - 22B code generation model, 80+ programming languages, 32K context. Released May 2024.
Pixtral Large - 124B multimodal model with 1B vision encoder, 128K context, processes 30+ high-res images.
Ministral 3B/8B/14B - Compact models optimized for edge deployment and efficiency.
Mistral Forge - 🆕 March 2026 platform for training custom LLMs on proprietary data.
Mistral Medium 3.5 - 🆕 April 29, 2026. Dense 128B open-weight model, 256K context, Modified MIT license. Unifies instruction-following, reasoning, and coding.
Voxtral TTS - 🆕 March 26, 2026. 4B-parameter open-weight TTS built on Ministral 3B; multilingual, optimised for voice agents.

DeepSeek

DeepSeek Agent Harness team - 🆕 May 19, 2026. DeepSeek hires a former Jane Street engineer to lead a new "AI harness" team building the deterministic scaffolding that turns DeepSeek V4 into autonomous, revenue-generating agents — first major signal DeepSeek is moving past raw-model R&D into agentic productisation.
DeepSeek-V4-Pro - 🆕 April 24, 2026. 1.6T total / 49B active MoE, 1M-token context. MIT license. Leads open-source models on agent capabilities, world knowledge, and reasoning benchmarks.
DeepSeek-V4-Flash - 🆕 April 24, 2026. 284B total / 13B active MoE, 1M context. MIT license. Cost-efficient tier.
DeepSeek-V3.2 - Released Dec 2025. Advanced MoE architecture with 671B total parameters. V3.2 Speciale variant for enhanced reasoning.
DeepSeek-R2 - 2026 advanced reasoning model. Successor to R1, competes with GPT-5 and Gemini 3 Pro.
DeepSeek-R1 - Reasoning-focused model with chain-of-thought capabilities. Released Jan 2025.
DeepSeek-Coder-V2 - Code generation model competitive with GPT-4 on coding benchmarks.
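Many entries in this list quote MoE sizes as "total / active". Per-token compute scales roughly with the active parameters, while the memory needed to hold the weights still scales with the total. A minimal sketch computing the active share from the DeepSeek V4 figures above; `active_share` and `MOE_SIZES_B` are illustrative names:

```python
# Active-parameter share of a Mixture-of-Experts (MoE) model: the fraction
# of total weights used for each token. Sizes are the ones quoted above.

MOE_SIZES_B = {
    # name: (total parameters in billions, active parameters in billions)
    "DeepSeek-V4-Pro": (1600, 49),
    "DeepSeek-V4-Flash": (284, 13),
}


def active_share(total_b: float, active_b: float) -> float:
    """Return active / total parameters as a fraction in (0, 1]."""
    if not 0 < active_b <= total_b:
        raise ValueError("need 0 < active <= total")
    return active_b / total_b


if __name__ == "__main__":
    for name, (total, active) in MOE_SIZES_B.items():
        print(f"{name}: {active_share(total, active):.1%} active per token")
```

For DeepSeek-V4-Pro this is 49 / 1,600 ≈ 3.1% of weights per token, and for V4-Flash 13 / 284 ≈ 4.6%.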

Meta (Llama)

Llama 5 - 🆕 April 8, 2026. 600B+ parameter open-source flagship from Meta Superintelligence Labs; "recursive self-improvement" research line. Marketed as exceeding leading proprietary models on reasoning, coding, autonomous agentic behaviour.
Meta Muse Spark - 🆕 April 8-9, 2026. First public model from Meta Superintelligence Labs; long-context multimodal foundation.
Llama 4 Scout / Maverick - 10M-token context (Scout) MoE flagship line shipped April 2025; still the production fallback for many enterprise stacks.

Alibaba (Qwen)

Qwen3.7-Max - 🆕 May 20, 2026 — Alibaba Cloud Summit Hangzhou. New Qwen flagship purpose-built as the foundation for AI agents: agentic coding, complex reasoning, and long-horizon multi-step missions with sustained decision-making. Released alongside a full-stack AI infrastructure upgrade and new T-Head Zhenwu M890 AI accelerator chip. Worldwide developer/enterprise availability rolling.
Qwen3.7-Max-Preview / Qwen3.7-Plus-Preview - 🆕 May 18, 2026. Preview ladder before the Hangzhou unveil. Ranked the highest of any Chinese model on LM Arena in both text and vision; sustained 1M-context evaluations.
Qwen3.6-27B - 🆕 April 22, 2026. Dense 27B multimodal. Open-sourced. Focus: agentic coding + thinking-context preservation.
Qwen3.6-Max-Preview - 🆕 April 18, 2026. Proprietary frontier preview. High coding/reasoning performance, 1M context window. Top-tier among Chinese models on coding benchmarks.
Qwen3.6-35B-A3B - 🆕 April 15, 2026. MoE, 35B total / 3B active. Apache 2.0. Stability and real-world utility improvements.
Qwen3.6-Plus - 🆕 April 2, 2026. Proprietary flagship. High value-per-token general model. Strong long-context, tool-calling, agentic behavior.
Tianma (天马) AI - 🆕 April 27, 2026 (beta). Alibaba's image-to-video generation model. Strong character consistency and motion quality.
Qwen3.5 Max Pro - April 2026. High-performance flagship. Enhanced coding and math reasoning, long context.
Qwen3.5 Omni Plus - April 2026. Proprietary full-modal foundation model unifying text and image input.
Qwen3-Max-Thinking - Alibaba's strongest thinking model. 1T+ parameters, enhanced agentic capabilities.
Qwen3.5-Omni - March 2026. Fully omni-modal: language, vision, sound, motion. Speech recognition in 113 languages, 256K context.
Qwen3-Coder-Next - Feb 2026. Open-weight coding agent model, MoE 80B total / 3B active.
Qwen3 235B-A22B - MoE with dual-mode reasoning. Strong math, code, and commonsense reasoning.
Qwen2.5 Coder 32B - A top open-source coding model of its generation.

MiniMax

MiniMax M2.7 - 🇨🇳 🆕 March 2026. Proprietary self-evolving LLM tuned for agent harness construction, memory updates, and iterative workflow improvement; major gains on SWE-bench-style tasks.
MiniMax-M2.7 (Open Weights) - 🆕 April 2026. Ultra-long context (1M+ window). Top-tier performance on coding and agent tasks.
MiniMax M2.5 - 🇨🇳 February 2026. 230B-parameter cost-efficient flagship for "real-world productivity".
MiniMax-M1-80k - Open-weight hybrid-attention reasoning model. 456B parameters, 1M token context.
Hailuo 02 - 🇨🇳 🆕 March 2026. Native 1080p text/image-to-video trained on an expanded corpus.
Hailuo AI (Video) - Text/image-to-video generation with AI avatars, voiceovers, and character consistency.
MiniMax Music 2.6 - 🇨🇳 🆕 April 2026. Cover-generation focus with improved low-frequency reproduction; global beta.
Kilo Code Integration - 🆕 MiniMax powers Kilo Code (new AI coding editor) as the default model in its code-generation pipeline.

ByteDance (Doubao / 豆包 / Seedance)

Doubao 2.0 - 🇨🇳 🆕 February 2026. Agent-era upgrade focused on real-world task execution; powers ByteDance's consumer AI apps.
Doubao-Seed-2.0 Pro - 🆕 Released Feb 2026. Frontier reasoning and complex agents. Competes with GPT-5.2 at ~90% lower cost.
Doubao-Seed-2.0 Lite - 🆕 Tuned for general production workloads. Balanced performance and efficiency.
Doubao-Seed-2.0 Code - 🆕 Built for software development: code generation, debugging, and review.
BAGEL - 🆕 Open-source multimodal model for text, image, and video understanding and generation.
Seedance 2.0 - 🇨🇳 🆕 February 2026. Multi-modal cinematic video generation, 2K resolution, ~30% faster than Seedance 1.5.

StepFun

Step 3.5 Flash - 🇨🇳 🆕 February 2026. ~196B-parameter compact reasoning + agent model; punches above its weight against larger Chinese rivals.

Baichuan

Baichuan-M3 Plus - 🇨🇳 🆕 January 2026. Evidence-anchored medical LLM with low hallucination rate; free API for Chinese medical institutions.

xAI (Grok)

Grok 4.3 Beta - 🆕 April 2026. Latest iteration with improved reasoning and coding benchmarks. See 2026.4 benchmark snapshot.
Grok 4.3 GA - 🆕 May 2026. Grok 4.3 reached general availability on Microsoft Foundry and OCI Generative AI; xAI's flagship for agentic workloads with improved tool-calling and long-horizon reasoning.
Grok 4.20 - Feb 2026. Multi-agent system (4 standard + 16 specialized agents in Heavy mode), 2M-token context.
Grok 4 / 4 Heavy - Released July 2025. 3T parameters. xAI's frontier model.
Grok 3 / 3 Mini - Feb 2025. xAI's first reasoning models, with "Think Mode".

Microsoft (Phi)

Phi-4-reasoning-vision-15B - 🆕 Released March 2026. 15B multimodal model with selective chain-of-thought reasoning. Edge-deployable.
Phi-4 - 14B-parameter SLM with reasoning rivaling much larger models. Open-source under MIT License.
Phi-4-mini - 3.8B-parameter dense model. 128K context. Excels in reasoning, math, coding, and function-calling.
Phi-4-multimodal - 5.6B parameters. First multimodal Phi model; integrates speech, vision, and text in a unified architecture.

Cohere

Command A - 🆕 Released March 2025. 111B open-weights model, 256K context. Agentic, multilingual, and coding focused.
Command R+ - Enterprise RAG model, 128K context, multilingual (10 languages), grounded generation with citations.
Command R - Cost-efficient model for retrieval-augmented generation and enterprise workloads.

Baidu (ERNIE / 文心)

ERNIE 5.0 - 🆕 Released Jan 2026. 2.4T-parameter MoE (activates <3% per query). Native full-modal. #1 Chinese model on LMArena.
ERNIE 4.5 - Multimodal predecessor released 2025. Strong reasoning and Chinese language capabilities.

Zhipu AI / Z.ai (GLM)

GLM-5.1 - 🆕 April 7, 2026. 744B MoE / 40B active, 200K context. MIT license. Tops SWE-Bench Pro. Trained entirely on Huawei Ascend (no NVIDIA).
GLM-5 Reasoning - 🆕 April 2026. BenchLM 85, the top open-source score. Its SWE-Bench Pro score surpasses GPT-5.4 and Claude Opus 4.6.
GLM-5V-Turbo - 🆕 April 2026. Native multimodal agent accepting image, video-clip, and text inputs. Balanced for cost and performance.
GLM-5 - Released Feb 2026. 744B parameters, advanced agentic intelligence. MIT license.
GLM-4.7 - Released late 2025. Matches Claude Opus 4 on SWE-Bench.

Moonshot AI (Kimi)

Kimi K2.6 - 🆕 April 20-21, 2026. 1T MoE / 32B active, 256K context. Enhanced coding, long multi-step execution, agent swarm up to 1,000 collaborating agents. Supports thinking.keep="all" persistent reasoning. Default in OpenClaw v2026.4.20+.
Kimi K2.5 - Jan-Feb 2026. 1T total / 32B active MoE. Native multimodal, Agent Swarm (up to 100 parallel sub-agents). Open-source. Support ends May 25, 2026; migrate to K2.6.
Kimi Code - Premium coding tier powered by K2.5/K2.6 for terminal-based developer workflows.

Amazon (Nova)

Nova 2 Pro - 🆕 Amazon's most intelligent reasoning model. Text, image, video, speech input. Agentic coding and long-range planning.
Nova 2 Lite - 🆕 Fast, cost-effective reasoning with 1M-token context. Adjustable "thinking effort" controls.
Nova 2 Sonic - 🆕 Speech-to-speech model for real-time conversational AI. 1M token context, multilingual.
Nova Act - 🆕 Browser-based AI agent service for web task automation. Powered by Nova 2 Lite.
Nova Forge - 🆕 "Open training" service for building custom Nova model variants with proprietary data.

NVIDIA (Nemotron)

Nemotron 3 Ultra - 🆕 Released March 2026 (GTC). Frontier-level reasoning, 5x throughput efficiency on Blackwell platform.
Nemotron 3 Super - 🆕 Released March 2026. 120B total / 12B active. 1M context. 5x higher throughput vs predecessor.
Nemotron 3 Nano - Cost-efficient hybrid Transformer-Mamba MoE. Optimized for targeted agentic tasks.
Nemotron 3 Nano Omni - 🆕 April 28, 2026. 30B-A3B hybrid MoE (Mamba + Transformer). Natively multimodal: text, image, audio, video, charts, and documents in one model. 9x higher throughput than comparable open omni models. Topped six leaderboards, including MMLongBench-Doc, OCRBenchV2, WorldSense, DailyOmni, and VoiceBench. Open weights on Hugging Face, OpenRouter, Amazon SageMaker JumpStart.

Tencent (Hunyuan)

Hunyuan Hy3 Preview - 🆕 April 23, 2026. 295B total / 21B active MoE, 256K context. Open-sourced on GitHub, Hugging Face, ModelScope, GitCode. Fast-slow thinking fusion architecture, 40% improved inference efficiency. Supports vLLM and SGLang. Integrated in Yuanbao, CodeBuddy, QQ, Tencent Docs. Available on OpenRouter (free preview period).

Apple

Apple Foundation Models (AFM) - On-device (~3B) and server-based models powering Apple Intelligence. Privacy-first, offline capable.
OpenELM - Open-source efficient language models (270M–3B). Designed for on-device processing on Apple silicon.

Samsung

Samsung Gauss 2.3 - 🆕 2026 on-device AI model for Galaxy S26. Includes Gauss 2.3 Think and Gauss O Flash variants. Agentic AI capable.

Inflection AI

Inflection 2.5 / Pi - Empathetic conversational AI model. Known for emotional intelligence and human-centered interactions.

01.AI

Yi-Lightning - MoE architecture, 200+ tokens/s on RTX 4090. Strong multilingual (Chinese/English), open-source Apache 2.0. Released Oct 2024.

Chinese Academy of Sciences

ScienceOne 100 / 磐石100 - 🆕 April 28-29, 2026. AI model system for scientific research from CAS. Built around the core "ScienceOne" foundation model, with a literature compass, an innovation evaluation engine, and an agent factory offering 2,000+ tools. Supports math, physics, biology, materials science, astronomy, aerospace, and geosciences. In use across 50+ CAS institutes and 100+ research scenarios.

🎨 Multimodal & Generative AI

Tools and models for generating and editing images, videos, audio, and music.

Image Generation

Midjourney V8.1 - 🆕 April 30, 2026. HD 2K image support, new Raw mode options. 3D model generation reportedly coming later in 2026.
Flux 2 Pro / Flex / Dev / Klein - 🆕 November 2025. Black Forest Labs' next-generation family. SOTA image quality, multi-reference consistency, dramatically improved text rendering.
Recraft V4 - 🆕 February 17, 2026. Ground-up rebuild; major prompt-accuracy improvements; editable SVG vector output.
Stable Diffusion 3.5 - Open-source image generation with improved coherence and prompt following.
Ideogram 3.0 - Excels at text rendering in images; March 2025 release with style references and in-platform canvas editor.
ChatGPT Images 2.0 - 🆕 April 2026. Free tier. Improved image detail, text understanding, and multi-turn editing for iterative refinement.
gpt-image-2 - 🆕 OpenAI's latest image generation API. Supports 2K/4K resolution hints. Default in OpenClaw v2026.4.21.
DALL·E 3 - OpenAI's text-to-image model integrated with ChatGPT for iterative refinement.
Gemini 3 Pro Image - Google's native image generation within Gemini.
Nano Banana 2 (Gemini 3 Pro Image) - 🆕 Google's transparent-background-friendly image model exposed via OpenClaw image_generate.
Kling IMAGE 3.0 - 🇨🇳 🆕 April 23, 2026. Cinema-grade native 4K image generation from Kuaishou.
Flux - 💤 Stale (last update 2025-07). Black Forest Labs' original open-source repo — superseded by Flux 2 family.
Seedance 2.0 (image side) - 🇨🇳 🆕 ByteDance's next-gen image/animation generation API; pairs with the video model of the same name.

Video Generation

Veo 3.1 - 🆕 October 2025. Google DeepMind's flagship video model. Veo 4 rumoured for late-April / late-May 2026.
Runway Gen-4 - 🆕 Professional video generation and editing with character and style consistency. Now exposes Kling 3.0 / Sora 2 Pro inside the platform (April 2026).
Kling VIDEO 3.0 - 🇨🇳 🆕 February 4-7, 2026. Kuaishou's new generation; realistic human motion, lip-sync, narrative production with audio sync.
Sora 2 (via Runway) - 🆕 OpenAI's Sora app shut down 2026-04, but Sora 2 Pro is integrated into Runway as of April 7, 2026.
Seedance 2.0 - 🇨🇳 🆕 February 2026. ByteDance multi-modal cinematic video generation, 2K resolution, ~30% faster than 1.5.
Hailuo 02 - 🇨🇳 🆕 March 2026. MiniMax video model now native 1080p with expanded training data.
Pika 2.0 - 🆕 Creative video generation with scene and effects control.
LTX Studio - 🆕 AI-powered cinematic video creation platform.
Tianma (天马) AI - 🇨🇳 🆕 April 27, 2026 (beta). Alibaba's image-to-video model.
Sora - 📦 Discontinued (April 26, 2026). OpenAI's text-to-video app shut down; Sora 2 Pro lives on inside Runway.
Runway Agent - 🆕 May 13, 2026. Conversational agent that takes a written brief and ships a complete multi-shot finished video: storyboard → generation → cut → voiceover. Pipes through Gen-4 / Gen-4 Turbo / Aleph editing under the hood; first credible end-to-end "prompt-to-rough-cut" production agent.

Audio & Music

ElevenLabs Eleven v3 + ElevenAgents - 🆕 2026 "audio layer of the internet" — 70+ language TTS with emotional Audio Tags, plus the AIUC-1-certified ElevenAgents voice-agent platform with multimodal messages, conversation topic discovery, and pre-tool speech controls.
Eleven Music + Scribe v2 Realtime - 🆕 ElevenLabs' music generation and live transcription stack.
Cartesia Sonic 3 / 3.5 - 🆕 2026. State-space-model TTS hitting ~40-90ms time-to-first-audio; powers the Line Agents voice-agent platform launched April 2026.
Deepgram Nova-3 + Aura-2 + Flux Multilingual - 🆕 April 2026. Speech-to-text in 45+ languages, sub-200ms TTS, conversational STT with mid-call language switching across 10 languages.
MiniMax Music 2.6 - 🇨🇳 🆕 April 2026. Focused on cover generation, with improved low-frequency reproduction.
Voxtral TTS - 🆕 March 26, 2026. Mistral's open-weight 4B TTS built for voice-agent latency.
Suno V4 - 🆕 AI music generation from text prompts with high-quality vocals and instruments.
Udio - 🆕 Text-to-music generation with professional audio quality.
OpenAI Audio Models - Native audio understanding and generation in GPT-4o and GPT-Realtime-2 (May 8, 2026).
Stable Audio - Stability AI's audio and music generation models, including the open-weight Stable Audio Open.
Bark - 💤 Stale (no commits since 2024-08). Open-source text-to-audio model supporting speech, music, and sound effects.

🔗 Agent Protocols & Standards

Open standards enabling agent interoperability, tool access, and cross-platform communication.

Model Context Protocol (MCP)

CorpusIQ - 🆕 Multi-source business data connector with 25+ integrations (GA4, Google Ads, TikTok, YouTube, Shopify, Stripe, Airtable, Slack, HubSpot, Calendly, Klaviyo, and more). Intelligent query routing, cross-source attribution, unified business intelligence. Listed in the official MCP Registry as io.corpusiq/multi-source-mcp; HTTP transport with Ed25519 signature auth.
MCP Specification - 🆕 The "USB-C for AI": Anthropic's open protocol for connecting LLMs to tools and data sources. Donated to the Agentic AI Foundation (Linux Foundation) in December 2025.
MCP 2026-07 Release Candidate - 🆕 May 2026 (final July 28, 2026). Release candidate for the next major MCP spec revision: stateless protocol core (scalability + simpler servers), an extensions framework, the new MCP Apps capability for server-rendered UI, Tasks graduated to an extension, and hardened authorization aligned with OAuth / OpenID Connect.
MCP Servers - Official reference implementations of MCP servers for popular services.
MCP TypeScript SDK - Official TypeScript SDK for building MCP clients and servers.
MCP Python SDK - Official Python SDK for MCP implementation.
mcp.so - 🆕 Community directory of MCP servers and tools.
mcp-gateway - Gateway server for routing and managing MCP connections.
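Under the hood, MCP messages are JSON-RPC 2.0: a client discovers tools with `tools/list` and invokes one with `tools/call`. The toy dispatcher below shows only that request/response shape. It is a sketch, not a spec-complete server (no `initialize` handshake, capability negotiation, or transport), and the `add` tool is hypothetical; for real servers, use the official SDKs above.

```python
import json
from typing import Any, Callable

# Hypothetical tool registry: name -> (description, JSON Schema for arguments, handler)
TOOLS: dict[str, tuple[str, dict, Callable[..., Any]]] = {
    "add": (
        "Add two integers",
        {
            "type": "object",
            "properties": {"a": {"type": "integer"}, "b": {"type": "integer"}},
            "required": ["a", "b"],
        },
        lambda a, b: a + b,
    ),
}


def _reply(req_id: Any, **body: Any) -> str:
    # Every JSON-RPC 2.0 response echoes the request id plus "result" or "error"
    return json.dumps({"jsonrpc": "2.0", "id": req_id, **body})


def handle(raw: str) -> str:
    """Dispatch one JSON-RPC request for the two MCP tool methods only."""
    req = json.loads(raw)
    method, params, req_id = req.get("method"), req.get("params", {}), req.get("id")
    if method == "tools/list":
        tools = [{"name": n, "description": d, "inputSchema": s} for n, (d, s, _) in TOOLS.items()]
        return _reply(req_id, result={"tools": tools})
    if method == "tools/call":
        name = params.get("name")
        if name not in TOOLS:
            # -32602 is the JSON-RPC "invalid params" code
            return _reply(req_id, error={"code": -32602, "message": f"unknown tool: {name}"})
        value = TOOLS[name][2](**params.get("arguments", {}))
        # Tool output is returned as a list of typed content blocks
        return _reply(req_id, result={"content": [{"type": "text", "text": str(value)}], "isError": False})
    # -32601 is the JSON-RPC "method not found" code
    return _reply(req_id, error={"code": -32601, "message": f"method not found: {method}"})
```

Feeding it `{"jsonrpc":"2.0","id":2,"method":"tools/call","params":{"name":"add","arguments":{"a":2,"b":3}}}` returns a result whose single text content block is `"5"`.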

Agent-to-Agent Protocol (A2A)

A2A Protocol - 🆕 Google's open standard for agent-to-agent communication. Enables agents to discover, delegate, and collaborate regardless of framework. Now governed by the Linux Foundation, with 150+ partner organizations.
A2A Course (DeepLearning.AI) - 🆕 Free course on building multi-agent systems with A2A.
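A2A agents advertise themselves with an Agent Card, a JSON document served at a well-known URL that lists the agent's endpoint and skills; a client agent reads cards to decide whom to delegate to. A minimal sketch, assuming a simplified subset of the card fields (the spec defines more, and the well-known path has varied between revisions). The `pick_agent` routing rule and the example endpoint are hypothetical, not part of A2A.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class Skill:
    id: str
    name: str
    description: str
    tags: list[str] = field(default_factory=list)


@dataclass
class AgentCard:
    # Simplified subset of the Agent Card; the spec defines further fields
    # (capabilities, input/output modes, auth schemes, ...)
    name: str
    description: str
    url: str
    version: str
    skills: list[Skill] = field(default_factory=list)


def pick_agent(cards: list[AgentCard], wanted_tag: str) -> Optional[AgentCard]:
    """Hypothetical routing rule (not part of A2A): first agent advertising the tag."""
    for card in cards:
        if any(wanted_tag in skill.tags for skill in card.skills):
            return card
    return None


translator = AgentCard(
    name="Translator",
    description="Translates documents",
    url="https://translator.example.com/a2a",  # placeholder endpoint
    version="1.0.0",
    skills=[Skill("translate", "Translate text", "EN/ZH translation", ["translate"])],
)
card_json = json.dumps(asdict(translator), indent=2)  # what a discovery endpoint would serve
```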

Other Standards

OpenAI Agents SDK - 🆕 Major update April 15, 2026: native sandbox execution, first-class MCP integration, sub-agent / handoff patterns, and Codex-style filesystem tools for production-ready multi-agent workflows.
Agentic AI Foundation - 🆕 Linux Foundation fund co-founded by Anthropic, Block, and OpenAI to govern open agent standards.
Kuberna Labs - ⚠️ Unverified. Cross-chain intent execution protocol for AI agents. Claims ERC-8004 on-chain identity, zkTLS/TEE attestation, and a typed intent schema enabling agents to autonomously execute transactions across NEAR, Base, and Mantle with verifiable execution proofs. New repo, independent adoption unverified — listed for visibility, evaluate before depending on it.

🏗️ Agent Frameworks

Frameworks and libraries for building autonomous AI agents.

Koog 1.0 - 🆕 May 28, 2026 — KotlinConf 2026. JetBrains' open-source agent framework for Kotlin + Java hits a stable 1.0 with a 1-year API stability guarantee. Kotlin Multiplatform deployment (JVM / Android / iOS / JS / WASM), Java interop without wrapper modules, local Android LiteRT, OpenTelemetry across all targets, graph-based workflows, Spring Boot / Ktor integration, and providers for OpenAI / Anthropic / Google / Bedrock. Apache-2.0.
LangChain - Build context-aware reasoning applications with LLMs.
LangGraph - Build resilient language agents as graphs with stateful, multi-actor orchestration. v0.3.19 (April 27, 2026) split prebuilt agents into langgraph-prebuilt (Supervisor, Swarm, LangMem, Trustcall). v1.2 (May 2026) adds per-node timeouts / error recovery / graceful shutdown, a new DeltaChannel to cut checkpoint overhead on long threads, and a content-block-centric streaming API v3.
CrewAI - Framework for orchestrating role-playing autonomous AI agents in collaborative teams.
Microsoft Agent Framework - 🆕 Unified framework merging AutoGen + Semantic Kernel. Multi-agent conversations with enterprise features. GA Q1 2026.
Microsoft Agent 365 - 🆕 GA May 1, 2026. Enterprise observability + governance + security for AI agents across environments; May 2026 update adds Secure Access Service Edge (SASE) for agents, threat detection / blocking, and agent-threat-hunting workflows.
AutoGen - Multi-agent conversation framework by Microsoft (now part of Microsoft Agent Framework).
Google Agent Development Kit (ADK) - 🆕 Modular framework integrated with Gemini and Vertex AI. Hierarchical agent compositions.
OpenAI Agents SDK - 🆕 Next evolution shipped April 15, 2026 — native sandbox execution, MCP-native tool use, sub-agent handoffs, Codex-style filesystem ops. Production-ready multi-agent workflows.
MetaGPT - Multi-agent framework assigning different roles to GPTs for collaborative software entities.
Mastra - 🆕 TypeScript-first agent framework with workflow-driven development and built-in observability.
Ontheia - Self-hosted, open-source AI agent platform. Multi-provider (Claude, OpenAI, Gemini, Ollama), MCP-native, Chain Engine for visual workflow automation, long-term memory (pgvector), multi-user RBAC, GDPR-compliant by architecture. AGPL-3.0.
AgentGPT - 📦 Archived (2025-04). Assemble, configure, and deploy autonomous AI agents in your browser. Influential first-wave project, kept for historical reference; no longer maintained.
BabyAGI - AI-powered task management system using LLMs to create, prioritize, and execute tasks.
SuperAGI - 💤 Stale (no commits since 2025-01). Open-source autonomous AI agent framework to build, manage & run agents.
Semantic Kernel - Integrate LLM technology into apps. C#, Python, Java support.
Phidata (Agno) - Build multi-modal agents with memory, knowledge, tools and reasoning.
DSPy - The framework for programming—not prompting—language models.
OpenClaw - 🆕 Personal AI agent platform with skills, memory, multi-channel messaging, Dreaming (3-stage memory consolidation), Canvas/A2UI, ACP coding harness integration, and Standing Orders. v2026.5.12 (May 14, 2026) with Claude Opus 4.7, Kimi K2.6, /think xhigh support, native model identity injection, isolated Telegram polling worker, and tightened protected-config paths.
Dify - Open-source LLM app development platform with visual agent builder.
Haystack Agents - End-to-end LLM framework for agentic pipelines.
Vellum AI - 🆕 Production-grade agent framework with prompt-based building, evaluations, versioning, and observability.
FastAgency - 🆕 High-speed inference and production scaling framework for agents.
Rasa - Open-source conversational AI with strong intent recognition and dialogue management.
Lindy - 🆕 Top no-code agent framework for business users with visual workflow builder.
Octomind - 🆕 Rust-based open-source AI agent runtime. Model-agnostic (13+ providers), community-built specialist agents (developer, medical, legal, DevOps), MCP support with runtime self-extension, zero-config setup. Apache 2.0.
Microsoft AI Agent Governance Toolkit - 🆕 April 3, 2026. Open-source toolkit for enforcing runtime security policies across agent frameworks including LangChain and AutoGen. Policy-as-code approach for enterprise AI governance.
Bernstein - 🆕 Python orchestrator for 40+ CLI coding agents (Claude Code, Codex, Gemini CLI, Cursor, Aider). One LLM plan call up front; scheduling, git worktree isolation, quality gates, and HMAC-chained audit are deterministic. Apache-2.0.
Genkit Middleware - 🆕 May 14, 2026. New middleware system for Google's open-source Genkit framework. Composable hooks at the generate / model / tool layers — retries with exponential backoff, model fallbacks, tool approval gates, scoped filesystem access, skill injection from SKILL.md. TypeScript / Go / Dart; Python next.
LlamaIndex ↔ Google Agents API integration - 🆕 May 20, 2026. LlamaIndex ships a template for Google's newly launched Agents API exposing LlamaParse / LiteParse over unstructured documents inside a sandboxed Linux environment. Companion Sandboxed-Lit runtime and ParseBench (first OCR benchmark designed for agents) introduced in the same release wave.
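Middleware systems like the Genkit one above wrap the model call in composable layers. The plain-Python sketch below illustrates the idea with two of the listed behaviours, retry with exponential backoff and model fallback. It is not Genkit's API, and the model functions are hypothetical stand-ins that raise `RuntimeError` on failure.

```python
import time
from typing import Callable

ModelFn = Callable[[str], str]          # prompt -> completion
Middleware = Callable[[ModelFn], ModelFn]


def with_retry(attempts: int = 3, base_delay: float = 0.1) -> Middleware:
    """Retry on RuntimeError, doubling the wait after each failed attempt."""
    def wrap(call: ModelFn) -> ModelFn:
        def inner(prompt: str) -> str:
            for attempt in range(attempts):
                try:
                    return call(prompt)
                except RuntimeError:
                    if attempt == attempts - 1:
                        raise  # out of retries: surface the last error
                    time.sleep(base_delay * 2 ** attempt)
            raise ValueError("attempts must be >= 1")  # only reached when attempts < 1
        return inner
    return wrap


def with_fallback(backup: ModelFn) -> Middleware:
    """If the wrapped call still fails, route the prompt to a backup model."""
    def wrap(call: ModelFn) -> ModelFn:
        def inner(prompt: str) -> str:
            try:
                return call(prompt)
            except RuntimeError:
                return backup(prompt)
        return inner
    return wrap


def compose(call: ModelFn, *middleware: Middleware) -> ModelFn:
    # The first middleware listed becomes the outermost wrapper
    for mw in reversed(middleware):
        call = mw(call)
    return call
```

Order matters: `compose(primary, with_fallback(backup), with_retry(3))` puts the fallback outermost, so the backup model is only called after every retry against the primary has failed.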

🛠️ Agent IDEs & Visual Builders

Visual environments for designing, debugging, and shipping agent workflows without (or with minimal) code.

LangGraph Studio - Visual debugger and trace inspector for LangGraph agents — step through state, replay turns, edit messages mid-flight. Companion to the LangGraph runtime.
Dify - Open-source LLM app development platform with drag-and-drop agent workflow builder. Mainstream production deployments.
Agenta - 🆕 Open-source LLMOps platform combining a prompt playground, prompt management, evaluation runs, and observability in one UI.
Vellum AI - Production-grade agent IDE with prompt building, evaluations, versioning, and observability — closed-source SaaS.
Cozeloop - 🆕 🇨🇳 ByteDance's open-source agent optimization platform: full-lifecycle development, debugging, evaluation, and monitoring. Apache-2.0.
Restack - Durable agent runtime + visual workflow editor (built on Temporal-style replay). Open-source examples in restackio/examples-python.
Bisheng - 🇨🇳 Open enterprise LLM DevOps platform: workflow editor, RAG, agent orchestration, fine-tuning, dataset management, observability. Apache-2.0.
n8n - General-purpose visual workflow automation that has become a popular agent canvas — 400+ integrations + native AI nodes. Fair-code license.
Mastra - 🆕 Opinionated TypeScript agent framework with RAG, observability, MCP, and visual workflow builder; 21K+ stars.
VoltAgent - 🆕 End-to-end TypeScript AI Agent Engineering Platform with memory, RAG, guardrails, MCP, voice, and workflow capabilities.
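Tools such as LangGraph Studio and Restack depend on recording agent state at every step so a run can be inspected and replayed. A minimal sketch of that idea, assuming a graph of plain functions that return partial state updates and a router that picks the next node; this is not the LangGraph API, and the node names used in the test are hypothetical.

```python
from typing import Callable

State = dict
Node = Callable[[State], State]  # returns a partial state update
END = "__end__"


def run_graph(nodes: dict[str, Node], router: Callable[[str, State], str],
              start: str, state: State, max_steps: int = 20) -> tuple[State, list[tuple[str, State]]]:
    """Run nodes until the router returns END; keep a snapshot after every step."""
    history: list[tuple[str, State]] = []
    current = start
    for _ in range(max_steps):
        state = {**state, **nodes[current](state)}  # merge the node's update
        history.append((current, dict(state)))      # copy kept for inspection/replay
        current = router(current, state)
        if current == END:
            return state, history
    raise RuntimeError("step budget exhausted")
```

To replay from step `k` (when it was not the final step), take `name, snapshot = history[k]` and call `run_graph` again with `start=router(name, snapshot)` and `state=snapshot`.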

🧠 Agent Memory

Systems for giving agents persistent memory and context management.

Letta (MemGPT) - Create LLM services with long-term memory and custom tools.
Mem0 - The Memory layer for your AI apps — self-improving memory for LLM applications. April 2026 algorithm upgrade: single-pass add-only extraction, entity linking, multi-signal retrieval; benchmark wins on LoCoMo, LongMemEval, BEAM. 55K+ stars, 21+ official framework integrations.
Zep - Long-term memory for AI assistants and agents.
agent-memory - Lightweight agent memory framework for persistent context across sessions.
Mem0g (graph variant) - 🆕 Graph-enhanced sibling of Mem0 for multi-hop questions; 21+ framework integrations as of early 2026.
Graphiti - 🆕 Zep's open-source temporal knowledge graph engine; every fact is timestamped so agents can reason about "when" as well as "what".
LangMem - 🆕 Spun out of LangGraph 0.3.19 (April 2026). Long-term episodic + procedural memory primitive for agents.
Motorhead - 💤 Stale (no commits since 2025-07). Memory and context management server for LLMs.
ChromaDB - AI-native open-source embedding database for memory-augmented agents.
Cognee - Deterministic LLM outputs using graphs, LLMs, and vector retrieval.
LangGraph Memory - 🆕 Built-in persistence and checkpointing for stateful agent workflows.
Claude Managed Agents Memory - 🆕 April 23, 2026 (public beta). Anthropic's persistent memory feature for Claude Managed Agents. Agents retain information across sessions by mounting read/write memory stores to a filesystem. Enables long-running agents to learn and adapt without resetting context.
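Temporal memory in the Graphiti style keeps superseded facts instead of overwriting them, so an agent can answer both "what is true now" and "what was true then". A minimal in-memory sketch, assuming facts are (subject, predicate, object) triples with validity intervals; it is not Graphiti's data model or API.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Fact:
    subject: str
    predicate: str
    obj: str
    valid_from: datetime
    valid_to: Optional[datetime] = None  # None means "still true"


class TemporalMemory:
    def __init__(self) -> None:
        self.facts: list[Fact] = []

    def assert_fact(self, subject: str, predicate: str, obj: str, at: datetime) -> None:
        # Close out the currently valid value instead of deleting it
        for f in self.facts:
            if f.subject == subject and f.predicate == predicate and f.valid_to is None:
                f.valid_to = at
        self.facts.append(Fact(subject, predicate, obj, at))

    def as_of(self, subject: str, predicate: str, when: datetime) -> Optional[str]:
        """Return the object that was valid at `when`, or None if nothing was."""
        for f in self.facts:
            if (f.subject == subject and f.predicate == predicate
                    and f.valid_from <= when
                    and (f.valid_to is None or when < f.valid_to)):
                return f.obj
        return None
```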

🔌 Tool & API Integration

Protocols and tools for connecting agents to external services and APIs.

Model Context Protocol (MCP) - Open protocol for connecting AI models to external tools and data sources.
mcp-gateway - Gateway server for routing and managing MCP protocol connections.
Composio - Integration platform for AI agents — 150+ tools with managed auth.
Toolhouse - Cloud infrastructure for AI tool use — store, manage, and execute tools.
LangChain Tools - Extensive collection of tool integrations within the LangChain ecosystem.
Arcade AI - Tool calling platform for AI agents and assistants.
E2B - Open-source cloud runtime for AI agents — secure sandboxed environments for code execution.
Browser Use - Make websites accessible for AI agents with browser automation.
Firecrawl - 🆕 Turn websites into LLM-ready data. Crawl and convert any website for AI.
Crawl4AI - 🆕 Open-source LLM-friendly web crawler and scraper.
Stagehand - 🆕 AI-powered browser automation framework by Browserbase.
AgentQL - 🆕 Query language for AI agents to interact with web pages semantically.
StackOne - 🆕 Unified API for AI agent integrations across HR, CRM, and ATS platforms.
AWS MCP Server - 🆕 GA May 6, 2026. AWS-managed MCP server giving coding agents secure, auditable access to any AWS API; sandboxed Python execution for multi-step ops; replaces "agent SOPs" with agent skills. First-party from AWS.
Google Workspace MCP Server - 🆕 Rollout from May 1, 2026. Workspace-native MCP server exposing Gmail / Drive / Calendar / Docs / Sheets to MCP clients, with admin-controlled OAuth scopes and audit trails.
iManage MCP Server - 🆕 May 14, 2026. Native MCP endpoint for the iManage knowledge-work platform — lets any AI client securely read/write iManage documents without custom integration. First major legal/professional-services SaaS to ship a public MCP server.
Power Platform Canvas Authoring MCP Server - 🆕 May 14, 2026. Microsoft Power Platform feature exposing Canvas Apps authoring as an MCP server; lets Copilot / Claude Code drive natural-language InfoPath → Canvas Apps migration.
The Colony - ⚠️ Unverified. Self-described public agent-first social network with REST API for agent posts/votes/DMs and SDKs in Python (colony-sdk-python), TypeScript (colony-sdk-js) and Go (colony-sdk-go). Organisation and SDK repos are <30 days old, all 0–2 stars, single-maintainer; same submission was sent to 15+ awesome lists in parallel — listed for visibility, evaluate before depending on it.
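Integration platforms such as Composio, Arcade AI, and Toolhouse ultimately hand the model a JSON-Schema description of each tool. A minimal sketch of deriving such a definition from a Python signature, assuming only `str`/`int`/`float`/`bool` parameters (anything else falls back to `"string"`); `tool_schema` and `get_weather` are hypothetical and do not mirror any platform's API.

```python
import inspect
from typing import Any, Callable

# Simplified mapping from Python annotations to JSON Schema types
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def tool_schema(fn: Callable[..., Any]) -> dict:
    """Build a function-calling style tool definition from a signature."""
    props, required = {}, []
    for name, param in inspect.signature(fn).parameters.items():
        props[name] = {"type": _JSON_TYPES.get(param.annotation, "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)  # no default value -> the model must supply it
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn) or "",
        "parameters": {"type": "object", "properties": props, "required": required},
    }


def get_weather(city: str, days: int = 1) -> str:
    """Hypothetical tool: forecast for a city."""
    return f"{city}: sunny for {days} day(s)"
```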

🧪 Agent Sandboxing & Compute Isolation

Secure runtimes that let agents execute generated code and shell commands without compromising the host. Critical infrastructure once you let an agent off the leash.

E2B - Open-source secure cloud sandbox for AI-generated code. Used as the execution layer in OpenAI Agents SDK and many production agents.
Daytona - 🆕 Secure, elastic infrastructure for running AI-generated code. Spin up isolated dev environments per agent task; AGPL-3.0.
Modal - Serverless cloud platform popular for agent compute, GPU jobs, and sandboxed Python — modal-client is the official SDK.
Microsandbox - 🆕 Local, programmable microVM sandboxes for AI agents — secure code execution on your own machine, no cloud dependency.
SandboxFusion - ByteDance's multi-language code-execution sandbox built for agent / model evaluation pipelines. Apache-2.0.
Northflank - General-purpose container PaaS used as an agent runtime backend (per-task ephemeral environments, GPU pools).
Firecracker - The open-source microVM monitor (VMM) underneath E2B, Daytona and most agent sandboxes. Useful as a primitive when building your own.
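For local experiments before adopting one of the runtimes above, the weakest useful step is running generated code in a separate interpreter with a timeout, a clean environment, and a throwaway working directory. This sketch assumes a POSIX host and is NOT a security boundary: the child can still read the filesystem and open network connections, which is exactly what microVM-based sandboxes prevent.

```python
import subprocess
import sys
import tempfile


def run_untrusted(code: str, timeout_s: float = 5.0) -> tuple[int, str, str]:
    """Run code in a child interpreter. NOT a real sandbox: only a timeout plus a clean env."""
    with tempfile.TemporaryDirectory() as workdir:
        try:
            proc = subprocess.run(
                [sys.executable, "-I", "-c", code],  # -I: isolated mode (ignores PYTHON* env vars, user site-packages)
                cwd=workdir,            # throwaway working directory
                env={},                 # do not leak host secrets via environment variables
                capture_output=True,
                text=True,
                timeout=timeout_s,      # child is killed when this expires
            )
        except subprocess.TimeoutExpired:
            return -1, "", f"timed out after {timeout_s}s"
    return proc.returncode, proc.stdout, proc.stderr
```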

🛡️ Agent Security

Tools and frameworks for securing AI agents against prompt injection, data leaks, and misuse.

AgentGate - 🆕 Pre-execution authorization policy decision point (PDP) for autonomous AI agents. Scores trust across 4 dimensions per request, detects 24-hour kill-chain patterns (BULK_READ_THEN_EXFIL, SENSITIVITY_RAMP), and keeps a Merkle-chained audit trail. MIT license; drop-in with LangGraph, LangChain, and AutoGen. tryagentgate.com
prompt-firewall - Firewall for LLM prompts — detect and block prompt injection attacks.
LLM Guard - The Security Toolkit for LLM Interactions — input/output scanners for AI.
Rebuff - 📦 Archived (2024-08). Self-hardening prompt injection detector — detect, deflect, and report. Listed for historical reference; no longer maintained.
Guardrails AI - Adding guardrails to large language models — validate and correct LLM outputs.
NeMo Guardrails - Toolkit for adding programmable guardrails to LLM-based conversational systems.
Vigil - 💤 Stale (no commits since 2024-01). LLM security scanner — detect prompt injections, jailbreaks, and data leakage.
Lakera Guard - Enterprise-grade AI security platform for prompt injection defense.
Garak - LLM vulnerability scanner by NVIDIA — probe for weaknesses in language models.
Invariant Guardrails - 🆕 Runtime guardrails for AI agents — policy enforcement and safety checks.
Prompt Armor - 🆕 Enterprise prompt injection protection with real-time detection.
Descope MCP Auth - 🆕 Authentication and authorization layer for MCP server security.
AgentDojo - 🆕 ETH Zürich research benchmark for evaluating prompt-injection attacks and defenses against tool-using LLM agents.
ModelScan - Scan ML model files (Pickle, PyTorch, TF) for serialization-based code-execution attacks.
PyRIT - Microsoft's Python Risk Identification Tool for generative AI — automated red-teaming framework.
RAMPART - 🆕 May 20, 2026. Microsoft's pytest-native safety and security testing framework for agentic AI. A developer-facing white-box counterpart to PyRIT: cross-prompt-injection probes, benign-failure asserts, harm-category coverage, and statistical thresholds (e.g. safe in 80%+ of runs). Integrates directly into CI/CD. MIT.
Clarity (Microsoft) - 🆕 May 20, 2026. Companion to RAMPART. Structured design-review tool for AI agents — "living artifacts" documenting intent, risks, and behavior before code is written. Open-sourced from Microsoft AI Red Team's internal practice.
Nobulex - ⚠️ Unverified. Cryptographic receipts for AI agent actions (Ed25519 dual signatures, hash-chained audit logs). MIT. Bilateral-receipt primitive merged into Microsoft's Agent Governance Toolkit (PRs #1302, #1333). Same submission sent to 15+ awesome lists in parallel; submitter's claim of "4,500 npm downloads" doesn't match registry data (@nobulex/mcp-server ~19/month at audit time). Listed for visibility on the strength of the Microsoft adoption.
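Several entries above (AgentGate, Nobulex, Bernstein) rely on chained audit logs, where each record commits to the one before it so that editing or reordering past entries becomes detectable. A minimal HMAC-SHA256 chain sketch; the record layout and key handling are assumptions, not any listed project's format. A plain chain does not detect truncation of the newest entries unless the latest MAC is also anchored somewhere external.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # MAC value the first entry chains from


def _mac(key: bytes, prev: str, event: dict) -> str:
    # Canonical JSON so the same event always serializes to the same bytes
    payload = prev + json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hmac.new(key, payload.encode(), hashlib.sha256).hexdigest()


def append(log: list[dict], event: dict, key: bytes) -> None:
    """Append an event whose MAC covers the previous entry's MAC."""
    prev = log[-1]["mac"] if log else GENESIS
    log.append({"event": event, "prev": prev, "mac": _mac(key, prev, event)})


def verify(log: list[dict], key: bytes) -> bool:
    """Recompute the chain; any edited, reordered, or removed middle entry breaks it."""
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev or not hmac.compare_digest(entry["mac"], _mac(key, prev, entry["event"])):
            return False
        prev = entry["mac"]
    return True
```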

🔍 RAG & Knowledge

Retrieval-augmented generation and knowledge management systems for agents.

LlamaIndex - Data framework for LLM-based applications — ingest, structure, and access private data.
Haystack - End-to-end LLM framework for building RAG pipelines and search systems.
Unstructured - Open-source components for pre-processing documents for LLMs and RAG.
Chroma - AI-native open-source embedding database.
Weaviate - Open-source vector database for AI-native applications.
Qdrant - High-performance vector similarity search engine and database.
Pinecone - Managed vector database for high-performance AI applications.
Milvus - Cloud-native vector database for scalable similarity search.
RAGFlow - Open-source RAG engine based on deep document understanding.
Docling - Document parsing and conversion for RAG and generative AI.
Kotaemon - 🆕 Open-source RAG-based tool for chatting with documents.
LightRAG - 🆕 Simple and fast RAG engine with graph-based knowledge indexing.
R2R - 🆕 Production-ready RAG engine with built-in auth, observability, and ingestion.
Vanna - 📦 Archived (2026-02). RAG for SQL — chat with your database using natural language.
Morphik - 🆕 Multimodal RAG engine for documents containing tables, figures, and charts; rapidly-rising 2026 alternative to LlamaIndex for complex PDFs.
Cognee - 🆕 Memory + reasoning engine that builds a knowledge graph as agents ingest documents; 2026 darling for "long-running research agent" stacks.
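Every RAG engine above shares one core step: rank stored chunks by similarity to the query and paste the best ones into the prompt. A stdlib-only sketch of that step, using bag-of-words counts and cosine similarity as a stand-in for the learned embeddings and vector databases (Chroma, Qdrant, Milvus, ...) that production systems use; the prompt template is hypothetical.

```python
import math
import re
from collections import Counter


def vectorize(text: str) -> Counter:
    # Naive lowercase word tokenizer; real pipelines use learned embeddings
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = vectorize(query)
    return sorted(chunks, key=lambda c: cosine(q, vectorize(c)), reverse=True)[:k]


def build_prompt(query: str, chunks: list[str]) -> str:
    context = "\n".join(f"- {c}" for c in retrieve(query, chunks))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```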

💻 Coding Agents

AI-powered coding assistants and autonomous software engineering agents.

Terminal & CLI Agents

Claude Code - Anthropic's agentic coding tool. 80.9% SWE-bench score, handles complex multi-file bugs. May 2026 (v2.1.128–2.1.141): new /goal command for cross-turn completion conditions, agent view, plugin loading from .zip archives + URLs, Ctrl+R global history search, broader MCP/hook handling, enterprise feedback surveys.
Codex CLI - OpenAI's open-source terminal coding agent (Rust, Apache-2.0, 82K+ stars). 77.3% Terminal-Bench score. May 2026 adds Codex Chrome extension for in-browser DevTools workflows, codex remote-control headless app-server, plugin-detail bundled-hook display, and Codex on Mobile preview (May 14) that lets ChatGPT iOS/Android remote-control the macOS Codex app.
Codex Security - 🆕 March 2026. Application-security agent that finds and fixes software vulnerabilities; available to OSS maintainers via the Codex-for-OSS program.
Aider - AI pair programming in your terminal — works with any LLM, with first-class git commit handling.
Gemini CLI - 🆕 Google's terminal-first coding agent for large-context refactors.
Grok Build - 🆕 May 14, 2026 (early beta). xAI's agentic CLI coding agent powered by grok-code-fast-1. Parallel sub-agents in isolated environments, daily release notes, available to SuperGrok Heavy subscribers ($99/mo intro for 6 months, then $300/mo). xAI's reply to Claude Code and Codex CLI.
Antigravity CLI - 🆕 May 19, 2026 (Google I/O 2026). Lightweight CLI companion to Antigravity 2.0 — create and interact with Google agent harnesses directly from the terminal. macOS / Linux / Windows.

IDE-Based Agents

Cursor 3.09 - 🆕 April 3, 2026 update with a strengthened Agent mode for vibe-coding workflows.
Kilo Code - 🆕 April 2026 rising challenger to Cursor. Default model: MiniMax. Viral on Chinese developer communities (Bilibili).
Cursor - The AI code editor; a February 2026 update added support for up to 8 parallel agents.
Windsurf - Agentic IDE by Codeium — AI-first code editing experience.
Cline - Autonomous coding agent in your IDE — VS Code extension.
Roo Code - 🆕 Open-source VS Code extension that reads/writes across multiple files, runs commands, model-agnostic; free except for the LLM API you bring.
Void - 🆕 Fork of VS Code positioned as the open-source Cursor alternative; data stays with you, BYO model.
Continue - Open-source AI code assistant for VS Code and JetBrains.
GitHub Copilot - Agent mode with expanded model access and gh copilot shell integration in early 2026.
Cursor 3.3 - 🆕 May 2026. PR-review experience, parallel agents, enterprise model controls; previous 3.1 in April.
Cursor SDK - 🆕 May 4, 2026. TypeScript SDK exposing Cursor's runtime, harness, and models so developers can build programmatic agents on top of the Cursor stack; ships with the v2.5 security patch fixing an arbitrary-code-execution vulnerability via malicious git repos.
Cursor 3.4 (Teams + PR review) - 🆕 May 11–13, 2026. Microsoft Teams integration (@Cursor in Teams delegates to cloud agents), faster parallel-agent plan execution, multi-repo / Dockerfile-based dev-environment configs for agents, /multitask async sub-agents, Vulnerability Scanner, granular per-model access controls.
Kiro - AWS autonomous agent. Spec-driven development, manages up to 10 simultaneous tasks.
Amazon Q Developer - AI coding companion deeply integrated with AWS ecosystem.
Visual Studio 2026 Agent Mode + Skills - 🆕 VS 2026 Insiders May 12-15, 2026. Copilot Chat "Agent Mode" now ships a guided Skills workflow inside Visual Studio 2026: discover, manage, and author reusable Copilot Skills with whole-solution context, plus terminal command execution and tool invocation.
JetBrains Rider AI Test-Writing Skill - 🆕 May 22, 2026. New AI Assistant skill for JetBrains Rider that surfaces .NET coverage data to Claude Code / Codex so agents target untested branches, reducing AI cost for test generation.

Autonomous Software Engineers

Cursor 3.4 Cloud Agent Environments - 🆕 May 13, 2026. New dev environments for cloud agents: multi-repo workspaces, Dockerfile-based config with build secrets, 70% faster cached image layers, per-environment version history with rollback, audit logs, scoped egress and secrets. Companion to the Cursor 3.4 release.
Devin 3.0 - 🆕 By Cognition. Dynamic re-planning, self-healing code, legacy codebase migration, multi-modal input (UI mockups, video recordings).
Devin 2.2 - 🆕 February 2026. Sandboxed terminal + editor + browser; commercial product (Core $20/mo, Team $500/mo).
OpenHands - Open-source platform for AI software developers as autonomous agents.
SWE-agent - Turn LLMs into software engineering agents that fix real GitHub issues.
Devika - 💤 Stale (no commits since 2025-09). Agentic AI software engineer — open-source alternative to Devin.
GPT Engineer - 📦 Archived (2025-05). Specify what you want built, AI asks for clarification, then builds it. Foundational project of the autonomous-coding era, kept for historical reference.
Codegen - 🆕 Programmatic code manipulation and multi-file refactoring SDK.
Qodo - 🆕 AI Code Review Platform focused on quality, security, and test generation.
Google Antigravity 2.0 - 🆕 May 19, 2026 (Google I/O 2026). Standalone desktop application (macOS / Linux / Windows) for orchestrating multiple agents in parallel. Adds scheduled cron-style runs, async long-running tasks, dynamic sub-agents, and integrations with AI Studio / Android / Firebase. Companion Antigravity SDK lets you host the harness on your own infra; enterprise edition lands inside Gemini Enterprise Agent Platform.
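The IDE and autonomous agents above increasingly run several sub-agents at once (Cursor's parallel agents, Kiro's simultaneous tasks, Grok Build's isolated sub-agents), each in its own workspace so edits cannot collide. A sketch of that orchestration pattern, assuming a thread pool for bounded concurrency and a temporary directory per task where real tools use git worktrees, containers, or VMs; `stub_agent` is a hypothetical stand-in for an actual coding agent.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable

# A sub-agent gets a task description and a private workspace, returns a summary
SubAgent = Callable[[str, Path], str]


def run_parallel(tasks: list[str], agent: SubAgent, max_parallel: int = 8) -> dict[str, str]:
    """Fan tasks out to sub-agents, at most max_parallel at a time, each in its own directory."""
    def run_one(task: str) -> str:
        with tempfile.TemporaryDirectory(prefix="agent-") as tmp:
            return agent(task, Path(tmp))

    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        results = pool.map(run_one, tasks)  # results come back in input order
        return dict(zip(tasks, results))


def stub_agent(task: str, workspace: Path) -> str:
    """Hypothetical stand-in for a coding agent: writes one file and reports."""
    (workspace / "NOTES.md").write_text(f"plan for: {task}\n")
    files = sorted(p.name for p in workspace.iterdir())
    return f"{task} -> {files}"
```

Tasks double as dictionary keys here, so duplicate task strings would overwrite each other; a real orchestrator would key results by task ID.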

🤖 Physical AI & Embodied Agents

AI systems that perceive, reason about, and act in the physical world — humanoid robots, factory automation, Physical AI infrastructure. The next wave after language agents.

Foundational Models & Research

Google Gemini Robotics ER-1.6 - 🆕 April 14, 2026. Robotics AI model with enhanced spatial and physical reasoning. Integrated into real robots via Agile Robotics partnership.
Project Prometheus (Bezos) - 🆕 Jeff Bezos-led Physical AI venture. Raising $10B at $38B valuation to embed AI into physical systems and robotics.
NVIDIA Isaac GR00T - NVIDIA's foundation model platform for humanoid robots. Unveiled at GTC, expanded at Hannover Messe 2026.
NVIDIA Industrial AI Cloud - 🆕 April 2026 (Hannover Messe). Deutsche Telekom-built AI factory infrastructure for industrial AI workloads.

Humanoid Robots

Tesla Optimus Gen3 (V3) - 🆕 AWE 2026 Shanghai debut. First mass-produced Optimus; Fremont line started January 2026, 50K-100K units/year initial target, ~$30K USD initial price, late-2026 limited external sales. 37 joints, 1.2 m/s walking, 22-DoF hands.
Figure 03 (Helix AI) - 🆕 Late 2025 announcement, ramping in 2026. First Figure model designed for the home: soft textile coverings, wireless charging, tactile sensors. May 2026 demo: two F.03 robots autonomously cleaning a room and making a bed in <2 minutes via visual coordination only.
Figure 04 - 🆕 May 13, 2026. Founder Brett Adcock announces Figure 04 design finalized; component deliveries underway. Successor to F.03 with the Helix VLA model.
Helix 02 package-sort 72h run - 🆕 May 13-16, 2026. Live-streamed Figure F.03 fleet runs Helix 02 fully autonomously on a package-sort line — 8-hour shift on day one (~22K packages), then ~30K in the first 24 hours, ending in a stress test that hit ~88K packages over ~72 hours before mechanical failure. First public continuous-run evidence for a home-form-factor humanoid stack.
Figure F.03 vs human 8-hour sort challenge - 🆕 May 18, 2026. Figure runs the first public head-to-head: one F.03 robot vs one trained human, 8-hour shift on the same package-sort line. Human wins narrowly — 12,924 parcels (2.79 s/item) vs 12,732 parcels (2.83 s/item). Tightest published gap to human throughput on a real industrial task to date.
Boston Dynamics Atlas 100-lb manipulation + Hyundai 25K plan - 🆕 May 18-19, 2026. Boston Dynamics publishes video + technical blog showing Atlas lifting and carrying >100 lb loads (mini-fridge / washing machine) via RL + large-scale sim training; whole-body control adapts to weight shifts without per-object identification. Hyundai Motor Group commits to deploying 25,000+ Atlas units across Hyundai/Kia plants starting 2028 in Georgia.
Unitree G1 deployed at JAL Haneda - 🆕 May 2026. Japan Airlines starts a Haneda ground-operations trial with Unitree G1 humanoids (baggage loading, container transport, cabin cleaning) — marketed as the first commercial airline trial of bipedal robots in active aviation service. US Congress separately moves to add Unitree to the entity list on national-security grounds the same week, underscoring how fast the embodied-AI supply chain is becoming geopolitical.
Figure 02 + Helix 02 - 🆕 January 2026. Helix 02 expands whole-body autonomy (load/unload dishwashers, fold laundry); BotQ facility rated for 12K units/year.
Unitree G1 + H1-2 - 🆕 CES 2026. G1 dance/boxing/skating demos, autonomous kung fu (February), 5'8" H1-2 industrial unit at 7.4 mph. 20K humanoid shipments targeted in 2026.
Unitree R1 Air - 🆕 Consumer humanoid at $4,900 — runs, flips, walks on hands.
Unitree Gen 2 (lifelike skin) - 🆕 Realistic human-like skin with embedded pressure / temperature / touch sensors.
Unitree GD01 - 🆕 May 2026. Nearly 10-foot manned mecha; pilot-driven, switches between bipedal and quadrupedal modes. Priced from ¥3.9M (~$650K). Tracks how the embodied-agent stack is starting to fork into operator-piloted form factors.
Honor (荣耀) Humanoid - 🆕 Set a world record at the 2026 humanoid-robot half-marathon.
Zhiyuan (智元) AGIBOT - 🆕 April 2026. New humanoid body, foundation model, and solution suite. Calls 2026 "Deployment Year Zero."
Unitree H-series - Boston Dynamics competitor from China. Ongoing 2026 iterations.
Agile Robotics - 🆕 Gemini Robotics ER-1.6 deployment partner. German robotics company.
Shenzhen Humanoid Pilot Line - 🆕 🇨🇳 Shenzhen launched its first pilot production line for humanoid robots on April 12, 2026 (Leju Robotics + Dongfang Precision in Longhua District). 2-hour assembly cycle, 500–1,000 units/year, with mass production moving to a 10,000-units/year Foshan facility.

Consumer Robotics & Wearables

Doubao AI Glasses (ByteDance) - 🆕 Q2 2026 launch. Real-time translation, object recognition, Doubao LLM integration.
Nothing AI Glasses/Earbuds - 🆕 Announced April 2026. AI-integrated smart wearables.
Samsung Galaxy S26 (Gauss 2.3) - On-device agentic AI. Gauss 2.3 Think and Gauss O Flash variants.
Meta Ray-Ban Stories 3 - Continued iteration with deeper Llama integration.

Autonomous Driving

Tesla FSD v13 - Expanding L4-capable deployment across major markets.
Waymo - Continuing commercial L4 rollout in US cities through 2026.
WeRide / Pony.ai / Baidu Apollo - 🇨🇳 Chinese L4 fleets expanding operational zones.

🎮 Agent Simulation & World Models

Research environments where agents are trained, observed, or stress-tested in simulated worlds. Increasingly relevant as world-model and embodied research bleeds into language-agent design.

Generative Agents - 💤 Stanford's seminal Smallville simulacrum (Park et al., 2023). Memory + reflection + planning in a town of 25 LLM-driven characters. Reference implementation that influenced almost every multi-agent paper since.
Voyager - 💤 First LLM-powered lifelong-learning agent in Minecraft: GPT-4 with a skill library and an automatic curriculum (Wang et al., 2023). Still the canonical reference for open-ended embodied agents.
SWE-Gym - Open environment to train SWE agents on real GitHub issues; companion to SWE-bench.
WebArena - Realistic, reproducible web environment (Reddit / shopping / GitLab clones) used by OSWorld and most browser-agent papers.
WorkArena - ServiceNow's enterprise workplace benchmark for browser agents.
Genie 3 / Genie 4 - Google DeepMind's interactive video world models — generate playable 3D worlds from a prompt. Closed-weights research, no public code.
NVIDIA Cosmos - NVIDIA's foundation world model for embodied AI / robotics — generate physically plausible video futures.
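The memory-retrieval step that Generative Agents popularized ranks stored memories by a sum of recency, importance, and relevance. A minimal sketch, assuming equal weights, recency that decays by a factor of 0.995 per simulated hour (the value reported by Park et al., 2023), importance rated 1 to 10 and simply divided by 10 (the paper min-max normalizes each component instead), and embeddings computed elsewhere. `Memory`, `score`, and `retrieve` are hypothetical names, not the reference implementation's API.

```python
import math
from dataclasses import dataclass


@dataclass
class Memory:
    text: str
    embedding: list[float]
    importance: int          # 1 (mundane) to 10 (poignant), e.g. rated by an LLM
    last_access_hour: float  # simulation time of the last retrieval, in hours


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def score(m: Memory, query_emb: list[float], now_hour: float, decay: float = 0.995) -> float:
    """Equal-weight sum of recency, importance, and relevance."""
    recency = decay ** (now_hour - m.last_access_hour)  # exponential decay per hour
    importance = m.importance / 10                      # simplified scaling to [0.1, 1]
    relevance = cosine(m.embedding, query_emb)
    return recency + importance + relevance


def retrieve(memories: list[Memory], query_emb: list[float],
             now_hour: float, k: int = 3) -> list[Memory]:
    """Return the k highest-scoring memories for the query."""
    return sorted(memories, key=lambda m: score(m, query_emb, now_hour), reverse=True)[:k]
```

A real agent would also reset `last_access_hour` whenever a memory is returned, which is what keeps frequently used memories prominent.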

📊 Benchmarks & Leaderboards

Standard evaluation suites and live leaderboards tracking frontier AI capability as of 2026.

BenchLM - 🆕 Composite leaderboard that aggregates multiple benchmark families. April 2026 top: Claude Mythos Preview 99, Gemini 3.1 Pro / GPT-5.4 tied at 94, Claude Opus 4.6 / GPT-5.4 Pro at 92, GLM-5 Reasoning 85 (top open-weight model).
SWE-bench Verified - Real-world GitHub issue resolution benchmark. April 2026 top: Claude Mythos 93.9%, Claude Opus 4.7 87.6%.
GPQA Diamond - 💤 Stale dataset repo (last update 2024-09). Expert-level science reasoning. April 2026 top: Gemini 3.1 Pro 94.3% (world-record), Claude Opus 4.7 94.2%.
ARC-AGI 2 - Abstract reasoning over novel tasks. Gemini 3.1 Pro 77.1%.
OSWorld - Desktop GUI manipulation. GPT-5.4 at 75%, above the benchmark's reported human baseline (~72%).
LMArena (formerly Chatbot Arena) - Crowdsourced chat preference battles. Opus 4.6 currently leads.
MMLU-Pro - Multi-task language understanding, harder successor to MMLU.
LiveCodeBench - Contest-style coding benchmark, updated continuously to resist contamination.
AIME 2025 / Humanity's Last Exam (HLE) - Elite math / PhD-level general reasoning.
Terminal-Bench - CLI agent evaluation. Codex CLI 77.3%.
Wolfram LLM Benchmarking Project - Benchmark for generating Wolfram Language code from English specifications. Updated continuously.
Terminal-Bench 2.0 - 🆕 Late 2025 / early 2026. 89 curated terminal tasks (compile, train, configure, debug). May 2026 leader: GPT-5.5 82.7%, Claude Opus 4.7 69.4%.
GDPval / GDPval-MM - 🆕 Feb 2026. OpenAI economic-value benchmark across 44 occupations / 9 industries; 1,320 expert-built tasks. May 2026 leader: GPT-5.5 84.9% on GDPval-MM.
SWE-bench Pro - 🆕 Repository-level engineering successor to Verified. Claude Opus 4.7 64.3% > GPT-5.5 58.6% (Claude leads on long-horizon repo work).
Hieroglyphic Benchmark - 🆕 Lateral / abstract-reasoning benchmark; Gemini 3.5 "Snowbunny" 80% (leaked).
LLM-Stats Live Leaderboard - 🆕 Continuously-refreshed cross-benchmark dashboard for newly-released models.
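Many coding benchmarks (LiveCodeBench among them) report pass@1 or pass@k. A minimal sketch of the unbiased pass@k estimator from the HumanEval paper (Chen et al., 2021): with n samples per problem of which c pass, pass@k = 1 − C(n−c, k) / C(n, k). Function names are illustrative, and individual leaderboards may compute their headline numbers differently (SWE-bench, for example, reports a percentage of resolved issues).

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Probability that at least one of k samples, drawn without replacement
    from n generated samples of which c are correct, passes the tests."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Every size-k subset must contain at least one correct sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems, given one (n, c) pair per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

For k = 1 the estimator reduces to c / n, the plain fraction of correct samples.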

🖥️ Computer Use & Desktop Agents

AI agents that can see, control, and automate desktop environments at the OS level. For purely browser-based agents see 🌐 Browser & Web Agents.

Claude Computer Use - 🆕 Anthropic's "Desktop Intelligence" — Claude sees your screen and uses mouse/keyboard to automate any software.
OpenAI Operator - 🆕 Browser agent for booking, form-filling, and web task automation.
Google Project Mariner - 📦 Discontinued (2026-05-04). Browser-agent research project; capabilities merged into Gemini Agent.
Microsoft Copilot Agents - 🆕 Autonomous background agents across the Microsoft 365 stack. Goes beyond the sidebar: executes tasks and surfaces decisions for human approval.
Open Interpreter - A natural language interface for computers — let LLMs run code locally.
Manus AI - 🆕 🇨🇳 Autonomous general-purpose AI agent with cloud-to-local hybrid model. Handles research, coding, and complex multi-step tasks.
Genspark - 🆕 All-in-one autonomous work agent with mixture-of-agents architecture. Can make phone calls.
Perplexity Computer - 🆕 Research-focused desktop agent with multi-model orchestration and local file access.
Beam AI - 🆕 Self-learning desktop agents that refine their logic based on successful outcomes.
ChatGPT Workspace Agents - 🆕 Research preview April 22, 2026; credit-based pricing May 6, 2026; EKM (Enterprise Key Management) support May 7, 2026. OpenAI's successor to Custom GPTs for enterprises: cloud-side agents with file access, code execution, scheduled runs, and built-in connectors for Slack, Google Drive, and Salesforce. Available on Business / Enterprise / Edu / Teachers plans; powered by Codex.

🌐 Browser & Web Agents

Frameworks and infrastructure for agents that interact with the web through real browsers — navigate, click, scrape, and complete multi-page workflows.

Browser Use - Make websites accessible for AI agents with browser automation. The de facto open-source choice in 2026, 92K stars.
Stagehand - The SDK for browser agents — typed act/extract/observe primitives over Playwright by Browserbase. MIT.
Steel Browser - 🆕 Open-source browser API for AI agents — batteries-included sandboxed Chromium with session persistence and proxy rotation. Apache-2.0.
Skyvern - Automate browser-based workflows with LLMs and computer vision. AGPL-3.0.
AgentQL - Query language + Playwright integration for semantic web extraction. Reliable on dynamic, cluttered pages.
Hyperbrowser MCP - 🆕 Hosted headless-browser fleet exposed as an MCP server — plug into Claude/GPT/LangChain via the standard tool interface.
Playwright MCP - 🆕 Microsoft's official MCP server for Playwright. Production-grade automation primitives without rolling your own bridge.
MultiOn - Hosted browser agent platform with native reasoning and memory. Closed-source.
Browserbase - Headless browser infrastructure built specifically for AI agents — stealth, persistence, captcha solving, observability.

🗣️ Voice & Multimodal Agents

Voice-enabled and multimodal AI agent platforms.

ElevenLabs - AI voice platform with conversational AI agents and realistic speech synthesis.
Vapi - Enterprise voice AI platform — build, test, and deploy voice agents. $50M Series B announced May 12, 2026 after crossing 1B platform calls; May 2026 updates ship Squads v2 (multi-assistant orchestration), Composer alpha (prompt-built agents), Simulations alpha (systematic AI-powered testing), and GA of the Soniox low-latency multilingual transcriber.
Retell AI - Build production-ready conversational voice AI agents.
Bland AI - AI phone calling platform — enterprise-grade conversational AI.
LiveKit Agents - Build real-time multimodal AI agents with voice, video, and data.
Pipecat - Open-source framework for voice and multimodal conversational AI.
Vocode - 💤 Stale (no commits since 2024-11). Open-source library for building voice-based LLM agents.
Bolna - End-to-end open-source voice AI agents framework.
Cartesia - 🆕 Ultra-low-latency voice AI for real-time conversational agents.
Meta Voice AI - 🆕 Former PlayHT/Play.ai team's tech, integrated into Meta AI, AI Characters, and Meta wearables after July 2025 acquisition. Original Play.ai platform shut down Dec 31, 2025.
Sesame - 🆕 Voice AI companion with emotional understanding and natural conversation.
ElevenAgents - 🆕 ElevenLabs' full-stack voice-agent platform (April-May 2026 updates): MCP, multimodal messages, conversation topic discovery, knowledge-base search, pre-tool speech controls. First voice-agent platform to earn AIUC-1 certification.
Cartesia Line - 🆕 April 2026. Code-first voice-agent platform built on Sonic 3 TTS + Ink STT; ~40-90ms time-to-first-audio.
Deepgram Voice Agent API - 🆕 Single endpoint bundling STT (Nova-3) + LLM routing + TTS (Aura-2) + Flux conversational STT with mid-call language switching across 10 languages.
OpenAI Realtime API (GPT-Realtime-2) - 🆕 May 8, 2026. GPT-5-class reasoning over voice with parallel tool calls; supersedes the previous Realtime models for production voice agents.
OpenYabby - 🆕 Open-source macOS voice-driven multi-agent orchestrator — Realtime API + CLI runners + multi-channel orchestration. A lead agent plans the work and delegates to sub-agents for review and QA. MIT.

📱 Personal AI Agents

AI agents designed for personal use, productivity, and daily life assistance.

OpenClaw - 🆕 Personal AI agent platform with skills, memory, Dreaming, Canvas/A2UI, ACP coding harness integration. Runs on your machine with multi-channel messaging.
Rabbit R1 - Dedicated AI hardware device with a large action model for personal assistance.
Limitless - Personalized AI powered by what you've seen, said, and heard (formerly Rewind).
Open Interpreter - A natural language interface for computers — let LLMs run code locally.
01 Light - 💤 Stale (no commits since 2024-11). Open-source voice interface for computers.
Leon - Open-source personal assistant — lives on your server.
Khoj - Personal AI second brain — search and chat with your notes, docs, and images.
Humane AI Pin - 📦 Discontinued (Feb 2025) after HP acquired Humane's assets. Wearable AI device with a screenless, ambient computing experience.
Arahi AI - 🆕 Personal productivity and business automation assistant.
Lindy AI - 🆕 No-code AI agent for email, calendar, and workflow automation.
MuleRun - 🆕 Always-on agents for recurring tasks and background automation.
Gemini Intelligence - 🆕 May 12, 2026 (Android Show: I/O Edition). Proactive agentic AI features integrated into Googlebooks laptops, Wear OS, Android Auto, Android XR, and starting on the latest Samsung Galaxy + Pixel devices. Auto-creates shopping carts from grocery lists, books spin classes, filler-word removal via the Rambler speech-to-text.
Gemini Spark - 🆕 May 14, 2026 (pre-I/O leak / insight). Upcoming branded agent capability inside the Gemini app for autonomously running multi-step processes; sits above Gemini 3.1 Pro reasoning stack.
QwenPaw - 🆕 🇨🇳 May 2026 rebrand from CoPaw. Self-hostable personal assistant in the Qwen / AgentScope family. Local-first memory, hot-loadable skills, multi-agent collaboration, multi-channel (DingTalk / Feishu / WeChat / Discord / Telegram), tool guard + skill scanner. Apache-2.0.
AI Growth Agents for Marketers - 🆕 Growth marketing prompts and Python agents built from real fintech campaigns in Southeast Asia. Covers campaign briefs, MEU planning, and A/B test analysis with multi-agent workflows. Agent Skills format — installable via npx skills add. Bilingual VI + EN. MIT.

📱 Mobile Agents

GUI agents that drive Android/iOS phones — the next frontier after desktop computer-use. Most major model providers now ship a mobile-grounded variant.

Mobile-Agent - 🇨🇳 Alibaba's flagship multimodal phone-control agent family (v1 → v3, plus Mobile-Agent-E and Mobile-Agent-V). State-of-the-art on Android benchmarks.
AppAgent - 💤 Tencent's multimodal agent that operates smartphone apps by tapping/swiping. Influential early implementation.
Apple Intelligence - On-device agent layer in iOS / iPadOS / macOS. App Intents and screen-aware actions across the OS.
Samsung Galaxy AI / Bixby 2.0 - On-device Gauss-powered agentic capabilities baked into the Galaxy S26 line.
Google Gemini for Android - Replaces Google Assistant on Android with full Gemini-powered, app-aware actions including system intents and Workspace.
Magma - Microsoft Research foundation model for multimodal agents — grounds across UI, robotics, and physical action; targets phones, web, and embodied tasks.

🏢 Enterprise Agent Platforms

Enterprise-grade platforms for deploying AI agents at scale.

Salesforce Agentforce - Autonomous AI agents for enterprise CRM — sales, service, and marketing.
Microsoft Copilot Studio - Build and customize AI agents and copilots for your organization.
Gemini Enterprise Agent Platform - 🆕 April 22, 2026 (Google Cloud Next '26). Evolution of Vertex AI into a unified hub for building, scaling, governing, and optimizing enterprise agents. Supports Gemini 3.1 Pro/Flash, Lyria 3, plus third-party models (Claude Opus/Sonnet/Haiku). Integrated agent DevOps, security, and orchestration.
Google Vertex AI Agent Builder - Build and deploy enterprise-ready generative AI agents on Google Cloud.
Amazon Bedrock Agents - Build AI agents that can execute multi-step tasks across company systems.
ServiceNow AI Agents - 🆕 AI agents for enterprise IT service management with AI Control Tower.
IBM watsonx Orchestrate - AI assistant platform to automate work across enterprise applications.
Oracle AI Agents - 🆕 Enterprise AI agents integrated with Oracle Fusion Cloud ERP.
Moveworks - Enterprise copilot platform — AI that works across every system.
UiPath Agentic Automation - 🆕 Agentic reasoning layered onto RPA bot estates for intelligent process automation.
AgentX - 🆕 Agentic enterprise solution for scalable AI automation with plug-and-play chatbots.
Sistava - AI agent orchestration platform for deploying and operating multiple AI agents that run sales, marketing, finance, and customer support. Reachable via Slack, WhatsApp, email, voice, Telegram, API, MCP, A2A, and webhooks, with full computer use on your own OS.
OutSystems - 🆕 AI development platform for rapidly building mission-critical apps and agent governance.
Sema4.ai - 🆕 Enterprise AI agent platform with Python-first approach and built-in governance.
SAP Business AI Platform + Joule Studio 2.0 - 🆕 SAP Sapphire 2026 (May 11-13). SAP unifies BTP + Business Data Cloud + Business AI into one platform and reframes Joule as an agentic operating layer. Joule Studio 2.0 (rolling out June 2026) lets enterprises build with LangGraph / AutoGen-style frameworks against live SAP business data; the new Autonomous Suite ships 50+ domain Joule Assistants and 200+ specialised agents across finance, supply chain, procurement, HCM, and CX.
Microsoft Agent 365 + Microsoft 365 E7 - 🆕 May 1, 2026 GA with extended May rollouts. Identity-first control plane for governing and securing AI agents across enterprise environments; $15/user/month standalone, $99/user/month inside the new Microsoft 365 E7 "Frontier" suite. May 2026 update adds AWS Bedrock + Google Cloud registry sync, Intune/Defender preview policies, and SASE for agents.
OpenAI Guaranteed Capacity (Compute Annual Pass) - 🆕 May 19, 2026. Long-term enterprise compute reservations (1 / 2 / 3-year terms, larger discounts at longer terms) sold as a structured product. Designed to derisk enterprise rollout of GPT-5.5-class agents — OpenAI's reply to the Anthropic Priority Tier model.
Bristol Myers Squibb ↔ Claude Enterprise - 🆕 May 20, 2026. BMS standardises on Claude Enterprise as its shared intelligence platform for 30,000+ employees, embedding agentic Claude into drug-discovery / development / delivery pipelines. First top-5 pharma to make a public, company-wide Claude commitment.
Kore.ai Artemis Agent Platform - 🆕 May 22, 2026 (launched on Azure). AI-native enterprise agent platform built around the new YAML-style Agent Blueprint Language (ABL) for declarative multi-agent workflows. Kore.ai's structural challenge to Copilot Studio and Agentforce.
FPT Flezi Foundry™ - 🆕 May 22, 2026. AI-augmented delivery platform with two governed Service-as-a-Software modes — Agentic Development Lifecycle (ADLC) for full SDLC agent crews and Agentic Managed Services (AMS) for incident-resolution agents on top of existing ITOps.
GreenOps Agent - 🆕 A 4-agent GCP cost and carbon optimization pipeline built on Google ADK, Gemini Flash, and Cloud Run. Detects idle VMs, unattached disks, and unused reserved IPs, estimates their CO₂ footprint, and executes cleanups after human approval.

📊 Agent Evaluation & Observability

Tools for testing, evaluating, and monitoring AI agents in production.

AgentBench - Multi-dimensional benchmark for evaluating LLMs as agents.
LangSmith - Platform for debugging, testing, evaluating, and monitoring LLM applications.
Helicone - Open-source LLM observability platform — logs, metrics, and traces.
Braintrust - Enterprise-grade stack for building AI products — eval, prompt playground, logging.
Arize Phoenix - AI observability & evaluation — traces, evals, and datasets.
Langfuse - Open-source LLM engineering platform — traces, evals, prompt management. Acquired by ClickHouse Jan 2026; March 2026 shift to an observations-centric data model, April 2026 added Langfuse Cloud Japan + Experiments + Langfuse Academy + LLM-as-a-Judge API; v4 self-host release queued.
OpenLLMetry - Open-source observability for LLM applications based on OpenTelemetry.
Weights & Biases Weave - Toolkit for developing, evaluating, and monitoring AI applications.
SWE-bench - Benchmark for evaluating LLMs on real-world software engineering problems.
Terminal-Bench - 🆕 Benchmark for terminal-based coding agent evaluation. Maintained by Harbor Framework.
LMArena (formerly LMSYS Chatbot Arena) - 🆕 Crowdsourced LLM benchmark using human preference voting. LMSYS rebranded to LMArena in 2025.
Patronus AI - 🆕 Automated LLM evaluation and red-teaming platform.
DeepEval - Pytest-style LLM eval framework with 14+ built-in metrics (G-Eval, hallucination, faithfulness). Most-starred open-source eval library in 2026. Apache-2.0.
Agenta - 🆕 Open-source LLMOps platform combining prompt playground, prompt management, evaluation, and observability.
LangSmith SDK - Official client SDK for LangChain's hosted observability platform.
AutoEvals - Standalone library of best-practice LLM eval scorers (factuality, JSON validity, semantic similarity, etc.) by Braintrust. Drop-in for any framework.
BenchClaw - ⚠️ Unverified. Self-described multi-dimensional agent evaluation harness (17-judge tribunal, deception detectors, 10 scoring dimensions). Repo is single-maintainer with very low independent adoption; the same submission was sent to 8+ awesome lists in parallel — one was merged at eudk/awesome-ai-tools, the rest are pending or declined. Listed for visibility, evaluate before relying on its scores.
PromptEden - ⚠️ Unverified. Commercial AI-visibility monitoring service — tracks how ChatGPT, Claude, Gemini, Perplexity, Copilot, and Grok describe brands and which competitors they recommend, refreshed daily across 9+ platforms. Submitted to 10 awesome lists on the same day — promising category but listed for visibility only, evaluate before purchasing.
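Preference leaderboards like LMArena turn pairwise human votes into ratings. As a hedged illustration, here is the classic online Elo update; LMArena has described moving to a Bradley-Terry fit over all votes, so treat this as a simplified stand-in rather than its actual methodology. The K-factor of 32 and the 1000 starting rating are arbitrary assumptions.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Probability that A beats B under the Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))


def elo_update(r_a: float, r_b: float, outcome: float, k: float = 32.0) -> tuple[float, float]:
    """Update both ratings after one battle.

    outcome is 1.0 if A wins, 0.0 if B wins, 0.5 for a tie.
    """
    e_a = expected_score(r_a, r_b)
    delta = k * (outcome - e_a)
    return r_a + delta, r_b - delta  # zero-sum: B loses what A gains


def rate(battles: list[tuple[str, str, float]], start: float = 1000.0) -> dict[str, float]:
    """Replay (model_a, model_b, outcome) votes in order and return final ratings."""
    ratings: dict[str, float] = {}
    for a, b, outcome in battles:
        ra, rb = ratings.get(a, start), ratings.get(b, start)
        ratings[a], ratings[b] = elo_update(ra, rb, outcome)
    return ratings
```

Online Elo depends on vote order, which is one reason arena-style leaderboards prefer fitting all votes at once.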

🔬 AI Research Tools

Tools and platforms for AI/ML research, experimentation, and development.

Hugging Face - The AI community's platform — models, datasets, and Spaces for ML research.
vLLM - 🆕 High-throughput LLM serving engine with PagedAttention.
Ollama - Run LLMs locally with a simple API. Supports Llama, Mistral, Qwen, and more.
LM Studio - Desktop app for running local LLMs with a user-friendly interface.
SGLang - 🆕 Fast serving framework for large language and vision models.
llama.cpp - LLM inference in C/C++ — run models on consumer hardware.
MLX - 🆕 Apple's array framework for ML on Apple silicon.
Unsloth - 🆕 Fine-tune LLMs 2x faster with 70% less memory.
OpenRouter - 🆕 Unified API for accessing 200+ AI models from all major providers.
Weights & Biases - ML experiment tracking, dataset versioning, and model management.
Label Studio - Multi-type data labeling and annotation tool.

📚 Learning Resources

Papers, courses, tutorials, and guides for understanding and building AI agents.

Papers

ReAct: Synergizing Reasoning and Acting in Language Models - The foundational paper on reasoning + acting in LLMs.
Toolformer: Language Models Can Teach Themselves to Use Tools - Teaching LLMs to use external tools autonomously.
Generative Agents: Interactive Simulacra of Human Behavior - Stanford's generative agent architecture with memory and reflection.
A Survey on Large Language Model based Autonomous Agents - Comprehensive survey of LLM-based autonomous agents.
The Rise and Potential of Large Language Model Based Agents - In-depth analysis of LLM agent capabilities and future directions.
Agent Hospital - A simulacrum of hospital with evolvable medical agents.
Multimodal Intelligence as the Dominant Paradigm in 2026 AI Systems - 🆕 Research on multimodal AI becoming the default paradigm.
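The ReAct pattern interleaves a reasoning step, a tool call, and the tool's observation until the model emits a final answer. A toy sketch, assuming a stand-in for the LLM and a hypothetical `Action: tool[input]` / `Final Answer:` text format loosely modeled on the paper's prompts; a real agent would call a model API and parse its output more defensively.

```python
import re
from typing import Callable

# Matches lines like "Action: add[2,3]" (hypothetical format for this sketch).
ACTION_RE = re.compile(r"Action:\s*(\w+)\[(.*)\]")


def react(question: str,
          llm: Callable[[str], str],
          tools: dict[str, Callable[[str], str]],
          max_steps: int = 5) -> str:
    """Run a Thought -> Action -> Observation loop until a final answer."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(transcript)           # model proposes a thought plus an action
        transcript += step + "\n"
        if "Final Answer:" in step:
            return step.split("Final Answer:", 1)[1].strip()
        match = ACTION_RE.search(step)
        if match is None:
            transcript += "Observation: could not parse an action\n"
            continue
        name, arg = match.groups()
        tool = tools.get(name)
        observation = tool(arg) if tool else f"unknown tool {name!r}"
        transcript += f"Observation: {observation}\n"  # fed back on the next turn
    return "no answer within step budget"
```

The growing transcript is the agent's only state here, which is why the observation line matters: it is what the model sees on its next turn.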

Courses & Tutorials

DeepLearning.AI — AI Agents in LangGraph - Short course on building agents with LangGraph.
DeepLearning.AI — Multi AI Agent Systems with crewAI - Course on building multi-agent systems.
DeepLearning.AI — A2A Protocol - 🆕 Free course on Google's Agent-to-Agent protocol.
LangChain Academy - Free courses on LangChain, LangGraph, and agent development.
Hugging Face — Building AI Agents - Open course on building AI agents with open-source tools.
LLM Agents MOOC (Berkeley) - UC Berkeley course on LLM agents.
Microsoft Agent Framework Docs - 🆕 Official documentation for Microsoft's unified agent framework.
Hugging Face Agents Course - Free 5-unit course (notebooks + videos) on building production agents with smolagents, LangGraph, and LlamaIndex.
Anthropic Cookbook - Official notebooks for tool use, computer use, agent patterns, prompt engineering, and Claude Code recipes.
Google Gemini Cookbook - Official Gemini API examples covering grounding, function calling, multimodal, and live audio.
LLM Course (Maxime Labonne) - End-to-end LLM curriculum from fundamentals to fine-tuning, with Colab notebooks. 79K stars.
Anthropic Courses - Anthropic's official educational courses on prompt engineering, real-world prompts, evals, and tool use.

Curated Lists

awesome-ai-agents - 💤 Stale (last update 2025-02). Curated list of AI autonomous agents by E2B — pre-2026 reference.
awesome-llm-agents - Curated list of LLM-powered agent resources.
awesome-mcp-servers - 🆕 Curated list of MCP server implementations.

🇨🇳 Chinese AI Ecosystem

Major projects from mainland-China teams or primarily targeting the Chinese market. Listed because the China stack is increasingly its own parallel ecosystem with distinct frameworks, models, and developer culture.

Foundation models from Chinese labs (Qwen, DeepSeek, GLM, Doubao, Kimi, Hunyuan, ERNIE) are listed under 🧠 Foundation Models directly.

Agent Platforms & Frameworks

Dify - Open-source LLM app development platform with visual agent builder. The dominant low-code agent canvas in Chinese tech.
Lobe Chat - Multi-agent chat workspace + plugin/agent marketplace. One of the highest-starred TypeScript AI projects. Apache-2.0.
Cozeloop - 🆕 ByteDance's open-source agent optimization platform from the Coze team.
AgentScope - Alibaba ModelScope's multi-agent framework with visual debugging and distributed execution. Apache-2.0.
Bisheng - Open enterprise LLM DevOps platform: workflows, RAG, agents, fine-tuning, evals. Apache-2.0.
MetaGPT - Multi-agent collaboration framework that assigns SOP roles (PM, architect, engineer) to LLMs. DeepWisdom.

RAG / Knowledge

FastGPT - Knowledge-base-first platform on top of LLMs: data ingestion, RAG retrieval, visual workflow orchestration.
QAnything - 💤 NetEase Youdao's question-answering engine over arbitrary local documents (PDF/Word/Excel/PPT).
RAGFlow - Deep-document-understanding RAG engine — strong on scanned PDFs, tables, and charts.
LightRAG - HKU Data Science Lab's lightweight graph-based RAG engine.

Personal & Productivity

AppFlowy - Open-source Notion alternative with AI workspace agents. AGPL-3.0.
Manus AI - General-purpose autonomous agent (Beijing-based Butterfly Effect). One of the most-watched 2026 agent products in Chinese tech.
Coze (扣子) - ByteDance's no-code agent builder. Mainland-only consumer surface; international counterpart is coze.com.
Tongyi Qianwen Agent - Alibaba's mass-market consumer agent, integrated across Taobao / DingTalk / Quark.
Doubao Agents - ByteDance's flagship consumer assistant on top of the Doubao model family.

Developer Tools

Kilo Code - 2026 viral Chinese-community challenger to Cursor. Default model: MiniMax.
Cherry Studio - Most-installed open-source desktop client for LLMs in Chinese dev circles — multi-provider chat with knowledge base.
ScienceOne 100 / 磐石100 - 🆕 Chinese Academy of Sciences scientific reasoning agent system, 50+ CAS institutes, 2,000+ research tools.

📝 Compare — Side-by-Side Tables

Quick decision matrices for the most common "which one do I pick?" questions in 2026.

🏗️ Agent Frameworks (open-source)

Framework	Language	Multi-Agent	State / Graph	Streaming	License	Best For
LangGraph	Python / JS	✅ native	✅ first-class	✅	MIT	Production stateful workflows
CrewAI	Python	✅ role-based	⚠️ task graph	✅	MIT	Role-playing agent teams
AutoGen / Microsoft Agent Framework	Python / .NET	✅ conversational	⚠️ group chat	✅	CC-BY-4.0 / MIT	Enterprise multi-agent chat
OpenAI Agents SDK	Python	✅ handoffs	❌	✅	MIT	OpenAI-native production
Mastra	TypeScript	✅	✅ workflows	✅	Elastic-2.0	TypeScript-first stack
Google ADK	Python / Java	✅ hierarchical	⚠️	✅	Apache-2.0	Gemini + Vertex AI
DSPy	Python	⚠️ via modules	⚠️ programmatic	✅	MIT	Programmatic prompt optimization
Phidata / Agno	Python	✅ teams	❌	✅	MPL-2.0	Multi-modal agents w/ memory
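The "State / Graph" column refers to frameworks that model an agent as nodes that read and update a shared state, with (possibly conditional) edges deciding what runs next. A framework-agnostic sketch of that idea in plain Python, assuming dict-based state and one router function per node; this is not LangGraph's or any listed framework's API, and `run_graph`, `draft`, and `review` are hypothetical.

```python
from typing import Callable

Node = Callable[[dict], dict]    # takes state, returns updated state
Router = Callable[[dict], str]   # takes state, returns next node name

END = "__end__"


def run_graph(nodes: dict[str, Node], routes: dict[str, Router],
              start: str, state: dict, max_steps: int = 20) -> dict:
    """Execute nodes one at a time; each node's router picks the next node."""
    current = start
    for _ in range(max_steps):
        state = nodes[current](state)     # node returns a new state dict
        current = routes[current](state)  # conditional edge
        if current == END:
            return state
    raise RuntimeError("graph did not reach END within max_steps")


# Example: draft -> review, looping back to draft until review approves.
def draft(state: dict) -> dict:
    return {**state, "version": state.get("version", 0) + 1}


def review(state: dict) -> dict:
    return {**state, "approved": state["version"] >= 3}


nodes = {"draft": draft, "review": review}
routes = {
    "draft": lambda s: "review",
    "review": lambda s: END if s["approved"] else "draft",
}
```

Here `run_graph(nodes, routes, "draft", {})` loops three times and returns `{"version": 3, "approved": True}`. Returning a fresh dict from each node, rather than mutating, makes every intermediate state easy to log or checkpoint.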

🧪 Sandboxes (running agent-generated code)

Sandbox	Hosting	Cold Start	Languages	Persistence	License	Best For
E2B	Cloud (managed)	~150ms	Python / Node / shell	per-session	Apache-2.0	OpenAI Agents SDK / production
Daytona	Cloud / self-host	~500ms	Polyglot	persistent workspaces	AGPL-3.0	Long-running dev tasks
Modal	Cloud (managed)	~200ms	Python	function-scoped	proprietary	GPU + serverless agents
Microsandbox	Local microVM	~100ms	Polyglot	per-session	Apache-2.0	Privacy-first local dev
SandboxFusion	Self-host	~300ms	20+ languages	ephemeral	Apache-2.0	Eval / benchmark pipelines
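Before adopting one of these sandboxes, it helps to see the bare minimum any code runner needs: a separate process, a timeout, and captured output. A minimal sketch using only Python's standard library; it provides process and timeout isolation only, with no filesystem, network, or syscall restrictions, so it is not a substitute for the microVM or container isolation these products offer. `run_untrusted` and `RunResult` are hypothetical names.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunResult:
    stdout: str
    stderr: str
    returncode: Optional[int]  # None when the run was killed on timeout
    timed_out: bool


def run_untrusted(code: str, timeout_s: float = 5.0) -> RunResult:
    """Run a Python snippet in a fresh interpreter process with a wall-clock limit.

    Process isolation only: the child can still read files, open sockets, etc.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-I", "-c", code],  # -I: ignore PYTHON* env vars and user site-packages
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills and reaps the child before re-raising.
        return RunResult(stdout="", stderr="", returncode=None, timed_out=True)
    return RunResult(proc.stdout, proc.stderr, proc.returncode, timed_out=False)
```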

🌐 Browser-Use Stacks

Stack	Approach	Hosting	Strengths	License
Browser Use	Vision + DOM, Playwright	Self-host	Largest community, 92K stars	MIT
Stagehand	Typed `act/extract/observe`	Browserbase or self-host	Strong typing, structured output	MIT
Steel Browser	Headless API	Self-host or cloud	Sessions + proxy + captcha	Apache-2.0
Skyvern	Vision-first	Self-host	Robust to dynamic pages	AGPL-3.0
AgentQL	Query language	SDK + self-host	Semantic selectors	MIT
Playwright MCP	MCP-native	Self-host	Drop-in MCP tool for any client	Apache-2.0

📊 Eval & Observability

Tool	Self-host	OpenTelemetry	Eval Suite	Prompt Mgmt	License
Langfuse	✅	✅	✅	✅	MIT
Helicone	✅	✅	⚠️ basic	✅	Apache-2.0
Arize Phoenix	✅	✅	✅	⚠️	Elastic-2.0
LangSmith	❌ (cloud only)	✅	✅	✅	proprietary
Braintrust	❌ (cloud only)	✅	✅	✅	proprietary
DeepEval	✅ (library)	⚠️ via Confident AI	✅	❌	Apache-2.0
Agenta	✅	✅	✅	✅	Apache-2.0
OpenLLMetry	✅ (instrumentation)	✅ native	❌	❌	Apache-2.0

💻 Coding Agents — Headline Picks

Tool	Surface	Open Source	Free Tier	SWE-bench	Best For
Claude Code	CLI / IDE	❌	⚠️ Pro plan	80.9%	Long-horizon engineering
Codex CLI	CLI	✅	✅	n/a (Terminal-Bench 77.3%)	OpenAI-native shells
Cursor	IDE	❌	✅ (limited)	n/a	Pair-programming UX
Cline	VS Code ext	✅	✅ (BYO key)	n/a	OSS IDE alternative
Aider	CLI	✅	✅ (BYO key)	strong on Polyglot	Git-aware refactors
Devin 3.0	Cloud	❌	❌	leading	Hands-off long tasks
OpenHands	Self-host	✅	✅	competitive	Self-hosted SWE agent

Tables verified 2026-05-05. Send PRs with sources when figures change.

💰 Foundation Models — API Cost & Context

Prices in USD per 1M tokens. Data: 2026-05-20.

Model	Provider	Context Window	Input $/1M	Output $/1M	Best For
GPT-4o	OpenAI	128K	$2.50	$10.00	Tool use, vision, broad ecosystem
GPT-4o-mini	OpenAI	128K	$0.15	$0.60	High-volume simple tasks
Claude Sonnet 4.6	Anthropic	200K	$3.00	$15.00	Coding agents, complex reasoning
Claude Opus 4.7	Anthropic	200K	$5.00	$25.00	Hardest reasoning tasks
Claude Haiku 4.5	Anthropic	200K	$1.00	$5.00	Fast Anthropic-ecosystem tasks
Gemini 2.5 Flash	Google	1M	$0.30	$2.50	Cost-effective multimodal
Gemini 2.5 Pro	Google	2M	$1.25	$10.00	Long-context, multimodal
Gemini 2.5 Flash-Lite	Google	1M	$0.10	$0.40	Ultra-cheap high-volume
DeepSeek V3.2	DeepSeek	128K	$0.14	$0.28	Budget-friendly coding + reasoning
Qwen3 235B A22B	Alibaba	131K	~$0.29	~$1.15	Best Chinese + coding, MoE
Kimi K2.6	Moonshot AI	262K	~$0.60	~$2.50	Chinese + long-context tasks
Grok 4	xAI	256K	$3.00	$15.00	X/Twitter integration, reasoning
Grok 4.20	xAI	2M	$2.00	$6.00	Very long context, agentic tasks

Sources: Anthropic, OpenAI, Google, DeepSeek, Alibaba, Moonshot, xAI official pricing pages — May 2026.
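Per-1M-token prices turn into per-request cost as input tokens × input price / 1,000,000, plus the same for output. A small sketch using a few rows from the table (2026-05-20 snapshot; prices drift); it ignores cached-input discounts, batch pricing, and the long-context surcharges some providers apply. The dictionary keys are informal labels, not official API model IDs.

```python
# USD per 1M tokens as (input, output), copied from the table (2026-05-20 snapshot).
# Keys are informal labels for this sketch, not official API model IDs.
PRICES = {
    "gpt-4o": (2.50, 10.00),
    "gpt-4o-mini": (0.15, 0.60),
    "claude-sonnet-4.6": (3.00, 15.00),
    "gemini-2.5-flash": (0.30, 2.50),
    "deepseek-v3.2": (0.14, 0.28),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one request at list prices (no caching or batch discounts)."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


def monthly_cost(model: str, requests_per_day: int, input_tokens: int,
                 output_tokens: int, days: int = 30) -> float:
    """Projected monthly spend for a steady request volume."""
    return request_cost(model, input_tokens, output_tokens) * requests_per_day * days
```

For example, 10,000 input plus 2,000 output tokens cost about $0.0027 on GPT-4o-mini versus $0.06 on Claude Sonnet 4.6, roughly a 22× gap; agent loops that resend long contexts every turn amplify it.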

💻 Foundation Models — Local Deployment

Estimated VRAM at Q4_K_M quantization. Speed varies by hardware.

Model	Params	Min VRAM (Q4)	Speed (tok/s)*	Best Quantization	Chinese Support	Best For
Qwen3.6-27B	27B dense	~17 GB	~23 (M5 Max)	Q4_K_M / FP8	⭐⭐⭐⭐⭐	Coding, Chinese, agentic tasks
Qwen3 235B A22B	235B MoE	~40 GB (active)	~15–20	Q2_K / Q4_K_M	⭐⭐⭐⭐⭐	Best local quality, huge context
Llama 3.3 70B	70B dense	~42 GB	~12–18	Q4_K_M	⭐⭐☆☆☆	Best English open-weight
DeepSeek V3-671B	671B MoE	~40 GB (active)	~10–15	Q2_K	⭐⭐⭐⭐☆	Open-weight coding champion
Gemma 4 27B	27B dense	~17 GB	~20–25	Q4_K_M	⭐⭐⭐☆☆	Multilingual reasoning, Apache-2.0
Phi-4 14B	14B dense	~9 GB	~35–45	Q4_K_M	⭐⭐☆☆☆	Best 8–16GB VRAM coding model
Mistral Small 4 24B	24B dense	~14 GB	~25–30	Q4_K_M	⭐⭐⭐☆☆	Multilingual, function calling

* tok/s measured at typical decode context; varies with hardware, context length, and batch size.
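The dense-model VRAM rows follow a simple rule of thumb: weight memory ≈ parameters × bits per weight ÷ 8. A sketch assuming Q4_K_M averages about 4.85 bits per weight (an approximation; real files land roughly between 4.8 and 4.9 depending on architecture) and counting weights only. The KV cache adds more, growing linearly with context length; for MoE rows, all expert weights still have to live somewhere (VRAM or offloaded system RAM), not just the active ones.

```python
# Rough average bits per weight for common GGUF quantization types.
# Assumption: actual files vary a little by architecture and tensor mix.
BITS_PER_WEIGHT = {
    "Q4_K_M": 4.85,
    "Q8_0": 8.5,    # 32 int8 weights plus one fp16 scale per block
    "F16": 16.0,
}


def weights_gb(params_billion: float, quant: str = "Q4_K_M") -> float:
    """Weights-only memory in decimal GB (1 GB = 1e9 bytes)."""
    return params_billion * BITS_PER_WEIGHT[quant] / 8


def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                context_tokens: int, bytes_per_elem: int = 2) -> float:
    """KV cache: one key and one value vector per layer, KV head, and token."""
    return 2 * layers * kv_heads * head_dim * context_tokens * bytes_per_elem / 1e9
```

`weights_gb(27)` gives about 16.4 GB and `weights_gb(70)` about 42.4 GB, close to the ~17 GB and ~42 GB rows. Assuming a Llama-3-70B-style configuration (80 layers, 8 KV heads, head dimension 128), an 8,192-token FP16 cache adds `kv_cache_gb(80, 8, 128, 8192)` ≈ 2.7 GB on top.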

🧠 Agent Memory Systems

System	Storage	Retrieval	Local	Self-host	Temporal	License	Best For
Mem0	Vector + Graph	Semantic	✅	✅	✅	Apache-2.0	Drop-in memory for any LLM app
Basic Memory	Markdown files	Keyword + embedding	✅	✅	⚠️	MIT	Human-readable, Obsidian-compatible
Graphiti	Temporal knowledge graph	Graph traversal	✅	✅	⭐ native	Apache-2.0	Time-aware agent memory
Zep	Vector + summary	Semantic	✅	✅	✅	Apache-2.0	Production memory for chat agents
Memary	Knowledge graph	Graph + semantic	✅	✅	⚠️	MIT	Open-source agent memory layer
CORE	Episodic + semantic	Hybrid	✅	✅	✅	Apache-2.0	Structured episodic + semantic memory
Letta (fka MemGPT)	Tiered (core/archival)	Paged retrieval	✅	✅	✅	Apache-2.0	Long-term memory with infinite context illusion
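The "Temporal" column marks systems that track when a fact was true, not just what was said. A minimal sketch of that idea, assuming each fact carries a validity interval, facts arrive in chronological order, and a newer fact with the same subject and relation closes the previous one. Graphiti and the others implement richer models (Graphiti describes itself as bi-temporal); `TemporalMemory` here is hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Fact:
    subject: str
    relation: str
    value: str
    valid_from: datetime
    valid_to: Optional[datetime] = None  # None means still true


class TemporalMemory:
    def __init__(self) -> None:
        self.facts: list[Fact] = []

    def assert_fact(self, subject: str, relation: str, value: str, at: datetime) -> None:
        """Record a new fact and close any open fact it supersedes."""
        for f in self.facts:
            if f.subject == subject and f.relation == relation and f.valid_to is None:
                f.valid_to = at
        self.facts.append(Fact(subject, relation, value, at))

    def as_of(self, subject: str, relation: str, when: datetime) -> Optional[str]:
        """Return the value that was valid at `when`, if any."""
        for f in self.facts:
            if (f.subject == subject and f.relation == relation
                    and f.valid_from <= when
                    and (f.valid_to is None or when < f.valid_to)):
                return f.value
        return None
```

Superseded facts are kept rather than deleted, so the agent can still answer "where did Alice work last year?" after learning she changed jobs.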

🎙️ Voice & Audio Models

Model / Service	STT	TTS	Realtime	Local	Latency	Languages	License
ElevenLabs v3	❌	⭐⭐⭐⭐⭐	✅	❌	~200ms	32+	Proprietary
Whisper v3 (local)	⭐⭐⭐⭐⭐	❌	❌	✅	~1s (large)	99	MIT
Deepgram Nova-3	⭐⭐⭐⭐⭐	✅	✅	❌	<100ms	30+	Proprietary
Gemini Live API	✅	✅	⭐ native	❌	<300ms	30+	Proprietary
OpenAI Realtime API	✅	✅	⭐ native	❌	~300ms	57	Proprietary
MiniMax TTS	❌	⭐⭐⭐⭐☆	✅	❌	~200ms	20+	Proprietary
Kokoro	❌	⭐⭐⭐⭐☆	❌	✅	~100ms	8	Apache-2.0
Voxtral	⭐⭐⭐⭐☆	❌	❌	✅	batch	20+	Apache-2.0

🎨 Image Generation Models

Model	Max Resolution	API / Local	Photorealism	Best For	Pricing (approx)
DALL-E 3	1024×1024	API	High	Instruction-following, broad	$0.04/image (std)
gpt-image-2	2048×2048	API	Very high	API workflows, 4K output	$0.04–$0.17/image
Flux 2 Pro	2K+	API	⭐ high	Photorealistic, fast generation	~$0.05/image
Midjourney V8	2K+	Web only	Artistic	Best artistic quality	$10–$120/mo plan
Stable Diffusion 3.5	2K	Local + API	Good	Open-weight, self-hostable	Open weights (Stability AI Community License)
Ideogram 3	2K	API + Web	Good	Typography + text in images	Freemium
Gemini 3 Pro Image	1K	API	High	Native multimodal edit	Vertex AI pricing

🎥 Video Generation Models

Model	Max Length	Resolution	API / Local	Best For	Status (2026-05)
Veo 3.1	2 min	4K	API (Vertex)	Highest fidelity, physics-aware	GA (Google)
Kling VIDEO 3.0	3 min	1080p	API + Web	Cinematic style, leading post-Sora	GA (Kuaishou)
Runway Gen-4	10s/clip	1080p	API + Web	Precise motion control, professional	GA
Pika 2.0	10s	1080p	Web	Creative / social media	GA
Seedance 2.0	60s	1080p	API	Fast, cost-effective, social media	GA (ByteDance)
Hailuo 02	60s	1080p	Web + API	Smooth motion, accessible	GA (MiniMax)
~~Sora~~	❌	❌	❌	—	Discontinued Apr 2026

🔍 RAG Frameworks

Framework	Language	Vector DB	Hybrid Search	Streaming	License	Best For
LlamaIndex	Python	Any	✅	✅	MIT	Production RAG, document pipelines
Haystack	Python	Any	✅	✅	Apache-2.0	Pipelines, search-heavy RAG
LangChain LCEL	Python / JS	Any	✅	✅	MIT	Flexible chaining, large ecosystem
RAGFlow	Python	Built-in	✅	✅	Apache-2.0	Deep document parsing, OCR-aware
Cognee	Python	Vector + Graph	✅	⚠️	Apache-2.0	Knowledge graph + RAG hybrid
txtai	Python	Built-in	✅	❌	Apache-2.0	Lightweight, embeddings-first
Verba	Python	Weaviate	⚠️	❌	BSD-3	Weaviate-native RAG chatbot

🗄️ Vector Databases

Database	Self-host	Cloud	Scale	Hybrid Search	License	Best For
Qdrant	✅	✅	Very large	✅	Apache-2.0	Best all-round OSS vector DB
Weaviate	✅	✅	Large	✅	BSD-3	Multi-modal, GraphQL API
Pinecone	❌	✅	Very large	✅	Proprietary	Managed, easiest setup
Chroma	✅	⚠️	Medium	❌	Apache-2.0	Fast prototyping, Python-native
Milvus	✅	✅	Very large	✅	Apache-2.0	Billion-scale production
pgvector	✅	✅	Medium	⚠️	PostgreSQL	Existing Postgres stack
FAISS	✅	❌	Large	❌	MIT	In-memory, GPU-accelerated search
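Several databases above support hybrid search, i.e. merging keyword (BM25) and vector results into one ranking. One widely used merge rule is Reciprocal Rank Fusion (RRF). The sketch below shows the idea on plain ID lists; it is not any specific database's implementation, and products may use other fusion methods or weights.

```python
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of document IDs (e.g. BM25 hits and vector hits).

    Each document scores sum(1 / (k + rank)) over every list it appears in;
    k = 60 is the constant used in the original RRF paper.
    """
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


keyword_hits = ["d3", "d1", "d7"]  # hypothetical BM25 results
vector_hits = ["d1", "d4", "d3"]   # hypothetical vector-search results
print(reciprocal_rank_fusion([keyword_hits, vector_hits]))
```

RRF only uses ranks, so it needs no score normalization between the keyword and vector engines; documents that appear in both lists rise to the top.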

📱 Personal AI Assistants (2026)

Tool	Open Source	Local LLM	Memory	Multi-channel	Self-host	Best For
OpenClaw	✅	✅	✅ native	✅ (TG/Discord/WA)	✅	All-in-one personal agent platform
Khoj	✅	✅	✅	⚠️ (web/app)	✅	Research, notes, calendar integration
Jan.ai	✅	✅	❌	❌	✅	Offline ChatGPT replacement, GUI
LM Studio	❌	✅	❌	❌	✅	Easy local model runner, non-technical
Perplexity	❌	❌	⚠️	❌	❌	Search-first, cited answers
Claude.ai Pro	❌	❌	✅ Projects	❌	❌	Best reasoning, MCP tools
Zo Computer	❌	❌	✅	❌	❌	Autonomous computer use assistant

🔌 MCP Servers — Top Integrations

Stars data approximate, 2026-05.

MCP Server	Category	Stars	Auth	Security Audit	License
GitHub MCP	Dev / Code	🔥 High	OAuth	✅ (GitHub)	MIT
Playwright MCP	Browser	🔥 High	None (local)	⚠️	Apache-2.0
Filesystem MCP	Files	🔥 High	None (local)	⚠️ sandboxing	MIT
Brave Search MCP	Search	High	API key	❌	MIT
Slack MCP	Comms	Medium	OAuth	❌	MIT
Notion MCP	Notes	Medium	OAuth	❌	MIT
PostgreSQL MCP	Database	Medium	Conn string	⚠️ read-only mode	MIT
Google Maps MCP	Location	Medium	API key	❌	MIT

Use mcp-scan (Invariant Labs) to audit any MCP server before production deployment.

🏢 Enterprise AI Agent Platforms

Platform	Open Source	MCP Support	A2A Support	Self-host	Compliance	Best For
Microsoft Agent Framework	⚠️ (AutoGen OSS)	✅	✅	⚠️ (Azure)	SOC2, ISO 27001	Azure-native enterprise
Salesforce Agentforce	❌	⚠️	❌	❌	SOC2, GDPR	Salesforce CRM orgs
SAP Joule	❌	❌	❌	⚠️	SOC2, ISO	SAP ERP environments
Google Gemini Enterprise	❌	✅	✅	❌ (cloud)	SOC2, FedRAMP	Google Workspace orgs
IBM watsonx	⚠️	✅	⚠️	✅ (on-prem)	FedRAMP, HIPAA	Regulated / on-prem enterprise
ServiceNow AI Agents	❌	✅	⚠️	❌	SOC2	IT service management
Dify Enterprise	✅ (CE)	✅	✅	✅	SOC2 (cloud)	Multi-model, low-code agent platform

📏 Embedding Models

MTEB = Massive Text Embedding Benchmark leaderboard score (EN, 2026-05 approx).

Model	Dims	Context	Local	API	Languages	License	MTEB ≈
OpenAI text-embedding-3-large	3072	8K	❌	✅	Multi	Proprietary	~64
Cohere embed-v4	1024	128K	❌	✅	Multi	Proprietary	~66
Gemini text-embedding-004	768	2K	❌	✅	Multi	Proprietary	~63
BGE-M3	1024	8K	✅	❌	Multi	MIT	~65
Jina-embeddings-v3	1024	8K	✅	✅	Multi	CC-BY-NC	~65
Nomic-embed-text-v2	768	8K	✅	✅	Multi	Apache-2.0	~62
Voyage-3	1024	32K	❌	✅	Multi	Proprietary	~67
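Two practical details sit behind the Dims column: similarity is usually cosine, and some models can be shortened to fewer dimensions to save storage (OpenAI's text-embedding-3 models, for example, accept a `dimensions` parameter). The helpers below are a stdlib sketch of cosine similarity and truncate-then-renormalize; truncation only preserves quality for models trained for it (Matryoshka-style), which is an assumption to verify per model.

```python
import math
from typing import List


def l2_normalize(v: List[float]) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity = dot product of the L2-normalized vectors."""
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))


def truncate_embedding(v: List[float], dims: int) -> List[float]:
    """Keep the first `dims` components, then re-normalize to unit length."""
    return l2_normalize(v[:dims])


full = [3.0, 4.0, 12.0]
short = truncate_embedding(full, 2)
print(short, cosine(full, [3.0, 4.0, 0.0]))
```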

🛡️ Agent Security Tools

Tool	MCP Scan	Prompt Injection Defense	Audit Logs	Self-host	License
mcp-scan	⭐ native	✅	❌	✅	MIT
Lakera Guard	❌	⭐⭐⭐⭐⭐	✅	❌	Proprietary
Zenity	✅	✅	✅	❌	Proprietary
Prompt Armor	❌	⭐⭐⭐⭐☆	✅	❌	Proprietary
Azure AI Content Safety	❌	✅	✅	❌ (Azure)	Proprietary
Rebuff	❌	⭐⭐⭐⭐☆	❌	✅	MIT

🖥️ Computer Use & Desktop Agents

Tool	OS	Vision	Local	API	Open Source	Best For
Claude Desktop Intelligence	Mac / Linux	✅	❌	✅	❌	Best all-round screen agent
UFO	Windows	✅	✅	Optional	✅	Windows native automation
OSWorld	Mac/Win/Linux	✅	✅	Optional	✅	Cross-platform benchmark + agent
Nemo Agent	Linux	✅	✅	Optional	✅	Open desktop control
Screenpipe	Mac / Linux	✅	✅	❌	✅	Screen + audio memory, privacy-first
Claude Computer Use (API)	Any (via API)	✅	❌	✅	❌	API-driven desktop control

🤖 Physical AI Platforms

Platform	Type	Open Source	SDK	Simulation	Best For
NVIDIA Isaac GR00T N1.5	Humanoid foundation	⚠️ (weights)	✅	✅ (Isaac Sim)	Universal humanoid robot foundation model
ROS 2 Jazzy	Robot OS	✅	✅	✅ (Gazebo)	Standard robot middleware
Gemini Robotics	Manipulation	❌	⚠️	✅	Vision + language + dexterous manipulation
Unitree SDK2	Quadruped / Humanoid	✅	✅	⚠️	Go2, H1, G1 robot dev
Boston Dynamics API	Quadruped	❌	✅	❌	Spot industrial deployment
Genesis Sim	Simulation	✅	✅	⭐ native	Ultra-fast physics sim for embodied AI

🇨🇳 Chinese AI Models — Head-to-Head

Chinese language capability benchmarks are approximate. API prices in USD/1M tokens, May 2026.

Model	Provider	Context	Chinese Bench≈	Coding	Open Weight	Input $/1M
Qwen3 235B A22B	Alibaba	131K	Top	⭐⭐⭐⭐⭐	✅ Apache-2.0	~$0.29
DeepSeek V3.2	DeepSeek	128K	Very high	⭐⭐⭐⭐⭐	✅ MIT	$0.14
Kimi K2.6	Moonshot AI	262K	High	⭐⭐⭐⭐☆	✅	~$0.60
GLM-5.1	Zhipu AI	128K	High	⭐⭐⭐⭐☆	✅	~$0.50
Hunyuan Pro	Tencent	256K	High	⭐⭐⭐⭐☆	❌	~$0.45
Doubao Pro 256K	ByteDance	256K	High	⭐⭐⭐☆☆	❌	~$0.80
ERNIE 5	Baidu	128K	High	⭐⭐⭐☆☆	❌	~$0.70

📦 Agent Frameworks — TypeScript / JavaScript

Framework	Multi-Agent	Streaming	MCP	A2A	Stars≈	License
Mastra	✅	✅	✅	✅	~12K	Elastic-2.0
Vercel AI SDK	⚠️	✅	✅	❌	~12K	Apache-2.0
LangChain.js	✅	✅	✅	❌	~14K	MIT
Genkit	✅	✅	✅	❌	~3K	Apache-2.0
OpenAI Agents SDK (Node)	✅	✅	✅	❌	~2K	MIT
Rivet	✅	✅	⚠️	❌	~4K	MIT
Flowise	✅	✅	✅	❌	~35K	Apache-2.0

📊 Meta-Comparison — Orchestration vs Framework vs IDE

Category	Example Tools	Best For	Abstraction Level	Flexibility
Orchestration Platform	Dify, n8n, Flowise, Langflow	Non-engineers, fast deployment	Very high	Low-medium
Agent Framework	LangGraph, CrewAI, Mastra, OpenAI Agents SDK	Engineers building custom agents	Medium	High
Agent IDE / Coding Agent	Claude Code, Cursor, Cline, Devin	Developers pair-programming	Low	Very high
Low-code Builder	Voiceflow, Botpress, Microsoft Copilot Studio	Business / product teams	Very high	Low
AI-native App Platform	Vertex AI Agent Builder, Azure AI Foundry	Enterprise with managed infra	High	Medium

📱 Mobile AI Frameworks

Framework	iOS	Android	Local LLM	On-device Inference	License	Best For
MLX	✅	❌	✅	⭐ Apple Silicon	MIT	Apple-native, fast LLM on Mac/iPhone
llama.cpp (mobile)	✅	✅	✅	✅ (arm/x86)	MIT	Universal local LLM, all platforms
MediaPipe	✅	✅	✅	✅	Apache-2.0	On-device ML tasks (vision, NLP)
Core ML	✅	❌	✅	✅ (ANE)	Apple SDK	iOS/macOS native model inference
Google AI Edge	✅	✅	✅	✅	Apache-2.0	LiteRT + Gemma / Gemini Nano on-device
Ollama (mobile proxy)	⚠️ via API	⚠️ via API	✅	❌ (server-side)	MIT	Run Ollama server, hit from mobile
Qualcomm AI Hub	❌	✅	✅	✅ (Snapdragon NPU)	SDK	Snapdragon-optimized model deployment

Data in all Compare tables is as of 2026-05-20. Send PRs with sources when figures change.

🗺️ Scenario Guide — What Should I Use For…

50+ curated scenarios matching your goal to the right tool or stack. Updated weekly.

🏗️ Building: Coding Agents

I want to build a coding agent for my startup (lowest cost, high quality) → Claude Code (CLI) + E2B sandbox + Langfuse observability. SWE-bench 80.9%. ~$200/mo at moderate usage.

I want an enterprise coding agent with security controls

GitHub Copilot Enterprise — Deep GitHub integration, IP indemnity, SSO/SAML, SOC 2. → best if already on GitHub Enterprise
Cursor Business — Privacy mode (code is not retained or used for training), admin dashboard. → best for teams needing IDE-first UX
Devin 3.0 (Cognition) — Fully autonomous PR-to-merge with re-planning. → best for hands-off long-horizon tasks

I want an open-source self-hosted coding agent (no vendor lock-in)

OpenHands (All-Hands-AI) — MIT, competitive SWE-bench, BYO model. → best if you need full control
Cline (VS Code ext) — BYO key, large community, free. → best for VS Code users
Aider — Git-aware CLI refactoring, excellent polyglot support. → best for terminal-based git workflows

I want a browser automation / web scraping agent

Browser Use — 92K stars, vision + DOM, MIT. → best for general web automation
Stagehand (Browserbase) — Typed act/extract/observe API, structured output. → best for reliability-critical scraping
Skyvern — Vision-first, handles dynamic pages without CSS selectors. → best for changing / heavily JS-rendered sites

I want a document processing / PDF analysis agent → LlamaIndex (document pipeline) + Gemini 2.5 Pro (1M context) or Claude Opus 4.7 (200K, best reasoning) + Unstructured.io for ingestion. For local: Ollama + Qwen3.6-27B.

I want a customer service / support agent

Dify — No-code LLM workflow builder, self-hostable, RAG built-in. → best for non-technical teams
LangGraph + Zendesk MCP — Stateful workflows, ticket resolution loop. → best for engineering-led teams
Salesforce Agentforce — CRM-native, works within existing Salesforce data. → best for Salesforce-first orgs

I want a research / deep-research agent → Perplexity Deep Research (managed) or OpenHands (formerly OpenDevin) + Tavily Search + Claude Opus 4.7. For local: Khoj (self-hosted). Expect multi-minute runs and $1–5 per deep report at cloud rates.

I want a data analysis / BI agent

Julius AI / Code Interpreter (ChatGPT) — Managed, no setup. → best for analysts without eng support
LangChain + Pandas Agent + Langfuse — Fully custom, code-gen for queries. → best for eng teams with custom data
Metabase AI / Tableau Pulse — Embedded BI copilot. → best inside existing BI stack

I want a computer use / desktop automation agent

Claude Desktop Intelligence (Anthropic) — Screen-aware, controls any GUI app. → best all-around for macOS/Linux
UFO (Microsoft, open-source) — Windows-native, Win32 + UI Automation APIs. → best for Windows automation
Screenpipe — Continuous screen + audio recording + local LLM inference. → best for local privacy-first

I want a voice / conversational agent

Gemini Live API — Real-time voice, <300ms latency, Google cloud. → best for Google ecosystem
OpenAI Realtime API (GPT-4o Audio) — Native voice with tool calling. → best for OpenAI ecosystem
LiveKit + Whisper + ElevenLabs v3 — Self-hostable voice pipeline. → best for custom, brand-specific voice

I want a multi-agent orchestration system

LangGraph — Stateful graph workflows, best Python production option. → best for complex state machines
OpenAI Agents SDK (successor to the experimental Swarm) — Lightweight handoffs, OpenAI-native. → best for simple OpenAI agent networks
Google ADK — Hierarchical agent coordination, Gemini-native. → best for Google/Vertex stack
Mastra (TypeScript) — Type-safe workflows, TS-first teams. → best for TypeScript stacks

I want a personal AI assistant (self-hosted)

OpenClaw — Multi-channel (Telegram/Discord/WhatsApp), memory, cron, MCP support, full local LLM option. → best all-in-one self-hosted
Khoj — Search + research + calendar, open-source, self-host. → best for knowledge workers
Jan.ai / LM Studio — GUI-first local model runners. → best for non-technical local LLM

I want a personal AI assistant (managed / easy setup)

Claude.ai (Pro) — Projects, memory, MCP tools, best reasoning. → best for power users
Perplexity Pro — Search-first, cites sources. → best for research-heavy use
ChatGPT Plus — Code interpreter, image gen, broad tools. → best for general-purpose

I want to build a RAG application → LlamaIndex (orchestration) + Qdrant (vector DB) + Cohere embed-v4 (embeddings) + BGE reranker (reranking). Managed alternative: Ragie or Cognee. Production telemetry: Langfuse.

I want a financial analysis agent → LangChain + yfinance / Alpha Vantage MCP + Claude Sonnet 4.6 (Excel/table reasoning) + Langfuse. Avoid: don’t use hallucination-prone models for numbers — always validate with structured output + code execution.
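The "structured output + code execution" advice can be as simple as making the model return numbers in a fixed JSON shape and recomputing anything derivable. A minimal sketch, assuming a hypothetical schema with line items and a claimed total; `Decimal` avoids float rounding surprises in money arithmetic.

```python
import json
from decimal import Decimal


def validate_totals(llm_json: str, tolerance: Decimal = Decimal("0.01")) -> list:
    """Recompute a model-reported total from its own line items.

    Expects a hypothetical schema:
        {"items": [{"name": str, "amount": "12.34"}, ...], "total": "56.78"}
    Returns a list of problems; an empty list means the numbers agree.
    """
    try:
        data = json.loads(llm_json)
        amounts = [Decimal(str(item["amount"])) for item in data["items"]]
        claimed = Decimal(str(data["total"]))
    except (ValueError, KeyError, TypeError, ArithmeticError) as exc:
        # Malformed JSON, missing keys, or non-numeric strings all land here.
        return [f"unparseable output: {exc!r}"]
    computed = sum(amounts, Decimal("0"))
    if abs(computed - claimed) > tolerance:
        return [f"total mismatch: model said {claimed}, items sum to {computed}"]
    return []


reply = ('{"items": [{"name": "rent", "amount": "1200.00"}, '
         '{"name": "fees", "amount": "35.50"}], "total": "1253.50"}')
print(validate_totals(reply))
```

Any non-empty result should block the answer or send it back to the model, rather than reaching the user.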

I want a legal document agent → Claude Opus 4.7 (200K context, best contract analysis) + LlamaIndex (ingestion) + pgvector (self-hosted vector). Important: always have a human-in-the-loop for final legal decisions.

I want an education / tutoring agent

Khanmigo (Khan Academy) — Purpose-built for K–12, COPPA-compliant. → best for K–12 safe deployment
Custom with GPT-4o + LangGraph state machine + spaced repetition logic. → best for higher education or corporate training
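For the spaced-repetition logic, the classic SuperMemo SM-2 rule is a common starting point. A minimal sketch (Card and sm2_review are illustrative names, not taken from any tutoring product above); in a custom tutor this update would run after each graded answer.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Card:
    repetitions: int = 0    # consecutive successful reviews
    interval_days: int = 0  # days until the next review
    easiness: float = 2.5   # SM-2 "E-Factor", never below 1.3


def sm2_review(card: Card, quality: int) -> Card:
    """Return the card's new schedule after a review graded 0 (blackout) to 5 (perfect)."""
    if not 0 <= quality <= 5:
        raise ValueError("quality must be between 0 and 5")
    if quality >= 3:  # recalled: grow the interval
        if card.repetitions == 0:
            interval = 1
        elif card.repetitions == 1:
            interval = 6
        else:
            interval = math.ceil(card.interval_days * card.easiness)  # round fractions up
        repetitions = card.repetitions + 1
    else:  # forgotten: restart the sequence
        repetitions, interval = 0, 1
    easiness = card.easiness + (0.1 - (5 - quality) * (0.08 + (5 - quality) * 0.02))
    return Card(repetitions, interval, max(1.3, easiness))


card = Card()
for grade in (5, 5, 5):
    card = sm2_review(card, grade)
    print(card)
```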

I want a creative writing assistant → Claude Opus 4.7 (best prose quality) or Gemini 2.5 Pro (long-form, 1M context) + Notion / Obsidian MCP for knowledge base. For structured fiction: Sudowrite (managed).

I want an IoT / physical AI agent → ROS 2 (robot OS) + NVIDIA Isaac GR00T (humanoid foundation model) + Genesis Sim (simulation). For home automation: Home Assistant + custom LLM backend.

I want a game playing / simulation agent → PettingZoo (multi-agent RL env) + Gymnasium + GPT-4o Vision for game-state parsing. For LLM-in-the-loop games: Concordia (Google DeepMind).

I want a security scanning / vulnerability agent → Semgrep (static analysis) + Claude Sonnet 4.6 (explain + triage findings) + mcp-scan (MCP server audit). See also: Agent Security table.

I want a healthcare AI tool (non-clinical / administrative) → Claude Opus 4.7 + RAG on medical knowledge base + strict output validation. Always: disclose AI, human oversight for any clinical decision, check HIPAA/GDPR compliance. Never automate clinical diagnosis.

I want a code review / PR security agent → CodeRabbit (managed, instant PR reviews) or Claude Code in CI + Semgrep + custom rules. For enterprise: Copilot Code Review (GitHub).

I want a social media / content creation agent → n8n (workflow automation) + Claude Sonnet 4.6 (drafting) + gpt-image-2 (images) + Buffer/Later MCP (scheduling). Self-hosted option: all via n8n + Ollama.

I want a translation / localization agent → DeepL API (best quality for EU languages) or Claude Sonnet 4.6 (nuanced context-aware) + Weblate (open-source TMS). For Chinese: Qwen3 235B + human review loop.

🧠 Model Selection

I need the smartest model for complex multi-step reasoning

Claude Opus 4.7 (/think xhigh) — Best in class for math, logic, long-horizon reasoning. $5/$25 per M tokens.
Gemini 2.5 Pro — 1M context, strong multimodal, competitive pricing. $1.25/$10 per M tokens.
GPT-4o — Broadly capable, strong tool-use ecosystem. $2.50/$10 per M tokens.

I need the fastest + cheapest model for simple, high-volume tasks

Gemini 2.5 Flash-Lite — $0.10/$0.40 per M tokens, 1M context.
DeepSeek V3.2 — $0.14/$0.28 per M tokens, surprisingly strong quality.
Claude Haiku 4.5 — $1/$5 per M tokens, Anthropic ecosystem integration.
GPT-4o-mini — $0.15/$0.60 per M tokens, broad OpenAI tooling.

I need the best Chinese language support

Qwen3 235B A22B (Alibaba) — Strongest Chinese benchmark, MoE architecture, $0.29/$1.15 per M. → cloud API
Kimi K2.6 (Moonshot) — 262K context, great Chinese instruction-following. → both API + local
DeepSeek V3.2 — Open weights, excellent Chinese coding. → self-host or API
GLM-5.1 (Zhipu AI) — Strong long-context, Chinese-first. → API or local

I need the best local/offline model with ~16GB VRAM

Qwen3.6-27B Q4_K_M — ~17GB VRAM (slightly over a 16GB card: offload a few layers to CPU or drop to a smaller quant), ~23 tok/s, excellent coding + Chinese. Best overall 16GB pick.
Gemma 4 27B (Google) — Strong reasoning, multilingual, Apache-2.0.
Phi-4 14B (Microsoft) — ~9GB VRAM (Q4), punches above its weight on coding.
Mistral Small 4 24B — Fast, multilingual, well-rounded.
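The VRAM figures above follow from simple arithmetic: weight memory ≈ parameters × bits per weight ÷ 8, plus room for the KV cache and runtime buffers. A rough estimator, where ~4.85 bits per weight for Q4_K_M and a flat 1 GB overhead are assumptions; the KV cache grows with context length, so long prompts need more. For MoE models, pass the total parameter count, not the active one.

```python
def estimate_vram_gb(params_b: float, bits_per_weight: float,
                     overhead_gb: float = 1.0) -> float:
    """Rough VRAM needed to hold all weights plus a flat allowance.

    params_b: total parameters in billions (for MoE, all experts).
    bits_per_weight: average bits per weight of the quant (assumed ~4.85 for Q4_K_M).
    overhead_gb: hypothetical allowance for KV cache, activations and buffers.
    """
    weights_gb = params_b * bits_per_weight / 8  # billions of params x bits / 8 = GB
    return weights_gb + overhead_gb


for name, params_b in [("Qwen3.6-27B", 27), ("Phi-4 14B", 14)]:
    need = estimate_vram_gb(params_b, 4.85)
    verdict = "fits" if need <= 16 else "exceeds"
    print(f"{name}: ~{need:.1f} GB at Q4_K_M ({verdict} a 16 GB card)")
```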

I need the best local/offline model with 40GB+ VRAM

Llama 3.3 70B Q4_K_M — ~42GB VRAM, strong English + coding, Meta Llama 3.3 Community License.
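DeepSeek V3-671B Q2 — MoE with ~37B active parameters per token, but all 671B weights must still be stored; even at Q2 that is far more than 40GB, so plan for a multi-GPU setup (2×A100 or larger) and/or system-RAM offload.
Qwen3 235B A22B Q2 — MoE flagship (22B active); 40-48GB VRAM at Q2 assumes part of the expert weights are offloaded to system RAM, since the full 235B weights exceed that. Best local quality.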

I need the best coding capability → Claude Sonnet 4.6 (SWE-bench 80.9% via Claude Code) for agentic coding. GPT-4o for code generation + explanation. DeepSeek V3.2 for open-weight coding. For IDE use: Cursor (Claude backend) or Cline.

I need multimodal understanding (vision + text)

Gemini 2.5 Pro — Native vision, PDF, audio, video understanding. 1M context.
GPT-4o — Mature vision API, strong diagram/chart understanding.
Claude Opus 4.7 — Best for complex document image reasoning.
Qwen3-VL 235B-A22B — Best open-weight multimodal, self-hostable.

I need very long context (200K+ tokens)

Gemini 2.5 Pro — 1M context window, best for entire codebase or book analysis.
Gemini 2.5 Flash — 1M context, cheaper option.
Kimi K2.6 — 262K context, strong Chinese.
Claude Opus 4.7 — 200K context, best quality within that window.

I need real-time voice / audio model

Gemini Live API — <300ms latency, native Google cloud.
OpenAI Realtime API — GPT-4o Audio, native function calling during voice.
ElevenLabs v3 — Best TTS quality, 32+ languages.
Voxtral (Mistral) — Open-weight audio model, transcription + understanding.

I need the best image generation

gpt-image-2 (OpenAI) — Best instruction-following, 2K/4K, $0.04–0.17/image.
Flux 2 Pro (Black Forest Labs) — Photorealistic, fast, API available.
Midjourney V8 — Best artistic quality, no API (web only).
Stable Diffusion 3.5 — Open weights, local deployment, Stability AI Community License.

I need the best video generation

Veo 3.1 (Google) — High fidelity, physics-aware, best quality 2026.
Kling VIDEO 3.0 (Kuaishou) — Leading post-Sora, strong cinematic style.
Runway Gen-4 — Precise motion control, professional use.
Seedance 2.0 (ByteDance) — Fast, cost-effective, strong for social media.

I need an open-weight model (MIT or Apache license)

Llama 3.3 70B (Meta, Llama 3.3 Community License, not MIT/Apache) — Best English open-weight.
Qwen3 235B A22B (Alibaba, Apache-2.0) — Best Chinese + coding open-weight.
Mistral Small 4 (Mistral AI, Apache-2.0) — Fast, multilingual.
DeepSeek V3.2 (MIT) — Best open-weight coding.
Gemma 4 27B (Google, Apache-2.0) — Strong multilingual reasoning.

🏗️ Infrastructure

I want to run everything locally (privacy-first, zero cloud) → Ollama (model runner) + Open WebUI (UI) + Qdrant (local vector DB) + Qwen3.6-27B (16GB VRAM) or Llama 3.3 70B (40GB+). Full stack: OpenClaw (local mode) or AnythingLLM. No data leaves your machine.

I want to minimize API costs (budget <$50/month) → Use DeepSeek V3.2 ($0.14/$0.28) or Gemini 2.5 Flash ($0.30/$2.50) for high-volume. Reserve Claude Sonnet 4.6 for complex tasks only. Use Anthropic Batch API (50% off) for non-real-time work. Cache aggressively (prompt caching saves ~70% on repeated context).
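To sanity-check a budget before committing, multiply tokens by per-million prices and apply discounts. A minimal sketch: the 90% cache-read saving is an assumption (providers price cache reads and writes differently, and this ignores cache-write surcharges), discounts are assumed to stack multiplicatively, and the DeepSeek V3.2 prices are the figures quoted above.

```python
def monthly_cost_usd(requests: int, in_tokens: int, out_tokens: int,
                     in_price: float, out_price: float,
                     cached_fraction: float = 0.0,
                     cache_read_saving: float = 0.9,
                     batch_discount: float = 0.0) -> float:
    """Estimate monthly spend; prices are USD per 1M tokens, token counts are per request.

    cached_fraction: share of input tokens served from the prompt cache.
    cache_read_saving: fraction saved on cached input tokens (assumption).
    batch_discount: e.g. 0.5 for a 50%-off batch API.
    """
    fresh_in = in_tokens * (1 - cached_fraction)
    cached_in = in_tokens * cached_fraction * (1 - cache_read_saving)
    per_request = ((fresh_in + cached_in) * in_price + out_tokens * out_price) / 1_000_000
    return requests * per_request * (1 - batch_discount)


# 100K requests/month, 2K input + 500 output tokens each, at $0.14/$0.28 per 1M.
base = monthly_cost_usd(100_000, 2_000, 500, 0.14, 0.28)
batched = monthly_cost_usd(100_000, 2_000, 500, 0.14, 0.28, batch_discount=0.5)
cached = monthly_cost_usd(100_000, 2_000, 500, 0.14, 0.28, cached_fraction=0.5)
print(f"base ${base:.2f}, batch ${batched:.2f}, 50% cached ${cached:.2f}")
```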

I want to scale to enterprise (millions of requests/month) → Google Vertex AI (managed Gemini, auto-scale, SLAs) or Azure OpenAI (GPT-4o, compliance, dedicated capacity). Add Langfuse for observability. For routing: Portkey or LiteLLM as unified gateway.

I want to deploy in an air-gapped / regulated environment → Ollama (local inference) + Qwen3 235B A22B / Llama 3.3 70B (open weights) + Qdrant (local vector DB). For enterprise needs: IBM watsonx (on-prem) or Azure Government (FedRAMP). Compliance certifications matter more than model quality here.

I want to build for edge / mobile deployment → Core ML (Apple, iOS/macOS) + Phi-4 14B or Gemma 4 2B (quantized). For Android: MediaPipe + Gemma 4. For cross-platform: llama.cpp + GGUF models. Check Qualcomm AI Hub for Snapdragon-optimized models.

I need multi-cloud / want to avoid vendor lock-in → LiteLLM (unified API proxy for 100+ providers) + LangGraph (framework-agnostic) + provider-agnostic embeddings (BGE-M3 open-weight). Store all state in self-hosted Qdrant or Postgres+pgvector.
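Gateways such as LiteLLM or Portkey handle provider fallback for you, but the core idea fits in a few lines. A stdlib sketch with hypothetical provider callables (flaky_provider and stable_provider stand in for real SDK calls); it does not reproduce either gateway's API.

```python
from typing import Callable, List, Sequence, Tuple

Provider = Callable[[str], str]  # hypothetical: prompt in, completion text out


class AllProvidersFailed(RuntimeError):
    """Raised when every provider in the chain errored."""


def complete_with_fallback(prompt: str,
                           providers: Sequence[Tuple[str, Provider]]) -> Tuple[str, str]:
    """Try providers in order; return (provider_name, text) from the first success."""
    errors: List[str] = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # production code should catch provider-specific errors
            errors.append(f"{name}: {exc!r}")
    raise AllProvidersFailed("; ".join(errors))


def flaky_provider(prompt: str) -> str:
    raise TimeoutError("upstream timed out")  # simulates an outage


def stable_provider(prompt: str) -> str:
    return f"echo: {prompt}"


print(complete_with_fallback("hello", [("primary", flaky_provider), ("backup", stable_provider)]))
```

Keeping the provider list as data (rather than hard-coded calls) is what makes switching clouds a config change instead of a code change.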

I want to self-host everything (no managed services at all) → Ollama (models) + Qdrant (vectors) + Langfuse (observability, self-host Docker) + n8n (workflows) + OpenClaw (agent runtime). GPU recommendations: 2× RTX 4090 (24GB each = 48GB) for 70B models; single RTX 4090 for 27B.

📊 Evaluation & Monitoring

I want to evaluate agent output quality → DeepEval (rich metric suite: faithfulness, relevancy, hallucination) + Langfuse (traces + evals). For custom: LangSmith (tight LangChain integration). Open-source: Agenta (self-host).

I want to debug why my agent is failing → Langfuse (trace each tool call + LLM call with timing) + Arize Phoenix (root cause analysis). Enable verbose logging in your framework (LangGraph and CrewAI both support it). Use LLM-as-judge to flag low-confidence steps.
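Conceptually, tracing each tool and LLM call means wrapping it in a span that records timing and errors. Langfuse and Phoenix do this through their own SDKs and OpenTelemetry; the decorator below is only a stdlib stand-in (TRACE, traced, and lookup_weather are hypothetical) to show what a span captures.

```python
import functools
import time
from typing import Any, Callable, Dict, List

TRACE: List[Dict[str, Any]] = []  # in-memory stand-in for an observability backend


def traced(kind: str) -> Callable:
    """Record name, kind, duration and error (if any) for every call."""
    def decorator(fn: Callable) -> Callable:
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            error = None
            try:
                return fn(*args, **kwargs)
            except Exception as exc:
                error = repr(exc)
                raise  # tracing must never swallow the failure
            finally:
                TRACE.append({"name": fn.__name__, "kind": kind,
                              "ms": (time.perf_counter() - start) * 1000, "error": error})
        return wrapper
    return decorator


@traced("tool")
def lookup_weather(city: str) -> str:  # hypothetical tool
    if not city:
        raise ValueError("empty city")
    return f"sunny in {city}"


lookup_weather("Oslo")
try:
    lookup_weather("")
except ValueError:
    pass
for span in TRACE:
    print(span["name"], span["kind"], f'{span["ms"]:.2f}ms', span["error"])
```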

I want to monitor production agents in real time → Langfuse (OpenTelemetry-native, self-host) or Helicone (proxy-based logging with minimal added latency). Set up cost + latency + error-rate dashboards. Alert on error spikes via Grafana or Datadog integration.
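Before wiring up Grafana or Datadog, it helps to pin down what "error spike" means. A minimal rolling-window rule; the window size, threshold, and minimum sample count are illustrative assumptions to tune for your traffic.

```python
from collections import deque


class ErrorRateAlert:
    """Fire when the error rate over the last `window` requests exceeds `threshold`."""

    def __init__(self, window: int = 100, threshold: float = 0.05, min_samples: int = 20):
        self.results = deque(maxlen=window)  # True = success, False = error
        self.threshold = threshold
        self.min_samples = min_samples  # avoid alerting on the first few requests

    def record(self, ok: bool) -> bool:
        """Record one request outcome; return True if the alert should fire."""
        self.results.append(ok)
        if len(self.results) < self.min_samples:
            return False
        error_rate = self.results.count(False) / len(self.results)
        return error_rate > self.threshold


alert = ErrorRateAlert(window=10, threshold=0.2, min_samples=5)
for outcome in [True] * 8 + [False] * 3:
    if alert.record(outcome):
        print("ALERT: error rate above 20% over the last 10 requests")
```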

I want to A/B test different models or prompts → Braintrust (experiment tracking, online/offline eval) or LangSmith (prompt playground + evals). For open-source: Agenta + Langfuse experiments feature.
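Whichever tool runs the experiment, check that a difference in pass rates is larger than noise. A minimal two-proportion z-test (normal approximation, so it assumes reasonably large samples); the 420/1000 vs 470/1000 counts are made-up example numbers.

```python
import math
from typing import Tuple


def two_proportion_z_test(success_a: int, n_a: int,
                          success_b: int, n_b: int) -> Tuple[float, float]:
    """Return (z, two-sided p-value) for H0: variants A and B share one success rate."""
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:  # both variants all-pass or all-fail: no evidence of a difference
        return 0.0, 1.0
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided, equals 2 * (1 - Phi(|z|))
    return z, p_value


z, p = two_proportion_z_test(420, 1000, 470, 1000)
print(f"z = {z:.2f}, p = {p:.3f}")
```

Decide the sample size and significance level before looking at results; peeking and stopping early inflates false positives.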

I want to benchmark models on my specific tasks → LMSYS Chatbot Arena custom eval + Evals by OpenAI (open framework) + DeepEval (custom metric). Run your own eval harness: prepare 50–200 golden examples, measure precision/recall on your actual task.
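A golden-set harness can start very small. The sketch below scores a binary agent decision (here a hypothetical "should this ticket be escalated?" check) against labeled examples; keyword_agent is a placeholder for your real agent call.

```python
from typing import Callable, Iterable, Tuple


def precision_recall(golden: Iterable[Tuple[str, bool]],
                     predict: Callable[[str], bool]) -> Tuple[float, float]:
    """Compare a binary agent decision with golden labels (True = positive class)."""
    tp = fp = fn = 0
    for example, expected in golden:
        predicted = predict(example)
        if predicted and expected:
            tp += 1
        elif predicted:
            fp += 1  # flagged, but should not have been
        elif expected:
            fn += 1  # missed a true case
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


GOLDEN = [
    ("I want a refund now", True),
    ("Where is my refund status page?", False),
    ("Cancel my order, this is unacceptable", True),
    ("Thanks, all good", False),
]


def keyword_agent(ticket: str) -> bool:
    """Placeholder for a real agent call."""
    return "refund" in ticket.lower()


print(precision_recall(GOLDEN, keyword_agent))
```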

I want to evaluate MCP server security → mcp-scan (Invariant Labs) — Detects prompt injection, tool poisoning, shadow tools in MCP servers. Run before production deployment. See also: Agent Security.

🌍 Ecosystem Choices

I want to build within the OpenAI ecosystem → OpenAI Agents SDK + GPT-4o + E2B sandbox + LangSmith eval. Benefits: widest third-party tooling, most community examples. Cost: premium pricing.

I want to build within the Anthropic Claude ecosystem → Claude Code (agentic IDE/CLI) + Claude Sonnet 4.6 / Opus 4.7 + MCP protocol (Claude Desktop) + Langfuse (observability). Benefits: best coding quality, MCP is Claude-native. Cost: ~mid-tier.

I want to build within the Google Gemini ecosystem → Google ADK (Agent Development Kit) + Gemini 2.5 Pro/Flash + Vertex AI (deployment) + Vertex AI Eval + AlloyDB / BigQuery (data). Benefits: 1M context, multimodal, cheap Flash tier. Cost: scales well.

I want to build for the Chinese market (domestic cloud / regulation) → Qwen3 235B (Alibaba Cloud DashScope) + Baidu ERNIE 5 or Kimi K2.6 (Moonshot) as fallback + Alibaba Cloud PAI (deployment). All data stays within mainland China; you still need your own ICP filing for any public-facing service.

I want a TypeScript-first stack → Mastra (TS workflows, MCP, A2A, Elastic-2.0) + Vercel AI SDK (streaming, RSC-friendly) + Qdrant JS client + Langfuse JS SDK. Alternative: LangChain.js + LangGraph.js.

I want an open-source-only stack (zero proprietary) → Ollama + DeepSeek V3.2 (MIT, model; Llama 3.3 70B is open-weight but ships under Meta's custom community license) + LangGraph (MIT, framework) + Qdrant (Apache-2.0, vector DB) + Langfuse (MIT, observability) + E2B (Apache-2.0, sandbox). Fully self-hosted, no vendor dependencies.

📋 Stack Recipes — Curated Tool Combinations

8 battle-tested multi-tool setups for common use cases. Copy and adapt.

#	Recipe Name	Stack	Best For
1	Lean Coding Agent	Claude Code + E2B + Langfuse	Solo dev / startup, best quality per dollar
2	Open-Source SWE Agent	OpenHands + Ollama + Qwen3.6-27B + Qdrant	Full local, privacy-first coding
3	Enterprise RAG	LlamaIndex + Qdrant + Cohere embed-v4 + Langfuse + Claude Sonnet 4.6	Production Q&A on internal docs
4	Voice Assistant Pipeline	LiveKit + Whisper (STT) + Claude Sonnet 4.6 + ElevenLabs v3 (TTS)	Custom branded voice AI
5	Browser Automation	Browser Use + Stagehand + Claude Sonnet 4.6 + Langfuse	Reliable web scraping + form filling
6	Local-Only Privacy Stack	Ollama + Qwen3.6-27B + Open WebUI + Qdrant + n8n	Zero cloud, air-gapped use
7	TypeScript Agent	Mastra + Vercel AI SDK + Gemini 2.5 Flash + Qdrant + Langfuse	TS-first production SaaS
8	Chinese Market Stack	Qwen3 235B API + RAGFlow + Milvus + Langfuse	Domestic China deployment, ICP-compliant

⚠️ Anti-Picks — What NOT to Use For…

Avoid common mistakes. These recommendations are based on observed production failures in 2026.

❌ Don’t Use	❌ For This	✅ Use Instead	Why
LangChain v0.x	New production agents	LangGraph	LangChain chains are deprecated; LangGraph has proper state management
AutoGPT (legacy)	Production workloads	OpenHands or LangGraph	AutoGPT’s 2023 architecture has poor reliability at scale
GPT-3.5-Turbo	Complex reasoning	Gemini 2.5 Flash or Claude Haiku 4.5	GPT-3.5 deprecated, same cost range as modern models
Pinecone Starter	Self-hosted / cost-sensitive	Qdrant or pgvector	Pinecone Starter tier removed 2025; OSS alternatives are cheaper
LLM for real-time stock trading	Financial execution	Deterministic rule engine	LLMs hallucinate numbers; catastrophic for live trading
ChatGPT Plus	Production API workflows	OpenAI API direct	ChatGPT Plus is consumer; no SLA, no rate control, no observability
Hugging Face Inference API (free)	Production load	Modal or self-hosted Ollama	Free tier has extreme rate limits, cold starts >30s
Autonomous agents without human-in-loop	Medical / legal decisions	Any model + mandatory human review step	No current model is reliable enough for high-stakes autonomous decisions
PDF viewer MCP for sensitive docs	Compliance environments	Local LlamaIndex + on-prem Qdrant	Sending sensitive PDFs to cloud MCP servers violates data residency rules
CrewAI for single-agent tasks	Simple one-shot tasks	Direct API call	CrewAI’s multi-agent overhead adds latency and cost when only one agent is needed
Midjourney	Programmatic / API image gen	gpt-image-2 or Flux 2 Pro API	Midjourney has no public API; requires Discord bot workaround
GPT-4o Vision for OCR	High-accuracy document OCR	Tesseract 5 + Azure Document Intelligence	LLM OCR has ~2-5% error rate; dedicated OCR is 10x cheaper and more accurate
Sora	Any video generation (2026)	Kling VIDEO 3.0 or Veo 3.1	Sora discontinued by OpenAI, April 2026
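Vector DB without reranking	High-precision RAG	Vector DB + BGE reranker or Cohere Rerank	Raw vector search often ranks relevant chunks too low; reranking a wider candidate pool usually lifts top-k quality (often-quoted ~70% → ~90%+ figures vary by corpus)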
Gemini 2.5 Flash-Lite	Complex legal/medical reasoning	Claude Opus 4.7 or Gemini 2.5 Pro	Flash-Lite optimized for speed, not accuracy on high-stakes tasks
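The reranking row describes a two-stage pattern: a cheap first stage fetches a wide candidate pool, then a slower, more accurate scorer reorders it. A stdlib sketch in which word_overlap and phrase_bonus are toy stand-ins for vector similarity and a cross-encoder such as a BGE reranker; the reranker can only reorder what stage one returned, so recall is capped by n_candidates.

```python
from typing import Callable, List, Sequence

Scorer = Callable[[str, str], float]  # (query, doc) -> relevance score


def retrieve_then_rerank(query: str, docs: Sequence[str], first_stage: Scorer,
                         reranker: Scorer, n_candidates: int = 50, k: int = 5) -> List[str]:
    """Cheap scorer picks n_candidates docs; costly scorer orders them and keeps k."""
    candidates = sorted(docs, key=lambda d: first_stage(query, d), reverse=True)[:n_candidates]
    return sorted(candidates, key=lambda d: reranker(query, d), reverse=True)[:k]


def word_overlap(query: str, doc: str) -> float:
    """Toy stand-in for vector similarity: count of shared words."""
    return len(set(query.lower().split()) & set(doc.lower().split()))


def phrase_bonus(query: str, doc: str) -> float:
    """Toy stand-in for a cross-encoder: rewards the exact query phrase."""
    return word_overlap(query, doc) + (10 if query.lower() in doc.lower() else 0)


DOCS = ["reset password via email", "password reset link expired",
        "how to reset password", "billing email change"]
print(retrieve_then_rerank("reset password", DOCS, word_overlap, phrase_bonus,
                           n_candidates=3, k=2))
```

Ranking by the first stage alone would put "password reset link expired" second; the reranker promotes the document that actually matches the query phrase.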

🌟 Notable Agent Projects of 2026

Standout projects and developments that shaped the AI agent landscape in 2026.

Model Context Protocol (MCP) - Became the universal standard for agent-tool interoperability. Donated to Linux Foundation.
A2A Protocol - 🆕 Google's Agent-to-Agent protocol enabled cross-framework agent collaboration with 150+ partners.
Claude Code - Anthropic's agentic coding tool became the go-to terminal-based coding agent with 80.9% SWE-bench.
Kiro - 🆕 AWS launched an autonomous coding agent capable of managing 10 simultaneous development tasks.
Devin 3.0 - 🆕 Evolved to include dynamic re-planning, self-healing code, and legacy codebase migration.
Microsoft Agent Framework - 🆕 AutoGen + Semantic Kernel merged into unified enterprise agent platform.
OpenAI Codex CLI - OpenAI entered the agentic coding space with an open-source terminal agent.
Browser Use - Breakthrough in making AI agents interact with the web naturally.
Claude Computer Use - 🆕 Desktop Intelligence let Claude control any software by seeing the screen.
Manus AI - 🆕 General-purpose autonomous agent that can handle research, coding, and complex workflows.
OpenHands - Open-source AI software engineering platform gained massive adoption.
Dify - Low-code LLM agent platform reached mainstream adoption.
Cline - VS Code autonomous coding agent with rapid community growth.
Mem0 - Memory layer for AI became essential component of agent architectures.
Sora Discontinuation - 🆕 OpenAI shut down Sora (April 2026), signaling strategic pivot to enterprise AI and reasoning.
Kling VIDEO 3.0 - 🆕 Kuaishou's video generation became the leading AI video platform post-Sora.
Cohere + Aleph Alpha Merger - 🆕 April 24, 2026. Canadian AI firm Cohere merged with Germany's Aleph Alpha at ~$20B valuation. $600M Series E from Schwarz Group. Creates transatlantic "sovereign AI" powerhouse with dual HQ in Toronto and Germany.
ScienceOne 100 / 磐石100 - 🆕 April 28-29, 2026. Chinese Academy of Sciences launches specialized scientific AI system. 2,000+ research tools, 50+ CAS institutes. Flagship-level scientific reasoning and agent capabilities.
Google Invests $40B in Anthropic - 🆕 April 2026. $10B initial + up to $30B contingent on performance milestones. Includes 5GW compute capacity over 5 years. Largest AI partnership investment to date.
OpenAI Deployment Company (DeployCo) - 🆕 May 11, 2026. OpenAI spins out a $4B+ enterprise-deployment services unit (TPG / Bain Capital / Brookfield / Advent / Goldman Sachs / SoftBank + Bain & Company / Capgemini / McKinsey) and absorbs the Tomoro consulting acquisition. Signals the AI vendor race shifting toward services + Forward Deployed Engineers.
Anthropic ↔ SpaceX Colossus 1 - 🆕 May 6, 2026. Anthropic takes all available capacity on the 300+ MW / 220K-GPU Colossus 1 Memphis cluster. SpaceX repositions itself as an AI infrastructure provider after its xAI acquisition; Anthropic doubles Claude Code rate limits for paid plans.
DeepSeek $4B state-backed round - 🆕 May 16, 2026. China's National AI Industry Investment Fund + Big Fund III + Tencent close in on a ~$4B first external round for DeepSeek at a ~$50B valuation — first known LLM investment from Big Fund III, signalling Beijing's bet on efficient open-weight frontier models and domestic silicon.
Pope Leo XIV → Vatican AI Commission - 🆕 May 16, 2026. Pope Leo XIV publishes the rescriptum establishing an inter-dicasterial Vatican commission on artificial intelligence (Dicastery for Integral Human Development coordinating, with Doctrine of the Faith, Culture & Education, Communication, Pontifical Academies for Life / Sciences / Social Sciences). One-year renewable mandate. First AI-focused encyclical expected to follow.
Google I/O 2026 — Gemini 3.5 + Omni + Spark + AI Ultra - 🆕 May 19, 2026. Google's biggest agent-and-AGI keynote of the year: Gemini 3.5 Flash GA (default model), Gemini Omni world-model family, Gemini Spark 24/7 personal agent with ~30+ MCP-based tool integrations, and a new Google AI Ultra $100/mo tier. Pichai confirms Google now processes 3.2 quadrillion tokens / month.
Alibaba Cloud Summit Hangzhou — Qwen 3.7-Max + Zhenwu M890 - 🆕 May 20, 2026. Alibaba unveils Qwen 3.7-Max (agentic-coding flagship for long-horizon missions), the T-Head Zhenwu M890 AI accelerator, and a full-stack AI infrastructure upgrade — China's most aggressive bid yet to position itself as the country's "AI factory."
OpenAI Guaranteed Capacity (Compute Annual Pass) - 🆕 May 19, 2026. Long-term enterprise compute reservations (1/2/3-year terms) sold as a structured product — OpenAI's structural answer to Anthropic's Priority Tier and the wider supply crunch for frontier-model inference.
Google Antigravity 2.0 + Microsoft RAMPART + xAI Grok Build - 🆕 May 14–22, 2026. Three structural agent-stack shifts in eight days: Google's standalone multi-agent desktop + SDK at I/O 2026, Microsoft open-sourcing agentic-AI safety testing (RAMPART + Clarity), and xAI entering the CLI-agent race with Grok Build on grok-code-fast-1. Google, Microsoft, and xAI all shipped agent platforms within the same window.

📅 2026 AI Timeline

Key milestones and events in the AI landscape of 2026.

Date	Event	Category
Jan 2026	AMD Ryzen AI 400 Series unveiled at CES — mainstream AI PCs with 60 TOPS NPU	Hardware
Feb 2026	Claude Opus 4.6 released — agent team capabilities	Models
Feb 2026	Claude Sonnet 4.6 released — 1M token context, agentic search	Models
Feb 2026	Gemini 3.1 Pro released	Models
Feb 2026	Qwen3.5 Series launched — native multimodal, agentic coding	Models
Feb 2026	Qwen3-Coder-Next released — 80B MoE coding agent model	Models
Feb 2026	Cursor updated with 8 parallel agents	Tools
Feb 2026	GitHub Copilot expanded agent mode and model access	Tools
Mar 2026	Gemini 3.1 Flash Lite released to developers	Models
Mar 2026	Mistral Forge launched — custom LLM training platform	Platforms
Mar 2026	Microsoft Agent Framework (AutoGen + Semantic Kernel) targets GA	Frameworks
Mar 2026	DeepSeek announces new model trained on latest Nvidia chips	Models
Mar 2026	MCP 2026 roadmap published — focus on production scaling and governance	Protocols
Mar 2026	Sora shutdown announced (app closes April 26)	Events
Apr 2, 2026	Qwen3.6-Plus proprietary flagship launched by Alibaba	Models
Apr 3, 2026	Microsoft AI Agent Governance Toolkit released (open-source)	Tools
Apr 6, 2026	Microsoft Agent Framework officially announced (AutoGen + Semantic Kernel unified)	Frameworks
Apr 7, 2026	GLM-5.1 open-sourced by Zhipu AI — 744B MoE, trained on Huawei Ascend	Models
Apr 8-9, 2026	Meta Muse Spark released — first model from Meta Superintelligence Labs	Models
Apr 2026	Claude Mythos Preview — gated cybersecurity research model (BenchLM 99, SWE-bench 93.9%)	Models
Apr 2026	Sora app officially shuts down	Events
Apr 14, 2026	Gemini Robotics ER-1.6 released, an upgraded robotics model with enhanced spatial reasoning	Robotics
Apr 15, 2026	Qwen3.6-35B-A3B open-sourced (Apache 2.0) by Alibaba	Models
Apr 16, 2026	Claude Opus 4.7 released — SWE-bench Verified 87.6%, `/think xhigh` reasoning	Models
Apr 18, 2026	Qwen3.6-Max-Preview launched — top Chinese model on coding benchmarks	Models
Apr 20-21, 2026	Kimi K2.6 released by Moonshot AI — 1T MoE, 1,000-agent swarm	Models
Apr 22, 2026	Qwen3.6-27B open-sourced by Alibaba — dense 27B multimodal	Models
Apr 23, 2026	Tencent open-sources Hunyuan Hy3 Preview — 295B/21B MoE, 256K context	Models
Apr 23, 2026	Claude Managed Agents Memory public beta — persistent cross-session agent memory	Tools
Apr 23, 2026	GPT-5.5 released by OpenAI — major agentic coding and reasoning upgrade	Models
Apr 24, 2026	DeepSeek V4 Pro & Flash released — 1.6T MoE, 1M context, MIT license	Models
Apr 24, 2026	Cohere merges with Germany's Aleph Alpha at ~$20B valuation + $600M funding	Industry
Apr 27, 2026	Alibaba Tianma AI image-to-video model enters beta	Models
Apr 27, 2026	LangGraph v0.3.19 released; LangGraph Swarm prebuilt agents	Frameworks
Apr 28, 2026	NVIDIA Nemotron 3 Nano Omni released — 30B multimodal (text/image/audio/video)	Models
Apr 28-29, 2026	CAS ScienceOne 100 / 磐石100 launched — scientific AI for 50+ research institutes	Models
Apr 30, 2026	OpenAI begins rollout of GPT-5.5-Cyber via the Trusted Access for Cyber (TAC) program	Models
Apr 30, 2026	OpenAI publishes "A practical guide to building agents"	Resources
May 1, 2026	Anthropic launches Claude Security in public beta — Opus 4.7-powered codebase vulnerability scanner with auto-patches	Tools
May 2026	Macquarie Bank reports 130,000 hours saved in 7 months using Gemini Enterprise	Industry
May 2026	Google starts rolling Gemini into eligible vehicles, replacing Google Assistant (English-first, U.S. rollout)	Industry
May 4, 2026	Google retires Project Mariner; browser-agent tech folded into Gemini Agent	Tools
May 4, 2026	Anthropic + Goldman Sachs + Blackstone announce $1.5B Claude deployment JV to embed Anthropic engineers in mid-market Wall Street firms	Industry
May 5, 2026	OpenAI rolls out GPT-5.5 Instant as the new default ChatGPT model — efficiency-first upgrade, hallucination rate down ~50%	Models
May 5, 2026	Anthropic launches Claude Finance Agents — 10 specialised agents for pitchbooks, KYC, month-end close, available as Claude Cowork plugins / Claude Code skills / Managed-Agents cookbooks	Tools
May 5, 2026	OpenAI ↔ PwC partnership announced for financial-services agents (forecasting, payments)	Industry
May 7, 2026	Google preparing Agent Mode for Flow (Veo-based AI filmmaking) — automated video production pipeline	Tools
May 8, 2026	OpenAI launches GPT-Realtime-2 / Realtime-Translate / Realtime-Whisper — voice agents, live translation, real-time transcription	Models
May 9, 2026	OpenAI rolls out Workspace Agents in ChatGPT Enterprise — repeatable workflow automation across connected apps	Tools
May 11–13, 2026	Cursor 3.4 + SDK — Microsoft Teams integration, parallel-agent plan execution, multi-repo / Dockerfile dev environments, async sub-agents (`/multitask`), Vulnerability Scanner, granular model controls; Cursor SDK ships v2.5 security patch	Tools
May 12, 2026	OpenAI Daybreak — cyber-defense platform bundling GPT-5.5 + GPT-5.5-Cyber + Trusted Access for Cyber for AI-powered vuln detection / patch validation; EU preview to governments and security vendors	Tools
May 12, 2026	Gemini Intelligence revealed at Android Show: I/O Edition — proactive agentic AI across Googlebooks, Wear OS, Android Auto, Android XR; first on Samsung Galaxy + Pixel	Industry
May 12, 2026	Vapi raises $50M Series B after crossing 1B platform calls; Squads v2 + Composer + Simulations + Soniox transcriber GA	Industry
May 13, 2026	Figure 04 design finalized and component deliveries underway; Helix VLA-powered, following the home-focused F.03 build	Robotics
May 14, 2026	Claude Code v2.1.141 — `/goal` cross-turn completion conditions, agent view, plugin loading from .zip / URL, `Ctrl+R` global history search, enterprise feedback surveys	Tools
May 14, 2026	Codex on Mobile (preview) — ChatGPT iOS/Android can remote-control the macOS Codex app; OpenAI also issues TanStack supply-chain security patch	Tools
May 14, 2026	Gemini Spark pre-I/O leak — upcoming branded agent capability inside the Gemini app for autonomous multi-step processes	Tools
May 14, 2026	OpenClaw v2026.5.12 shipped — native model identity injected into system prompt, isolated Telegram polling worker, MEMORY.md auto-compaction, protected config paths for owner/exec approvals	Tools
May 11, 2026	OpenAI Deployment Company launched — $4B+ enterprise services unit with TPG / Bain Capital / Brookfield + Bain & Company / Capgemini / McKinsey; Tomoro consulting acquisition folded in	Industry
May 11-13, 2026	SAP Sapphire 2026 Orlando — SAP Business AI Platform, Joule Studio 2.0, Autonomous Suite with 50+ Joule Assistants and 200+ agents; Joule Studio 2.0 GA from June 2026	Industry
May 12, 2026	Claude for Legal — 20+ MCP connectors (iManage, NetDocuments, DocuSign, LexisNexis, Westlaw, Harvey, Everlaw, Relativity…) + 12 practice-area plugins on Claude Cowork	Tools
May 12-15, 2026	Visual Studio 2026 Insiders — Copilot Chat "Agent Mode" with guided Agent Skills authoring inside the IDE	Tools
May 13, 2026	Claude for Small Business — 15 pre-built agentic workflows + connectors for QuickBooks / PayPal / HubSpot / Canva / DocuSign / Google Workspace / Microsoft 365; 10-city US workshop tour	Tools
May 13, 2026	Cursor 3.4 cloud agent environments — multi-repo, Dockerfile-based config with build secrets, 70% faster cached layers, env version history, audit logs, scoped egress / secrets	Tools
May 13-16, 2026	Figure Helix 02 live-stream — F.03 + Helix 02 stress-test on a package-sort line: ~22K packages sorted in 8h, ~30K in 24h, and ~88K over ~72h before a mechanical failure ended the run	Robotics
May 14, 2026	Anthropic ↔ Gates Foundation $200M partnership — 4-year grants + Claude credits + Anthropic engineering on global health, life sciences, education, agriculture	Industry
May 14, 2026	Anthropic ↔ PwC alliance expansion — global Claude Code + Cowork rollout, 30,000 PwC professionals certified, joint Agentic Enterprise Center of Excellence	Industry
May 14, 2026	Genkit Middleware — Google releases composable middleware for the open-source Genkit agent framework (TS / Go / Dart)	Frameworks
May 14, 2026	Zyphra ZAYA1-8B-Diffusion-Preview — first MoE diffusion LM converted from an autoregressive LLM; first diffusion LM trained on AMD GPUs; up to 7.7× inference speedup	Models
May 16, 2026	Pope Leo XIV establishes Vatican AI Commission — inter-dicasterial body to coordinate the Church's response to AI; first AI-focused encyclical expected next	Industry
May 16, 2026	OpenAI ↔ Malta partnership — every Maltese resident 14+ gets free 1-year ChatGPT Plus after a 2-hour AI literacy course ("OpenAI for Countries")	Industry
May 16, 2026	DeepSeek nears a state-backed $4B raise at ~$50B valuation — National AI Industry Investment Fund + Big Fund III + Tencent close in on its first external round	Industry
May 2026	LangGraph v1.2 — per-node timeouts/error-recovery/graceful shutdown, `DeltaChannel` checkpoint optimisation, content-block streaming API v3	Frameworks
May 2026	Grok 4.3 GA on Microsoft Foundry + Oracle OCI Generative AI; xAI flagship for agentic workloads	Models
May 1, 2026	Microsoft Agent 365 GA — enterprise observability + governance + security for AI agents across environments; May adds SASE for agents + threat detection	Industry
May 8, 2026	Code with Claude 2026 — Anthropic introduces Add-ins, Dreaming (scheduled memory review), Outcomes (rubric-driven generation), lead+sub-agent orchestration with shared filesystem audit	Tools
May 18, 2026	OpenAI ↔ Dell Codex partnership — Codex extended to hybrid/on-prem enterprise environments via Dell Technologies; first major non-cloud Codex distribution	Industry
May 18, 2026	Alibaba Qwen 3.7-Max-Preview / Plus-Preview — highest-ranked Chinese models on LM Arena in text + vision	Models
May 18, 2026	Boston Dynamics Atlas 100-lb manipulation + Hyundai commits to 25K+ Atlas units across Hyundai/Kia plants starting 2028 (GA)	Robotics
May 18, 2026	Figure F.03 vs human 8h sort challenge — human wins narrowly 12,924 vs 12,732 packages (2.79 vs 2.83 s/item)	Robotics
May 18, 2026	Anthropic briefs FSB on Claude Mythos — first frontier-lab briefing to a G20 financial-stability regulator on offensive-cyber model capabilities	Industry
May 18, 2026	ChatGPT safety systems update — OpenAI adds cross-session risk tracking for suicide / self-harm / harm-to-others escalation cues	Industry
May 19, 2026	Google I/O 2026 — Gemini 3.5 Flash launches as the new default Gemini app + Search AI Mode model (~4× faster than peers); Gemini 3.5 Pro slated for June	Models
May 19, 2026	Google I/O 2026 — Gemini Omni / Omni Flash, Google DeepMind's new multimodal world-model line aimed at AGI (any input, any output, video first)	Models
May 19, 2026	Google I/O 2026 — Gemini Spark, a 24/7 personal AI agent integrating 30+ third-party tools via MCP, gated behind the new Google AI Ultra ($100/mo) tier	Tools
May 19, 2026	OpenAI Guaranteed Capacity / Compute Annual Pass launches — 1/2/3-year long-term compute reservations for enterprise AI products & agents	Industry
May 19, 2026	OpenAI ↔ Google SynthID + C2PA content provenance — first major frontier-lab interop on durable cross-platform AI image watermarking and a public verifier preview	Industry
May 19, 2026	Anthropic: Widening the conversation on frontier AI — framework for engaging wisdom traditions in frontier-AI safety dialogue	Industry
May 19, 2026	DeepSeek hires former Jane Street engineer to build AI harness team — DeepSeek pivoting from model R&D toward autonomous, revenue-generating agents	Industry
May 13, 2026	Runway Agent launches — conversational agent that takes a written brief and ships a multi-shot finished video end-to-end on Gen-4 / Aleph	Tools
May 20, 2026	Alibaba Cloud Summit Hangzhou — Qwen 3.7-Max GA, agentic-coding flagship for long-horizon multi-step missions; new T-Head Zhenwu M890 AI chip + full-stack AI infrastructure upgrade	Models
May 20, 2026	Bristol Myers Squibb ↔ Anthropic Claude Enterprise — 30K+ employees standardise on Claude Enterprise for drug discovery / development / delivery; first top-5 pharma full Claude deployment	Industry
May 20, 2026	LlamaIndex ↔ Google Agents API — LlamaParse / LiteParse exposed inside the new Google Agents API sandbox; Sandboxed-Lit runtime + ParseBench (first OCR benchmark for agents) ship in the same wave	Frameworks
May 20, 2026	Microsoft RAMPART + Clarity open-sourced — pytest-native white-box safety/security testing framework for agentic AI + structured design-review companion; CI/CD-friendly successor to PyRIT	Tools
May 6, 2026	AWS MCP Server GA — AWS-managed MCP endpoint exposes every AWS API with sandboxed Python and agent skills; first first-party MCP server from a hyperscaler	Protocols
May 1, 2026	Google Workspace MCP Server rolls out — Workspace-native MCP for Gmail / Drive / Calendar / Docs / Sheets with admin-scoped OAuth	Protocols
May 14, 2026	Grok Build (early beta) — xAI's agentic CLI coding agent powered by grok-code-fast-1; parallel sub-agents in isolated envs, SuperGrok Heavy gating	Tools
May 14, 2026	iManage MCP Server launched — first major legal/professional-services SaaS to ship a public MCP endpoint	Tools
May 19, 2026	Google Antigravity 2.0 at I/O 2026 — standalone desktop app for multi-agent orchestration, scheduled / async runs, dynamic sub-agents, Antigravity CLI + SDK, enterprise edition inside Gemini Enterprise Agent Platform	Tools
May 22, 2026	Kore.ai Artemis Agent Platform launched on Azure — AI-native enterprise platform with Agent Blueprint Language (ABL) for declarative multi-agent workflows	Industry
May 22, 2026	FPT Flezi Foundry™ launched — AI-augmented delivery platform with Agentic Development Lifecycle (ADLC) and Agentic Managed Services (AMS) modes under "Service-as-a-Software" governance	Industry
May 22, 2026	JetBrains Rider AI test-writing skill — surfaces .NET coverage data to Claude Code / Codex so agents focus tests on untested branches	Tools
May 28, 2026	Claude Opus 4.8 released by Anthropic — codebase-scale migrations, dynamic-workflows research preview (hundreds of parallel sub-agents), effort-control panel, 3× cheaper Fast mode; teases upcoming Mythos-class models	Models
May 28, 2026	Koog 1.0 released at KotlinConf 2026 — JetBrains' open-source Kotlin/Java AI-agent framework hits stable, Kotlin Multiplatform deployment, OpenTelemetry across targets	Frameworks
May 28, 2026	Gemini Omni Flash conversational video editing starts rolling out via Gemini app / Google Flow / YouTube Shorts — voice- and text-driven cinematic edits positioned as an alternative to traditional NLEs	Tools
May 29, 2026	MCP 2026-07 Release Candidate published — stateless protocol core, extensions framework, MCP Apps server-rendered UI, hardened OAuth/OIDC alignment; final spec target July 28, 2026	Protocols
Apr 2026	Gartner predicts 40% of enterprise apps will embed AI agents by end of 2026	Industry
Apr 2026	Google commits up to $40B investment in Anthropic (initial $10B)	Industry
2026 (ongoing)	A2A Protocol grows to 150+ partner organizations	Protocols
2026 (ongoing)	85% of developers regularly use AI coding tools	Industry
2026 (ongoing)	Enterprise agentic AI adoption accelerates — "Agents as a Service" emerges	Industry

Contributing

Contributions welcome! Please read the contributing guidelines first.

License

This list is released under the MIT License.

⭐ If you find this list useful, please give it a star! ⭐

440+ resources across 25 categories — from foundation models to agent protocols to generative AI.

Made with ❤️ by Zijian Ni

Last updated: May 30, 2026
