Add tutorial: Qwen3.6-27B with MTP (Multi-Token Prediction) on Jetson Thor / AGX Orin for local agentic coding

## Community signal

Multiple high-engagement posts this week show massive interest in **Qwen3.6-27B with MTP (Multi-Token Prediction)** for local agentic coding:
- [2.5x faster inference with Qwen 3.6 27B using MTP](https://reddit.com/r/LocalLLaMA/comments/1t57xuu/25x_faster_inference_with_qwen_36_27b_using_mtp/) — **596 upvotes, 186 comments**
- [Qwen3.6-27B MTP grafted on Unsloth UD XL: 2.5x throughput](https://reddit.com/r/LocalLLaMA/comments/1t5ageq/qwen3627b_with_mtp_grafted_on_unsloth_ud_xl_25x/) — 89 upvotes
- [Qwen 3.6 27B MTP on V100 32GB: 54 t/s](https://reddit.com/r/LocalLLaMA/comments/1t4zu88/qwen_36_27b_mtp_on_v100_32gb_54_ts/) — 69 upvotes
- [Quality comparison of Qwen 3.6 27B quantizations](https://reddit.com/r/LocalLLaMA/comments/1t53dhp/quality_comparison_between_qwen_36_27b/) — **383 upvotes**

The narrative is clear: Qwen3.6-27B + MTP + a 48GB budget = **first viable local replacement for Claude Code / Codex** at 262k context.

## Why this matters for Jetson

This is a perfect fit for **Jetson Thor (128GB)** and **AGX Orin 64GB** — the memory and bandwidth make 27B dense at Q4–Q8 with speculative/MTP decoding a headline use case. Jetson AI Lab already has a Qwen3.6 27B model card but does not cover the MTP draft-model flow, which is what's unlocking 2.5x throughput and making agentic coding usable locally.

## Suggested tutorial scope

- Build **llama.cpp with the MTP PR (#22673)** on JetPack 7.x for Thor and JetPack 6.x for AGX Orin
- Run **Qwen3.6-27B Q4_K_XL / Q5_K_XL / Q8** with MTP draft head; measure tok/s on Thor, AGX Orin 64GB, and Orin NX 16GB (where feasible)
- Compare against **non-MTP baseline** and against **Gemma 4 31B MTP** for agentic coding
- Wire up drop-in **OpenAI / Anthropic API endpoints** so users can plug it into OpenCode, Aider, Continue.dev
- Report **262k-context memory footprint**, prefill latency, and slot-reuse tricks (see `--slots` trick from the Ralph-loop post)
- Include quality-vs-quant comparison aligned with the community's BF16/Q8/Q6/Q4/IQ4/IQ3 matrix

---
*Filed by JetsonPulse*

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add tutorial: Qwen3.6-27B with MTP (Multi-Token Prediction) on Jetson Thor / AGX Orin for local agentic coding #397

Community signal

Why this matters for Jetson

Suggested tutorial scope

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Add tutorial: Qwen3.6-27B with MTP (Multi-Token Prediction) on Jetson Thor / AGX Orin for local agentic coding #397

Description

Community signal

Why this matters for Jetson

Suggested tutorial scope

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions