MNN LLM Chat and Agent Framework Usage Guide

This repository contains examples and documentation for using MNN (Mobile Neural Network) for on-device LLM chat and agent implementations, as well as examples for consuming APIs from the MnnLlmChat Android application. This guide will help you understand how to leverage these powerful tools for building AI-powered applications that run entirely on-device or connect to AI services.

Qwen3 Examples - Practical Python code examples for consuming the MnnLlmChat Android application API that runs Qwen3 models
MCP Client Examples - Examples of MCP clients that consume external MCP services
MCP Server Examples - Examples of MCP servers that provide external MCP services
Qwen-Agent Framework Usage - Examples using the official Qwen-Agent framework from https://github.com/QwenLM/Qwen-Agent
RAG Examples - Examples of Retrieval-Augmented Generation systems that consume the MnnLlmChat API
MNN LLM Guide - Detailed explanation and examples of using MNN for LLM implementations
Advanced Use Cases - Complex implementations combining multiple capabilities

What is MNN?

MNN (Mobile Neural Network) is a lightweight deep learning framework developed by Alibaba that enables efficient on-device inference. With MNN, you can run large language models directly on mobile devices or edge hardware without requiring cloud connectivity.

What is MnnLlmChat?

MnnLlmChat is an Android application from the MNN repository that provides a local API for interacting with LLMs (including Qwen3 models) running on the Android device. The API can be consumed by external Python clients.

Key Features

On-device execution: No internet connection required after model deployment
API Consumption: Connect to external AI services like MnnLlmChat
Privacy-focused: Data never leaves the device (for on-device models)
Low latency: Immediate response without network overhead (for on-device models)
Cost-effective: No per-API-call charges (for on-device models)
Customizable: Full control over model and inference process
RAG Support: Enhanced responses with external knowledge sources
MCP Integration: Standardized access to external tools and resources

What are Qwen-Agent Framework Applications?

Qwen-Agent is a framework for developing LLM applications based on the instruction following, tool usage, planning, and memory capabilities of Qwen. It serves as the backend for Qwen Chat and includes example applications such as Browser Assistant, Code Interpreter, and Custom Assistant.

What is MCP (Model Context Protocol)?

Model Context Protocol (MCP) is a standardized protocol that allows AI models to securely interact with external tools, data sources, and services. MCP enables:

Standardized tool interfaces
Secure access to external resources
Improved context management
Seamless integration with various systems

MCP includes both client implementations (for consuming external services) and server implementations (for providing external services).

What is RAG (Retrieval-Augmented Generation)?

RAG (Retrieval-Augmented Generation) is a technique that enhances language models by retrieving relevant documents or information from external knowledge sources before generating a response. This allows models to provide more accurate, factual, and up-to-date information.

Getting Started

To get started with these examples, you'll need:

A converted MNN model file (.mnn format) for on-device inference
Access to an MnnLlmChat Android application running the API
Python 3.8 or higher
Required Python packages (see individual guides)

Installation

First, install the required packages:

# For MNN-based examples
pip install mnn transformers torch numpy

# For API consumption examples
pip install requests

# For MCP client examples
pip install requests

# For MCP server examples  
pip install flask fastapi uvicorn aiofiles

# For RAG examples
pip install requests scikit-learn

# For Qwen-Agent examples
pip install -U qwen-agent

Model Preparation

To use MNN-based examples, you'll need to convert your LLM model to MNN format:

# First, convert your model to ONNX format (example for HuggingFace models)
python -c "
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = 'microsoft/DialoGPT-medium'  # Replace with your model
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Add pad token if needed
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Create a sample input
inputs = tokenizer('Hello, how are you?', return_tensors='pt')

# Export to ONNX
torch.onnx.export(
    model,
    (inputs['input_ids'], inputs['attention_mask']),
    'model.onnx',
    export_params=True,
    opset_version=11,
    input_names=['input_ids', 'attention_mask'],
    output_names=['logits'],
    dynamic_axes={
        'input_ids': {0: 'batch_size', 1: 'sequence'},
        'attention_mask': {0: 'batch_size', 1: 'sequence'},
        'logits': {0: 'batch_size', 1: 'sequence', 2: 'vocab_size'}
    }
)
"

# Then convert ONNX to MNN using the MNN converter
# Make sure you have the MNN converter tool available
./MNNConvert -f ONNX --modelFile model.onnx --MNNModel model.mnn --bizCode biz

Setting Up MnnLlmChat API Access

To use examples that consume the MnnLlmChat API:

Install the MnnLlmChat Android application on your device
Ensure the application is running and the API server is active
Find your Android device's IP address on the local network
Configure your Python client to connect to the device IP

Next Steps

To dive deeper into each topic, check the linked guides in the table of contents above.

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
README.md		README.md
advanced_examples.md		advanced_examples.md
mcp_client_examples.md		mcp_client_examples.md
mcp_server_examples.md		mcp_server_examples.md
qwen3_examples.md		qwen3_examples.md
qwen_agent_framework_usage.md		qwen_agent_framework_usage.md
qwen_agents_guide.md		qwen_agents_guide.md
rag_examples.md		rag_examples.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

MNN LLM Chat and Agent Framework Usage Guide

Table of Contents

What is MNN?

What is MnnLlmChat?

Key Features

What are Qwen-Agent Framework Applications?

What is MCP (Model Context Protocol)?

What is RAG (Retrieval-Augmented Generation)?

Getting Started

Installation

Model Preparation

Setting Up MnnLlmChat API Access

Next Steps

About

Uh oh!

Releases

Packages

NicasioSirvent/mnn_llm_chat_api_usage

Folders and files

Latest commit

History

Repository files navigation

MNN LLM Chat and Agent Framework Usage Guide

Table of Contents

What is MNN?

What is MnnLlmChat?

Key Features

What are Qwen-Agent Framework Applications?

What is MCP (Model Context Protocol)?

What is RAG (Retrieval-Augmented Generation)?

Getting Started

Installation

Model Preparation

Setting Up MnnLlmChat API Access

Next Steps

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Packages