Skip to content

NicasioSirvent/mnn_llm_chat_api_usage

Repository files navigation

MNN LLM Chat and Agent Framework Usage Guide

This repository contains examples and documentation for using MNN (Mobile Neural Network) for on-device LLM chat and agent implementations, as well as examples for consuming APIs from the MnnLlmChat Android application. This guide will help you understand how to leverage these powerful tools for building AI-powered applications that run entirely on-device or connect to AI services.

Table of Contents

  1. Qwen3 Examples - Practical Python code examples for consuming the MnnLlmChat Android application API that runs Qwen3 models
  2. MCP Client Examples - Examples of MCP clients that consume external MCP services
  3. MCP Server Examples - Examples of MCP servers that provide external MCP services
  4. Qwen-Agent Framework Usage - Examples using the official Qwen-Agent framework from https://github.com/QwenLM/Qwen-Agent
  5. RAG Examples - Examples of Retrieval-Augmented Generation systems that consume the MnnLlmChat API
  6. MNN LLM Guide - Detailed explanation and examples of using MNN for LLM implementations
  7. Advanced Use Cases - Complex implementations combining multiple capabilities

What is MNN?

MNN (Mobile Neural Network) is a lightweight deep learning framework developed by Alibaba that enables efficient on-device inference. With MNN, you can run large language models directly on mobile devices or edge hardware without requiring cloud connectivity.

What is MnnLlmChat?

MnnLlmChat is an Android application from the MNN repository that provides a local API for interacting with LLMs (including Qwen3 models) running on the Android device. The API can be consumed by external Python clients.

Key Features

  • On-device execution: No internet connection required after model deployment
  • API Consumption: Connect to external AI services like MnnLlmChat
  • Privacy-focused: Data never leaves the device (for on-device models)
  • Low latency: Immediate response without network overhead (for on-device models)
  • Cost-effective: No per-API-call charges (for on-device models)
  • Customizable: Full control over model and inference process
  • RAG Support: Enhanced responses with external knowledge sources
  • MCP Integration: Standardized access to external tools and resources

What are Qwen-Agent Framework Applications?

Qwen-Agent is a framework for developing LLM applications based on the instruction following, tool usage, planning, and memory capabilities of Qwen. It serves as the backend for Qwen Chat and includes example applications such as Browser Assistant, Code Interpreter, and Custom Assistant.

What is MCP (Model Context Protocol)?

Model Context Protocol (MCP) is a standardized protocol that allows AI models to securely interact with external tools, data sources, and services. MCP enables:

  • Standardized tool interfaces
  • Secure access to external resources
  • Improved context management
  • Seamless integration with various systems

MCP includes both client implementations (for consuming external services) and server implementations (for providing external services).

What is RAG (Retrieval-Augmented Generation)?

RAG (Retrieval-Augmented Generation) is a technique that enhances language models by retrieving relevant documents or information from external knowledge sources before generating a response. This allows models to provide more accurate, factual, and up-to-date information.

Getting Started

To get started with these examples, you'll need:

  1. A converted MNN model file (.mnn format) for on-device inference
  2. Access to an MnnLlmChat Android application running the API
  3. Python 3.8 or higher
  4. Required Python packages (see individual guides)

Installation

First, install the required packages:

# For MNN-based examples
pip install mnn transformers torch numpy

# For API consumption examples
pip install requests

# For MCP client examples
pip install requests

# For MCP server examples  
pip install flask fastapi uvicorn aiofiles

# For RAG examples
pip install requests scikit-learn

# For Qwen-Agent examples
pip install -U qwen-agent

Model Preparation

To use MNN-based examples, you'll need to convert your LLM model to MNN format:

# First, convert your model to ONNX format (example for HuggingFace models)
python -c "
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = 'microsoft/DialoGPT-medium'  # Replace with your model
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Add pad token if needed
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Create a sample input
inputs = tokenizer('Hello, how are you?', return_tensors='pt')

# Export to ONNX
torch.onnx.export(
    model,
    (inputs['input_ids'], inputs['attention_mask']),
    'model.onnx',
    export_params=True,
    opset_version=11,
    input_names=['input_ids', 'attention_mask'],
    output_names=['logits'],
    dynamic_axes={
        'input_ids': {0: 'batch_size', 1: 'sequence'},
        'attention_mask': {0: 'batch_size', 1: 'sequence'},
        'logits': {0: 'batch_size', 1: 'sequence', 2: 'vocab_size'}
    }
)
"

# Then convert ONNX to MNN using the MNN converter
# Make sure you have the MNN converter tool available
./MNNConvert -f ONNX --modelFile model.onnx --MNNModel model.mnn --bizCode biz

Setting Up MnnLlmChat API Access

To use examples that consume the MnnLlmChat API:

  1. Install the MnnLlmChat Android application on your device
  2. Ensure the application is running and the API server is active
  3. Find your Android device's IP address on the local network
  4. Configure your Python client to connect to the device IP

Next Steps

To dive deeper into each topic, check the linked guides in the table of contents above.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published