
Quick Start

mtecnic edited this page May 28, 2026 · 1 revision


60-second tour

git clone https://github.com/mtecnic/model-chat-cli.git
cd model-chat-cli
pip install -r requirements.txt
python main.py

That's everything. Five things then happen, in this order:

  1. Subnet scan. Auto-detects your /24 subnet and probes ports 11434, 1234, 5000, 8000 and 8080 on each host. See Discovery.
  2. Model table. Lists every model on every server found, with latency. See Discovery#discovered-models-table.
  3. Picker. Use the arrow keys to choose a model and Enter to start chatting. Everything else is a /command typed at the chat prompt.
  4. Cache write. The server list is saved to ~/.model_chat_cache.json so the next launch is instant.
  5. You're chatting. Responses stream in, with decode TPS (tokens per second) and TTFT (time to first token) shown in the status line. See Chat.
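Steps 1 and 4 (scan, then cache) can be sketched in plain Python to show roughly what discovery involves. This is a minimal illustration, not model-chat-cli's actual code: only the port list and the cache path come from this page. The function names, the 0.3 s connect timeout, the 128-thread pool, the UDP trick for finding the local address, and the cache's JSON layout are all assumptions.

```python
from __future__ import annotations

import ipaddress
import json
import socket
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

# Port list and cache path are from the Quick Start; everything else here is illustrative.
PORTS = (11434, 1234, 5000, 8000, 8080)
CACHE_PATH = Path.home() / ".model_chat_cache.json"


def local_ipv4() -> str:
    """Best-effort local IPv4: connecting a UDP socket sends no packets but selects an interface."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.connect(("10.255.255.255", 1))
        return s.getsockname()[0]


def candidates(ip: str, ports=PORTS) -> list[tuple[str, int]]:
    """Every (host, port) pair in ip's /24, skipping the network and broadcast addresses."""
    net = ipaddress.ip_network(f"{ip}/24", strict=False)
    return [(str(host), port) for host in net.hosts() for port in ports]


def is_open(host: str, port: int, timeout: float = 0.3) -> bool:
    """True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def scan(ip: str, workers: int = 128) -> list[dict]:
    """Probe all candidates concurrently and return those that accepted a connection."""
    pairs = candidates(ip)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda hp: is_open(*hp), pairs))
    return [{"host": h, "port": p} for (h, p), ok in zip(pairs, results) if ok]


def save_cache(servers: list[dict], path: Path = CACHE_PATH) -> None:
    """Write the server list as JSON (hypothetical layout: {"servers": [...]})."""
    path.write_text(json.dumps({"servers": servers}, indent=2))


def load_cache(path: Path = CACHE_PATH) -> list[dict] | None:
    """Return the cached server list, or None if no cache file exists yet."""
    if not path.exists():
        return None
    return json.loads(path.read_text()).get("servers")
```

A launch would then try `load_cache()` first and fall back to `save_cache(scan(local_ipv4()))` when it returns None. A /24 with five ports is 254 × 5 = 1,270 probes, which is why the sketch runs them in a thread pool with a short timeout. An open port only means something is listening; a real discovery pass would go on to ask each hit for its model list, which this sketch leaves out.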

The five commands you'll use

  /stress       → 6 modes of load testing                 [[Stress-Testing]]
  /arena        → Multi-model head-to-head                 [[Arena]]
  /promptarena  → System-prompt tournament                 [[Prompt-Arena]]
  /think        → Toggle reasoning-token rendering         [[Chat#thinking-mode]]
  /export       → Dump conversation to markdown
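To make the /think toggle concrete, here is a minimal sketch of separating reasoning from the answer, assuming the model wraps its reasoning in literal `<think>…</think>` tags (the gotchas table implies the feature depends on these tags). The names `split_thinking` and `render` and the `[thinking]` markers are hypothetical, not model-chat-cli's API.

```python
import re

# One <think>...</think> block; DOTALL lets it span newlines, non-greedy stops at the first close tag.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_thinking(text: str) -> tuple[str, str]:
    """Return (reasoning, answer): the joined <think> contents and the text with those blocks removed."""
    reasoning = "\n".join(m.strip() for m in THINK_RE.findall(text))
    answer = THINK_RE.sub("", text).strip()
    return reasoning, answer


def render(text: str, show_thinking: bool) -> str:
    """Show the reasoning above the answer when the toggle is on; otherwise show only the answer."""
    reasoning, answer = split_thinking(text)
    if show_thinking and reasoning:
        return f"[thinking]\n{reasoning}\n[/thinking]\n{answer}"
    return answer
```

This works on a finished response. When streaming, a tag can be split across chunks, so a live renderer would need to buffer until it has seen the complete `<think>` or `</think>` marker. A model that never emits the tags simply produces an empty reasoning string, which matches the "/think shows nothing" symptom.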

Full command list: Chat#commands.


Common first-run gotchas

Symptom → Fix

  "No servers found" → Your model host isn't on the same /24. Check firewall and VLAN settings. See Troubleshooting#no-servers-found.
  Latency column shows huge numbers → Often an LM Studio model that isn't loaded yet: the first request has to load its weights. Re-run the scan.
  /think shows nothing → The model doesn't emit <think> tags. Try a Qwen 3 / 3.5 reasoning variant.
  Stress test exits immediately → The selected server's model name has changed. Re-pick the model with /switch.
  Arena tournament hangs at "Judging…" → The judge model failed mid-response. Check logs/stress_test_*.log for the trace.
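For the last row, a small helper can locate the newest matching log and show its final lines. This is a hypothetical convenience, not part of model-chat-cli; it assumes `logs/` is relative to the directory you launched from and that the logs are plain text. Only the `stress_test_*.log` pattern comes from this page.

```python
from __future__ import annotations

from pathlib import Path


def newest_log(log_dir: str | Path = "logs", pattern: str = "stress_test_*.log") -> Path | None:
    """Return the most recently modified file matching `pattern`, or None if there is none."""
    base = Path(log_dir)
    if not base.is_dir():
        return None
    logs = list(base.glob(pattern))
    return max(logs, key=lambda p: p.stat().st_mtime) if logs else None


def tail(path: Path, n: int = 40) -> list[str]:
    """Last `n` lines of a text file (reads the whole file, fine for modest log sizes)."""
    lines = path.read_text(errors="replace").splitlines()
    return lines[-n:]
```

Something like `log = newest_log()` followed by `print("\n".join(tail(log)))` when `log` is not None would surface the end of the trace, which is usually where a failed judge response shows up.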

What to try next
