Quick Start
mtecnic edited this page May 28, 2026 · 1 revision
git clone https://github.com/mtecnic/model-chat-cli.git
cd model-chat-cli
pip install -r requirements.txt
python main.py

That's everything. The next five things will happen, in this order:
1. Subnet scan. Auto-detects your `/24` and probes ports `11434, 1234, 5000, 8000, 8080`. See [[Discovery]].
2. Model table. Lists every model on every server with latency. See [[Discovery#discovered-models-table]].
3. Picker. Arrow keys, Enter to chat. Type a `/command` from the chat prompt for everything else.
4. Cache write. Server list goes to `~/.model_chat_cache.json` so next launch is instant.
5. You're chatting. Streaming response with decode TPS / TTFT in the status line. See [[Chat]].
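
The scan and cache steps can be sketched roughly as follows. This is a minimal illustration, not the tool's actual code: the function names and the `{"servers": [...]}` cache layout are assumptions made for the example, and only the port list and the `/24` scope come from this page.

```python
import ipaddress
import json
import socket
from pathlib import Path

# Ports named on this page.
PORTS = [11434, 1234, 5000, 8000, 8080]


def candidates(local_ip, ports=PORTS):
    """Return every (host, port) pair on the /24 that contains local_ip."""
    net = ipaddress.ip_network(f"{local_ip}/24", strict=False)
    return [(str(host), port) for host in net.hosts() for port in ports]


def is_open(host, port, timeout=0.3):
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def save_cache(servers, path):
    """Write the server list as JSON (hypothetical schema, not the real one)."""
    Path(path).write_text(json.dumps({"servers": servers}, indent=2))


def load_cache(path):
    """Return the cached server list, or [] if the file is missing or unreadable."""
    try:
        return json.loads(Path(path).read_text())["servers"]
    except (OSError, ValueError, KeyError, TypeError):
        return []
```

A `/24` has 254 usable hosts, so five ports means 1270 probes. Run one after another with a 0.3 s timeout, that could take several minutes, so a real scanner would presumably probe concurrently.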
/stress → 6 modes of load testing [[Stress-Testing]]
/arena → Multi-model head-to-head [[Arena]]
/promptarena → System-prompt tournament [[Prompt-Arena]]
/think → Toggle reasoning-token rendering [[Chat#thinking-mode]]
/export → Dump conversation to markdown
Full command list: Chat#commands.
| Symptom | Fix |
|---|---|
| "No servers found" | Your model host isn't on the same /24. Check firewall / VLAN. Troubleshooting#no-servers-found |
| Latency column shows huge numbers | Often a cold-cache LM Studio model: the first request loads weights. Re-run scan. |
| `/think` shows nothing | Model doesn't emit `<think>` tags. Try a Qwen 3 / 3.5 reasoning variant. |
| Stress test exits immediately | The selected server's model name has changed. Re-pick from /switch. |
| Arena tournament hangs at "Judging…" | Judge model failed mid-response; check logs/stress_test_*.log for the trace. |
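
For the `/think` row, one way to check whether a model's output contains `<think>` tags at all is to split a saved response yourself. The helper below is a hypothetical sketch. How the CLI itself detects and renders reasoning tokens is not shown on this page.

```python
import re

# Matches one <think>...</think> span, including newlines inside it.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_thinking(text):
    """Return (reasoning, answer); reasoning is '' when no <think> tags exist."""
    reasoning = "\n".join(m.strip() for m in THINK_RE.findall(text))
    answer = THINK_RE.sub("", text).strip()
    return reasoning, answer
```

If `reasoning` comes back empty for a response, the model simply isn't emitting the tags and `/think` has nothing to render.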
- Run a Throughput stress test at concurrency 20 to see how your server scales: Stress-Testing#throughput.
- Try Tool Bench → Quick against any model that advertises tool-calling: Tool-Calling-Benchmark#quick-suite.
- Run an Arena → Quick Compare between two models you're choosing between: Arena#quick-compare.
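
To make "concurrency 20" concrete, here is a minimal sketch that keeps up to 20 requests in flight with a thread pool and reports requests per second. `fake_request` and `run_throughput` are stand-ins invented for this example. They do not call a server and are not how `/stress` is implemented.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_request(i):
    """Stand-in for one chat completion; sleeps instead of calling a server."""
    time.sleep(0.01)
    return i


def run_throughput(request_fn, total=100, concurrency=20):
    """Run `total` calls with at most `concurrency` in flight.

    Returns (results in submission order, requests per second).
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(request_fn, range(total)))
    elapsed = time.perf_counter() - start
    return results, total / elapsed
```

Swapping `fake_request` for a function that sends a real prompt and comparing the rate at a few concurrency levels gives a rough picture of how a server scales.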