A CLI tool to analyze Hugging Face model architectures without downloading the full model.
- Instant metadata analysis via the model repository
- Sharded model support (models split across multiple safetensors files)
- Layer insights: tensor names, shapes, data types, memory estimates
- Full model metrics: parameter count and size calculation
- Support for custom Hugging Face mirrors
Install:
pip install model-inspect
Analyze a model:
model-inspect Qwen/Qwen2.5-0.5B
Output:
+-------------------------------------------------+---------------+-----------+--------------+
| Layer Name | Shape | Data Type | Size (bytes) |
+-------------------------------------------------+---------------+-----------+--------------+
| model.embed_tokens.weight | (151936, 896) | BF16 | 272,269,312 |
| model.layers.0.input_layernorm.weight | (896,) | BF16 | 1,792 |
| model.layers.0.mlp.down_proj.weight | (896, 4864) | BF16 | 8,716,288 |
| model.layers.0.mlp.gate_proj.weight | (4864, 896) | BF16 | 8,716,288 |
| ... |
| model.layers.23.self_attn.q_proj.weight | (896, 896) | BF16 | 1,605,632 |
| model.layers.23.self_attn.v_proj.bias | (128,) | BF16 | 256 |
| model.layers.23.self_attn.v_proj.weight | (128, 896) | BF16 | 229,376 |
| model.norm.weight | (896,) | BF16 | 1,792 |
+-------------------------------------------------+---------------+-----------+--------------+
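Each size in the table is simply the number of elements (the product of the shape) times the width of the data type; BF16 is 2 bytes per element. For example, the embedding tensor:

151936 * 896 * 2  # = 272,269,312 bytes for model.embed_tokens.weight (BF16)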
Analyze a model with verbose output:
model-inspect Qwen/Qwen2.5-0.5B -v
Use a Hugging Face mirror:
model-inspect Qwen/Qwen2.5-0.5B --mirror https://hf-mirror.com
Use multiple threads for faster processing of sharded models:
model-inspect deepseek-ai/DeepSeek-V3 -j 8
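For sharded models, the shard list comes from the repository's model.safetensors.index.json, and each shard's header can be fetched independently, which is what the extra jobs parallelize. A minimal sketch of that idea (the URL and helper below are illustrative, not the tool's actual internals):

import struct
from concurrent.futures import ThreadPoolExecutor
import requests

BASE = "https://huggingface.co/deepseek-ai/DeepSeek-V3/resolve/main"

def header_len(shard):
    # The first 8 bytes of a safetensors file are a little-endian u64 giving the JSON header size.
    r = requests.get(f"{BASE}/{shard}", headers={"Range": "bytes=0-7"}, timeout=30)
    return struct.unpack("<Q", r.content[:8])[0]

index = requests.get(f"{BASE}/model.safetensors.index.json", timeout=30).json()
shards = sorted(set(index["weight_map"].values()))
with ThreadPoolExecutor(max_workers=8) as pool:  # roughly what -j 8 enables
    header_sizes = dict(zip(shards, pool.map(header_len, shards)))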
usage: model-inspect [-h] [--revision REVISION] [-v] [-j JOBS] [--mirror MIRROR] [--timeout TIMEOUT] [--retries RETRIES] [--backoff BACKOFF] model
Hugging Face Model Layer Analyzer
positional arguments:
model Hugging Face model name (e.g. 'username/model')
optional arguments:
-h, --help show this help message and exit
--revision REVISION Model revision (default: main)
-v, --verbose Enable verbose output (-v for process, -vv for content)
-j JOBS, --jobs JOBS Number of concurrent jobs (default: 1)
--mirror MIRROR Custom Hugging Face mirror URL (e.g. 'https://hf-mirror.com')
--timeout TIMEOUT Request timeout in seconds (default: 30)
--retries RETRIES Number of retry attempts (default: 3)
--backoff BACKOFF Backoff factor for retries (default: 1)
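The network options (--timeout, --retries, --backoff) correspond to standard requests/urllib3 retry behavior. A hedged sketch of how such defaults might be wired up, not the tool's exact code:

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

session = requests.Session()
retry = Retry(total=3, backoff_factor=1, status_forcelist=(429, 500, 502, 503, 504))  # --retries 3, --backoff 1
session.mount("https://", HTTPAdapter(max_retries=retry))
resp = session.get("https://huggingface.co/Qwen/Qwen2.5-0.5B/resolve/main/config.json", timeout=30)  # --timeout 30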
Model Inspect works by:
- Fetching the model's safetensors index file (if available, for sharded models)
- Reading only the headers of the safetensors files to extract tensor metadata
- Computing tensor sizes based on shapes and data types
- Presenting a summary table of all tensors and their properties
This approach avoids downloading the actual model weights, making it much faster and more resource-efficient than downloading the entire model.
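A minimal, self-contained sketch of that header-only approach for a single-file safetensors model (the URL and dtype table below are illustrative, not the tool's internals):

import json
import struct
import requests

URL = "https://huggingface.co/Qwen/Qwen2.5-0.5B/resolve/main/model.safetensors"
DTYPE_BYTES = {"F64": 8, "F32": 4, "F16": 2, "BF16": 2, "I64": 8, "I32": 4, "I8": 1, "U8": 1, "BOOL": 1}

# 1. First 8 bytes: a little-endian u64 with the length of the JSON header.
n = struct.unpack("<Q", requests.get(URL, headers={"Range": "bytes=0-7"}, timeout=30).content[:8])[0]

# 2. Next n bytes: the JSON header mapping tensor names to dtype, shape, and data offsets.
header = json.loads(requests.get(URL, headers={"Range": f"bytes=8-{8 + n - 1}"}, timeout=30).content)

# 3. Sizes follow from shape and dtype; the weights themselves are never downloaded.
for name, meta in header.items():
    if name == "__metadata__":
        continue
    elems = 1
    for dim in meta["shape"]:
        elems *= dim
    print(name, meta["shape"], meta["dtype"], elems * DTYPE_BYTES[meta["dtype"]])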
Requirements:
- Python 3.7+
- requests
- prettytable
- tqdm