Environment
- MacBook Pro with M2 Max, 64 GB memory
- mlx-vlm 0.6.2 (installed from source on June 8)
- Python 3.11
start_server.sh:

```shell
export APC_ENABLED=1
export APC_NUM_BLOCKS=4096
export APC_DISK_PATH=./.cache/mlx-vlm/apc
export APC_DISK_MAX_GB=3
export APC_DISK_SHARD_MAX_BLOCKS=256
export APC_MAX_POOL_TENSORS=450000
export APC_LAYER_MAJOR_MEMORY_MIN_TOKENS=50000
mlx_vlm.server --model /Users/neo/models/mlx-community/Qwen3.6-35B-A3B-mxfp4 --port 8888 \
  --draft-model /Users/neo/models/mlx-community/Qwen3.6-35B-A3B-MTP-5bit \
  --draft-kind mtp --log-level DEBUG
```
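As a sanity check that the exported variables reach the server, the values in the script can be compared with the reported APC stats. This is a hypothetical helper, not part of mlx-vlm. It assumes `APC_DISK_MAX_GB` is read as GiB (3 × 1024³ = 3221225472, which equals the reported `disk_max_bytes`) and that each cache block holds `block_size` tokens.

```python
# Hypothetical sanity check, not part of mlx-vlm: compare the values exported in
# start_server.sh with the APC stats the server reported.

# Copied from start_server.sh (on the server host these would come from os.environ).
script_env = {"APC_NUM_BLOCKS": "4096", "APC_DISK_MAX_GB": "3"}

# Copied from the 'apc' stats in this post.
reported = {"block_size": 16, "num_blocks": 4096, "disk_max_bytes": 3221225472}

num_blocks = int(script_env["APC_NUM_BLOCKS"])

# Assumption: APC_DISK_MAX_GB is interpreted in GiB (1024**3 bytes).
expected_disk_bytes = int(float(script_env["APC_DISK_MAX_GB"]) * 1024**3)

# Assumption: every cache block holds block_size tokens of KV state.
pool_tokens = num_blocks * reported["block_size"]

print("num_blocks matches:", num_blocks == reported["num_blocks"])  # prints True
print("disk limit matches:", expected_disk_bytes == reported["disk_max_bytes"])  # prints True
print("in-memory pool capacity (tokens):", pool_tokens)  # prints 65536
```

Both checks pass for the numbers posted here, so the configuration does appear to be picked up. Under the block-size assumption, the in-memory pool would cover 65,536 tokens, a quarter of the 262,144-token context limit.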
Stats:

health:

```
{ 'apc_enabled': True,
  'configured_context_limit': None,
  'continuous_batching_enabled': True,
  'effective_context_limit': 262144,
  'loaded_adapter': None,
  'loaded_context_size': 262144,
  'loaded_model': '/Users/neo/models/mlx-community/Qwen3.6-35B-A3B-mxfp4',
  'loaded_tool_parser': 'qwen3_coder',
  'status': 'healthy'}
```
apc:

```
{ 'block_size': 16,
  'disk_blocks_indexed': 0,
  'disk_bytes': 1088667890,
  'disk_evictions': 0,
  'disk_exact_indexed': 12,
  'disk_files': 12,
  'disk_hits': 0,
  'disk_max_bytes': 3221225472,
  'disk_writes': 0,
  'enabled': True,
  'evictions': 0,
  'exact_hits': 0,
  'exact_stores': 0,
  'lookups_hit': 0,
  'lookups_miss': 0,
  'matched_tokens': 0,
  'num_blocks': 4096,
  'pool_used': 0,
  'served_tokens': 0,
  'stores': 0,
  'token_hit_rate': 0.0}
```
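The stats report `block_size: 16`. A common design for automatic prefix caching (used, for example, by vLLM) splits each prompt into fixed-size token blocks and keys every full block by a hash chained over all preceding tokens, so a new request can reuse only the leading full blocks it shares exactly with an earlier request. The toy below illustrates that general technique only; `block_keys` and `ToyPrefixCache` are hypothetical names, and this is not a claim about how mlx-vlm implements APC.

```python
import hashlib
from typing import Dict, List, Sequence

BLOCK_SIZE = 16  # same value as 'block_size' in the APC stats


def block_keys(tokens: Sequence[int], block_size: int = BLOCK_SIZE) -> List[str]:
    """Return one chained hash per full block; a trailing partial block is ignored."""
    keys: List[str] = []
    running = hashlib.sha256()
    full_len = len(tokens) - len(tokens) % block_size
    for start in range(0, full_len, block_size):
        block = tokens[start:start + block_size]
        # Chaining: each key covers every token up to the end of this block, so two
        # prompts share a key only if their entire prefix up to that point matches.
        running.update(",".join(str(t) for t in block).encode())
        running.update(b"|")
        keys.append(running.copy().hexdigest())
    return keys


class ToyPrefixCache:
    """Hypothetical block-level prefix cache; stores block keys only, no KV tensors."""

    def __init__(self) -> None:
        self.blocks: Dict[str, int] = {}  # block key -> block position in the prompt
        self.lookups_hit = 0
        self.lookups_miss = 0
        self.matched_tokens = 0

    def lookup_and_store(self, tokens: Sequence[int]) -> int:
        """Return how many leading tokens could be reused, then cache all full blocks."""
        keys = block_keys(tokens)
        matched_blocks = 0
        for key in keys:
            if key not in self.blocks:
                break  # the first missing block ends the reusable prefix
            matched_blocks += 1
        if matched_blocks:
            self.lookups_hit += 1
        else:
            self.lookups_miss += 1
        reused = matched_blocks * BLOCK_SIZE
        self.matched_tokens += reused
        for position, key in enumerate(keys):
            self.blocks.setdefault(key, position)
        return reused


cache = ToyPrefixCache()
shared = list(range(40))  # 40 shared tokens: 2 full blocks plus 8 leftover tokens
print(cache.lookup_and_store(shared + [1000, 1001]))  # prints 0 (cold cache)
print(cache.lookup_and_store(shared + [2000]))  # prints 32 (2 full blocks reused)
print(cache.lookup_and_store(list(range(10))))  # prints 0 (no full block)
```

In this toy, every request increments either `lookups_hit` or `lookups_miss`. If mlx-vlm counts lookups the same way (an assumption), `lookups_hit` and `lookups_miss` both being 0 would suggest that requests never reached the prefix-cache lookup at all, rather than looking it up and missing.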
APC appears to be enabled, but it has no effect: lookups, hits, and stores all stay at 0. What did I do wrong, or am I missing a parameter?