mlx-vlm 0.6.2 (latest main branch source): APC is enabled but has no effect [Qwen 3.6 35B A3B mxfp4 with 5-bit MTP draft model] #1339

@NeoInBJ

Environment

  • M2 Max MacBook Pro, 64 GB memory
  • mlx-vlm 0.6.2 (installed from source on June 8)
  • Python 3.11

shell start_server.sh:
export APC_ENABLED=1
export APC_NUM_BLOCKS=4096
export APC_DISK_PATH=./.cache/mlx-vlm/apc
export APC_DISK_MAX_GB=3
export APC_DISK_SHARD_MAX_BLOCKS=256
export APC_MAX_POOL_TENSORS=450000
export APC_LAYER_MAJOR_MEMORY_MIN_TOKENS=50000
mlx_vlm.server --model /Users/neo/models/mlx-community/Qwen3.6-35B-A3B-mxfp4 --port 8888 \
  --draft-model /Users/neo/models/mlx-community/Qwen3.6-35B-A3B-MTP-5bit \
  --draft-kind mtp --log-level DEBUG

stats:
health:

{ 'apc_enabled': True,
  'configured_context_limit': None,
  'continuous_batching_enabled': True,
  'effective_context_limit': 262144,
  'loaded_adapter': None,
  'loaded_context_size': 262144,
  'loaded_model': '/Users/neo/models/mlx-community/Qwen3.6-35B-A3B-mxfp4',
  'loaded_tool_parser': 'qwen3_coder',
  'status': 'healthy'}

apc:

{ 'block_size': 16,
  'disk_blocks_indexed': 0,
  'disk_bytes': 1088667890,
  'disk_evictions': 0,
  'disk_exact_indexed': 12,
  'disk_files': 12,
  'disk_hits': 0,
  'disk_max_bytes': 3221225472,
  'disk_writes': 0,
  'enabled': True,
  'evictions': 0,
  'exact_hits': 0,
  'exact_stores': 0,
  'lookups_hit': 0,
  'lookups_miss': 0,
  'matched_tokens': 0,
  'num_blocks': 4096,
  'pool_used': 0,
  'served_tokens': 0,
  'stores': 0,
  'token_hit_rate': 0.0}

APC appears to be enabled, but it has no effect: every hit, store, and lookup counter in the stats is zero. What did I do wrong, or which parameters did I miss?
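For reference, here is a minimal sketch of how the apc counters above might be read. It only does arithmetic on values copied from this report. The meaning of each field is assumed from its name, not taken from the mlx-vlm source, and treating block_size as tokens per block is also an assumption. summarize_apc is a hypothetical helper, not part of mlx-vlm.

```python
# Values copied from the apc stats in this report.
apc_stats = {
    "block_size": 16,
    "num_blocks": 4096,
    "disk_bytes": 1088667890,
    "disk_max_bytes": 3221225472,
    "disk_files": 12,
    "disk_hits": 0,
    "disk_writes": 0,
    "lookups_hit": 0,
    "lookups_miss": 0,
    "stores": 0,
    "exact_stores": 0,
}


def summarize_apc(stats):
    """Hypothetical helper: derive a few sanity-check figures from the stats.

    Field semantics are assumed from the field names only.
    """
    lookups = stats["lookups_hit"] + stats["lookups_miss"]
    writes = stats["stores"] + stats["exact_stores"] + stats["disk_writes"]
    return {
        "lookups": lookups,
        "writes": writes,
        # Zero lookups would mean the cache was never queried, not that it missed.
        "cache_consulted": lookups > 0,
        "cache_written": writes > 0,
        "disk_limit_gib": stats["disk_max_bytes"] / 1024**3,
        "disk_used_gib": round(stats["disk_bytes"] / 1024**3, 2),
        # Assumes block_size counts tokens per block.
        "pool_capacity_tokens": stats["num_blocks"] * stats["block_size"],
    }


summary = summarize_apc(apc_stats)
for key, value in summary.items():
    print(f"{key}: {value}")
```

Under those assumptions, the disk limit matches APC_DISK_MAX_GB=3 and the 12 files on disk were picked up, so the environment variables seem to be applied. lookups_hit and lookups_miss are both zero, which suggests no request reached the cache lookup path before these stats were taken, rather than that lookups ran and missed.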
