coremltools 8.0b1
Pre-release
For all the new features, find the updated documentation in the docs-guides.
- New utilities `coremltools.utils.MultiFunctionDescriptor()` and `coremltools.utils.save_multifunction`, for creating an `mlprogram` with multiple functions in it that can share weights. Updated the model loading API to load specific functions for prediction.
- Stateful Core ML models: updates to the converter to produce Core ML models with the State Type (a new type introduced in iOS18/macOS15).
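A minimal sketch of the multifunction workflow (the model paths and function names below are placeholders):

```python
import coremltools as ct
from coremltools.utils import MultiFunctionDescriptor, save_multifunction

# Describe which functions to pull from which saved mlpackage models.
desc = MultiFunctionDescriptor()
desc.add_function("model1.mlpackage", src_function_name="main", target_function_name="function_1")
desc.add_function("model2.mlpackage", src_function_name="main", target_function_name="function_2")
desc.default_function_name = "function_1"

# Save a single mlprogram containing both functions, with shared weights deduplicated.
save_multifunction(desc, "combined.mlpackage")

# Load a specific function for prediction.
model = ct.models.MLModel("combined.mlpackage", function_name="function_2")
```

And a sketch of converting a stateful torch model, where a registered buffer becomes the Core ML state (the accumulator model is a toy example):

```python
import torch
import coremltools as ct

class Accumulator(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # The buffer holds state that persists across predictions.
        self.register_buffer("accumulator", torch.zeros(1))

    def forward(self, x):
        self.accumulator += x
        return self.accumulator * x

traced = torch.jit.trace(Accumulator().eval(), torch.tensor([1.0]))
mlmodel = ct.convert(
    traced,
    inputs=[ct.TensorType(shape=(1,))],
    outputs=[ct.TensorType(name="y")],
    # Map the torch buffer to a Core ML State Type.
    states=[ct.StateType(wrapped_type=ct.TensorType(shape=(1,)), name="accumulator")],
    minimum_deployment_target=ct.target.iOS18,
)
```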
- `coremltools.optimize`
  - Updates to model representation (`mlprogram`) pertaining to compression:
    - Support compression with more granularities: blockwise quantization, grouped channel wise palettization (see the weight quantization sketch after this list)
    - 4 bit weight quantization (in addition to 8 bit quantization that was already supported)
    - 3 bit palettization (in addition to 1, 2, 4, 6, 8 bit palettization that was already supported)
    - Support joint compression modes:
      - 8 bit look-up-tables for palettization
      - ability to combine weight pruning and palettization
      - ability to combine weight pruning and quantization
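A minimal sketch of the new granularity options, expressed through the `ct.optimize.coreml` weight quantization API (the model path and block size are placeholder choices):

```python
import coremltools as ct
import coremltools.optimize as cto

mlmodel = ct.models.MLModel("model.mlpackage")

# 4 bit weights, quantized blockwise within each channel.
op_config = cto.coreml.OpLinearQuantizerConfig(
    mode="linear_symmetric",
    dtype="int4",
    granularity="per_block",
    block_size=32,
)
config = cto.coreml.OptimizationConfig(global_config=op_config)
compressed_mlmodel = cto.coreml.linear_quantize_weights(mlmodel, config)
```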
 
 
  - API updates to `coremltools.optimize.coreml`:
    - Updated existing APIs to account for the features mentioned above
    - Support joint compression by applying compression techniques on an already compressed model
    - A new API to support activation quantization using calibration data, which can be used to take a W16A16 Core ML model and produce a W8A8 model: `ct.optimize.coreml.experimental.linear_quantize_activations` (to be upgraded from the experimental to the official namespace in a future release)
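A minimal sketch of the W16A16 to W8A8 flow, assuming `sample_data` is a list of calibration inputs (input names and data are placeholders):

```python
import coremltools as ct
import coremltools.optimize as cto

mlmodel = ct.models.MLModel("model.mlpackage")

# Step 1: quantize activations using calibration data (W16A8).
act_config = cto.coreml.OptimizationConfig(
    global_config=cto.coreml.experimental.OpActivationLinearQuantizerConfig(
        mode="linear_symmetric"
    )
)
mlmodel_a8 = cto.coreml.experimental.linear_quantize_activations(
    mlmodel, act_config, sample_data  # e.g. [{"input": np.array(...)}, ...]
)

# Step 2: additionally quantize the weights (W8A8).
weight_config = cto.coreml.OptimizationConfig(
    global_config=cto.coreml.OpLinearQuantizerConfig(mode="linear_symmetric")
)
mlmodel_w8a8 = cto.coreml.linear_quantize_weights(mlmodel_a8, weight_config)
```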
 
 
  - API updates to `coremltools.optimize.torch`:
    - Updated existing APIs to account for the features mentioned above
    - Added new APIs for data free compression (`PostTrainingPalettizer`, `PostTrainingQuantizer`); see the sketch below
    - Added new APIs for calibration data based compression (`SKMPalettizer` for the sensitive k-means palettization algorithm, `layerwise_compression` for the GPTQ/SparseGPT quantization/pruning algorithms)
    - Updated the APIs and the `coremltools.convert` implementation, so that converting torch models compressed with `ct.optimize.torch` no longer requires additional pass pipeline arguments.
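A minimal sketch of the data free path, assuming the import paths below and placeholder config values:

```python
import torchvision
from coremltools.optimize.torch.quantization import (
    PostTrainingQuantizer,
    PostTrainingQuantizerConfig,
)

model = torchvision.models.resnet18(weights="IMAGENET1K_V1").eval()

# 4 bit blockwise weight quantization; no calibration data needed.
config = PostTrainingQuantizerConfig.from_dict(
    {"global_config": {"weight_dtype": "int4", "granularity": "per_block", "block_size": 32}}
)
quantizer = PostTrainingQuantizer(model, config)
compressed_model = quantizer.compress()
```

The compressed model can then be traced and passed to `ct.convert` directly, with no extra pass pipeline arguments.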
 
- Updates to model representation (`mlprogram`)
  - iOS18 / macOS15 ops
    - compression related ops: `constexpr_blockwise_shift_scale`, `constexpr_lut_to_dense`, `constexpr_sparse_to_dense`, etc.
    - updates to the GRU op
    - PyTorch op `scaled_dot_product_attention`
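A minimal sketch of converting a torch model that uses the PyTorch `scaled_dot_product_attention` op (the shapes are arbitrary placeholder values):

```python
import torch
import coremltools as ct

class Attention(torch.nn.Module):
    def forward(self, q, k, v):
        return torch.nn.functional.scaled_dot_product_attention(q, k, v)

shape = (1, 8, 32, 64)  # (batch, heads, sequence length, head dimension)
inputs = (torch.rand(shape), torch.rand(shape), torch.rand(shape))
traced = torch.jit.trace(Attention().eval(), inputs)

mlmodel = ct.convert(
    traced,
    inputs=[ct.TensorType(shape=shape) for _ in range(3)],
    minimum_deployment_target=ct.target.iOS18,
)
```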
- Experimental `torch.export` conversion support:
```python
import torch
import torchvision
import coremltools as ct

# Export a torchvision model with torch.export and convert the
# resulting ExportedProgram directly.
torch_model = torchvision.models.vit_b_16(weights="IMAGENET1K_V1")
x = torch.rand((1, 3, 224, 224))
example_inputs = (x,)
exported_program = torch.export.export(torch_model, example_inputs)
coreml_model = ct.convert(exported_program)
```
- Various other bug fixes, enhancements, clean ups and optimizations
 
Known Issues
- Conversion will fail when using certain palettization modes (e.g. int8 LUT, vector palettization) with torch models using `ct.optimize.torch`
- Some of the joint compression modes, when used with the training time APIs in `ct.optimize.torch`, will result in a torch model that is not correctly converted
- The post-training palettization config for mlpackage models (`ct.optimize.coreml.OpPalettizerConfig`) does not yet have all the arguments that are supported in the `cto.torch.palettization` APIs (e.g. `lut_dtype` (to get int8 dtyped LUT), `cluster_dim` (to do vector palettization), `enable_per_channel_scale` (to apply per-channel-scale), etc.)
- Applying symmetric quantization using the GPTQ algorithm with `ct.optimize.torch.layerwise_compression.LayerwiseCompressor` will not produce the correct quantization scales, due to a known bug. This may lead to poor accuracy for the quantized model.
Special thanks to our external contributors for this release: @teelrabbit @igeni @Cyanosite