
Releases: ai-dynamo/modelexpress

ModelExpress Release v0.2.1

05 Dec 00:49
ebae023


ModelExpress - Release 0.2.1

Summary

ModelExpress 0.2.1 is a maintenance release focused on stability and critical fixes. This update incorporates backported fixes to ensure a smoother and more reliable deployment experience. It addresses specific issues identified in previous versions, particularly around CI stability, model directory handling, and system observability, and prepares the environment for future feature updates.

Bug Fixes

  • Upgraded Rust Version to 1.90: Upgraded the Rust compiler to version 1.90 to resolve continuous integration (CI) issues, ensuring compatibility with the latest Rust features and maintaining build stability.
  • Ignore Hugging Face Sub-Directories: Updated the system to exclude Hugging Face sub-directories during operations, preventing potential errors and improving compatibility with Hugging Face models.
  • Improved Logging Mechanism: Enhanced the logging system to provide more detailed and informative logs, facilitating easier debugging and monitoring.

Known Limitations

  • Aggregated Kubernetes Example Deployment Failure: The examples/aggregated_k8s/agg.yaml configuration file currently fails to deploy with Dynamo 0.7.0. This is due to the deprecation and removal of the pvc field under spec.services.<serviceName> in the newer Dynamo CRD. Users attempting to deploy this example will encounter a strict decoding error.
  • Workaround: Update the agg.yaml file to adhere to the new API format by defining PVCs at the spec.pvcs level and referencing them using spec.services.<name>.volumeMounts.

Full Changelog

v0.2.0...v0.2.1

ModelExpress v0.2.0

09 Oct 17:04
3e24472


ModelExpress v0.2.0 Release Notes

This release marks a significant step forward for Model Express, evolving it from a foundational service to a deployable, production-ready component for large-scale inference. The key themes for this release are Performance, Kubernetes Integration, and Enhanced Configuration. We've introduced a full Helm chart for easy deployment, significantly improved download performance, and added critical features for seamless integration with inference servers like Dynamo.

Features & Enhancements

  • High-Performance Downloads (--high): You can now enable a high-CPU download mode that multiplexes downloads to better saturate high-bandwidth network connections, dramatically speeding up model fetching. (#42)
  • Helm Chart for Kubernetes Deployment: A complete Helm chart has been added, allowing you to deploy a production-ready Model Express server to any Kubernetes cluster with a single command. (#69)
  • End-to-End Dynamo Integration Example: We've added a full Kubernetes configuration example demonstrating how to run Model Express as a sidecar with an aggregated Dynamo deployment, providing a clear blueprint for production use. (#31)
  • get_model_path API for Seamless Integration: A new get_model_path API has been added, a critical function for integrating with inference servers like Dynamo that need to resolve the local path of a model; a usage sketch follows this list. (#75)
  • Support for Partial Downloads (--ignore-weights): You can now download model files while ignoring the large weight files (.bin, .safetensors). This is useful for quickly fetching tokenizer and configuration files for validation or development. (#77)
  • Improved Model Name Mapping: The server can now correctly map Hugging Face cache folder names (e.g., models--google--gemma-7b) back to their human-readable IDs (google/gemma-7b); a sketch of this mapping also follows this list. (#73)
  • Official Dockerfile Compliance: The Dockerfile has been updated to meet OSRB compliance standards, ensuring it's secure and ready for enterprise environments. (#83)
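
As a rough illustration of how an inference server might consume the get_model_path API, here is a minimal Rust sketch. Only the get_model_path name and its purpose (resolving a model ID to the local path of its files) come from these notes; the client type, its method signature, and the cache path used below are assumptions made for the example.

```rust
use std::path::PathBuf;

// Hypothetical stand-in for the ModelExpress client; the real crate and its
// signatures are not shown in these release notes.
struct ModelExpressClient;

impl ModelExpressClient {
    // Assumed shape: resolve a model ID to the path of its files in the shared cache.
    fn get_model_path(&self, model_id: &str) -> std::io::Result<PathBuf> {
        // Placeholder body for the sketch; the real client would query the server.
        Ok(PathBuf::from("/model-cache").join(model_id))
    }
}

fn main() -> std::io::Result<()> {
    let client = ModelExpressClient;
    // An inference server such as Dynamo would load weights from this local path
    // instead of downloading the model again.
    let path = client.get_model_path("google/gemma-7b")?;
    println!("model files at {}", path.display());
    Ok(())
}
```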
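
The cache-folder mapping itself is straightforward to illustrate. The following sketch is not the ModelExpress implementation, just a minimal Rust example of the transformation described above, relying on the Hugging Face cache convention of encoding the "/" in a repo ID as "--".

```rust
/// Convert a Hugging Face cache folder name (e.g., "models--google--gemma-7b")
/// back to its human-readable model ID (e.g., "google/gemma-7b").
fn cache_folder_to_model_id(folder: &str) -> Option<String> {
    // Folders for model repositories carry a "models--" prefix in the HF cache.
    let rest = folder.strip_prefix("models--")?;
    // The "/" separating organization and model name is stored as "--".
    Some(rest.replace("--", "/"))
}

fn main() {
    assert_eq!(
        cache_folder_to_model_id("models--google--gemma-7b").as_deref(),
        Some("google/gemma-7b")
    );
}
```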

Deployment & Configuration

  • Expanded Environment Variable Support: You can now configure cache settings, ports, and logging levels directly through environment variables, making containerized deployments more flexible. (#68, #55)
  • Corrected Kubernetes PVC Configuration: The cache directory configuration for Kubernetes deployments has been fixed, ensuring that the Persistent Volume Claim (PVC) is correctly utilized for model storage. (#71)
  • Configuration Overriding Fix: Fixed a bug where environment variables were not correctly overriding settings from a configuration file, ensuring a predictable configuration hierarchy; the sketch after this list illustrates that precedence. (#48)
  • Improved Config File Validation: The server now provides clearer error messages when validating configuration files. (#44)
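
To make the intended precedence concrete, here is a minimal Rust sketch of "environment variable overrides config file" behavior. It is not ModelExpress code, and the variable name MODEL_EXPRESS_PORT is a hypothetical example rather than a documented setting.

```rust
use std::env;

// If the environment variable is set and parses, it wins; otherwise the value
// read from the configuration file is used.
fn effective_port(config_file_port: u16) -> u16 {
    env::var("MODEL_EXPRESS_PORT")
        .ok()
        .and_then(|v| v.parse().ok())
        .unwrap_or(config_file_port)
}

fn main() {
    // With the variable unset, this prints the config-file value passed in.
    println!("listening on port {}", effective_port(8001));
}
```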

Bug Fixes & Stability

  • Improved Concurrent Download Stability: Enhanced the error handling and retry logic for concurrent model downloads, making the server more resilient under high load. (#46)
  • Correct Home Directory Expansion: Fixed a bug where the tilde (~) character was not correctly expanding to the user's home directory for cache paths; the sketch after this list shows the kind of expansion involved. (#74)
  • Dependency Version Fix: Loosened the tracing-subscriber dependency version to resolve conflicts and ensure smooth integration with the Dynamo runtime. (#79)
  • Serialization Bug Fix: Corrected a bug related to the serialization of custom configuration settings. (#76)
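
For context on the home-directory fix, the sketch below shows the kind of expansion involved: turning a leading "~" in a cache path into the user's home directory. It is an illustrative example only (using $HOME on Unix-like systems), not the ModelExpress implementation.

```rust
use std::env;
use std::path::PathBuf;

// Expand a leading "~/" to the user's home directory; leave other paths untouched.
fn expand_tilde(path: &str) -> PathBuf {
    if let Some(rest) = path.strip_prefix("~/") {
        if let Some(home) = env::var_os("HOME") {
            return PathBuf::from(home).join(rest);
        }
    }
    PathBuf::from(path)
}

fn main() {
    // e.g. prints "/home/<user>/.cache/huggingface" on a typical Linux host.
    println!("{}", expand_tilde("~/.cache/huggingface").display());
}
```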

Housekeeping & Documentation

  • Code Cleanup: Removed dead and redundant code to improve maintainability. (#41, #43)
  • Updated Documentation: The README and other documentation files have been updated to reflect the latest changes and remove deprecated information. (#57, #70)
  • Build System: Crate names have been updated for consistency. (#65)

Looking Ahead

With the foundational Kubernetes integration now in place, our next major focus is to unlock the next level of performance by enabling direct peer-to-peer (P2P) model transfers with NIXL. Stay tuned for updates!


Full Changelog: https://github.com/ai-dynamo/modelexpress/compare/v0.1.0...v0.2.0

modelexpress v0.1.0

26 Aug 22:50
2309e58


This is the first release of Dynamo's ModelExpress (v0.1.0), our first alpha release.
ModelExpress is a Rust-based client-server system designed to accelerate the loading of inference models in a distributed Kubernetes cluster.
Please refer to our README for more information and guides on how to use ModelExpress in Kubernetes.
This release comes with three Rust crates, which can be found on crates.io.

You can also install and run ModelExpress by downloading this release and following the build instructions.