# kserve-storage

A Python module for handling model storage and retrieval for KServe. This package provides a unified API to download models from various storage backends, including cloud providers, file systems, and model hubs.

## Features

- Support for multiple storage backends:
  - Local file system
  - Google Cloud Storage (GCS)
  - Amazon S3
  - Azure Blob Storage
  - Azure File Share
  - HTTP/HTTPS URLs
  - HDFS/WebHDFS
  - Hugging Face Hub
- Automatic extraction of compressed files (zip, tar.gz, tgz)
- Configuration via environment variables
- Logging and error handling

## Installation

```bash
pip install kserve-storage
```

Or with Poetry:

```bash
poetry add kserve-storage
```

## Usage

The main entry point is the `Storage` class, which provides a `download` method:

```python
from kserve_storage import Storage

# Download from GCS to a temporary directory
model_dir = Storage.download("gs://your-bucket/model")

# Download from S3 to a specific directory
model_dir = Storage.download("s3://your-bucket/model", "/path/to/destination")
```
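
`Storage.download` selects the backend from the URI scheme. As a rough sketch of that routing (a hypothetical helper for illustration, not the package's internal code):

```python
from urllib.parse import urlparse

# Schemes accepted by the unified download API, per the backend list above.
# An empty scheme means a bare local path.
SUPPORTED_SCHEMES = {"gs", "s3", "hdfs", "webhdfs", "hf", "http", "https", "file", ""}

def storage_scheme(uri: str) -> str:
    """Return the storage scheme for a model URI; bare paths count as local files."""
    scheme = urlparse(uri).scheme
    if scheme not in SUPPORTED_SCHEMES:
        raise ValueError(f"Unsupported storage URI: {uri}")
    return scheme or "file"
```

For example, `storage_scheme("/path/to/model")` resolves to `"file"`, matching the direct-path form shown below.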

## Supported Storage Providers

### Local File System

```python
model_dir = Storage.download("file:///path/to/model")
# or using a direct path
model_dir = Storage.download("/path/to/model")
```

### Google Cloud Storage

```python
model_dir = Storage.download("gs://bucket-name/model-path")
```

### Amazon S3

```python
model_dir = Storage.download("s3://bucket-name/model-path")
```

### Azure Blob Storage

```python
model_dir = Storage.download("https://account-name.blob.core.windows.net/container-name/model-path")
```

### Azure File Share

```python
model_dir = Storage.download("https://account-name.file.core.windows.net/share-name/model-path")
```

### HTTP/HTTPS URLs

```python
model_dir = Storage.download("https://example.com/path/to/model.zip")
```

### HDFS

```python
model_dir = Storage.download("hdfs://path/to/model")
# or WebHDFS
model_dir = Storage.download("webhdfs://path/to/model")
```

### Hugging Face Hub

```python
model_dir = Storage.download("hf://org-name/model-name")
# With specific revision
model_dir = Storage.download("hf://org-name/model-name:revision")
```

## Environment Variables

### Hugging Face Hub Configuration

These are all handled by the `huggingface_hub` package; see the full list of supported environment variables [here](https://huggingface.co/docs/huggingface_hub/en/package_reference/environment_variables).

### AWS/S3 Configuration

- `AWS_ENDPOINT_URL`: Custom endpoint URL for S3-compatible storage
- `AWS_ACCESS_KEY_ID`: Access key for S3
- `AWS_SECRET_ACCESS_KEY`: Secret access key for S3
- `AWS_DEFAULT_REGION`: AWS region
- `AWS_CA_BUNDLE`: Path to custom CA bundle
- `S3_VERIFY_SSL`: Enable/disable SSL verification
- `S3_USER_VIRTUAL_BUCKET`: Use virtual hosted-style URLs
- `S3_USE_ACCELERATE`: Use transfer acceleration
- `awsAnonymousCredential`: Use unsigned requests for public access
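
For example, the variables above can be set in-process before calling `Storage.download` to point the S3 backend at an S3-compatible endpoint (endpoint, bucket, and credential values below are placeholders):

```python
import os

# Placeholder values for an S3-compatible endpoint such as MinIO.
os.environ["AWS_ENDPOINT_URL"] = "https://minio.example.com:9000"
os.environ["AWS_ACCESS_KEY_ID"] = "your-access-key"
os.environ["AWS_SECRET_ACCESS_KEY"] = "your-secret-key"
os.environ["S3_VERIFY_SSL"] = "1"

# from kserve_storage import Storage
# model_dir = Storage.download("s3://your-bucket/model")
```

In a Kubernetes deployment these would typically come from a Secret rather than being set in code.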

### Azure Configuration

- `AZURE_STORAGE_ACCESS_KEY`: Storage account access key
- `AZ_TENANT_ID` / `AZURE_TENANT_ID`: Azure AD tenant ID
- `AZ_CLIENT_ID` / `AZURE_CLIENT_ID`: Azure AD client ID
- `AZ_CLIENT_SECRET` / `AZURE_CLIENT_SECRET`: Azure AD client secret

### HDFS Configuration

- `HDFS_SECRET_DIR`: Directory containing HDFS configuration files
- `HDFS_NAMENODE`: HDFS namenode address
- `USER_PROXY`: User proxy for HDFS
- `HDFS_ROOTPATH`: Root path in HDFS
- `KERBEROS_PRINCIPAL`: Kerberos principal for authentication
- `KERBEROS_KEYTAB`: Path to Kerberos keytab file
- `TLS_CERT`, `TLS_KEY`, `TLS_CA`: TLS configuration files
- `TLS_SKIP_VERIFY`: Skip TLS verification
- `N_THREADS`: Number of download threads

## Storage Configuration

Storage configuration can be provided through environment variables:

- `STORAGE_CONFIG`: JSON string containing storage configuration
- `STORAGE_OVERRIDE_CONFIG`: JSON string to override storage configuration
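
A minimal sketch of setting `STORAGE_CONFIG` programmatically. The key names below (`type`, `access_key_id`, and so on) are assumptions for illustration only; consult the KServe storage credential documentation for the exact schema your backend expects:

```python
import json
import os

# Hypothetical S3-style configuration; key names are illustrative only.
storage_config = {
    "type": "s3",
    "access_key_id": "your-access-key",
    "secret_access_key": "your-secret-key",
    "endpoint_url": "https://s3.example.com",
}
os.environ["STORAGE_CONFIG"] = json.dumps(storage_config)
```

`STORAGE_OVERRIDE_CONFIG` takes the same JSON shape and is merged over the base configuration.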

## License

Apache License 2.0 - See [LICENSE](https://github.com/kserve/kserve/blob/master/LICENSE) for details.