Commit 6b98fe1

Author: Ramkumar Chinchani

docs: initial proposal for OCI artifact registry

Partially addresses kubeflow/community#682

Signed-off-by: Ramkumar Chinchani <[email protected]>
1 parent c9bdfc2 commit 6b98fe1

1 file changed: +127 -0 lines changed

docs/oci-registry.md

Lines changed: 127 additions & 0 deletions
@@ -0,0 +1,127 @@
# OCI Registry as a Kubeflow Model Registry

## Authors

- Ramkumar Chinchani (Cisco)
- _TBD_

## Maintainers

- Ramkumar Chinchani (Cisco)
- _TBD_

## Motivation

According to the [Kubeflow 2023
survey](https://blog.kubeflow.org/kubeflow-user-survey-2023/), 44% of users
identified a Model Registry as one of the big gaps in their ML lifecycle that
the Kubeflow offering does not fill.

![Kubeflow survey](diagrams/model-registry-kubeflowsurvey.png "Kubeflow survey")

## Solution Overview

The [Open Container Initiative](https://opencontainers.org/) (OCI) is a sibling
organization to the CNCF under [The Linux Foundation](https://www.linuxfoundation.org/).
Under its purview are the container
[runtime](https://github.com/opencontainers/runtime-spec),
[image](https://github.com/opencontainers/image-spec) and
[distribution](https://github.com/opencontainers/distribution-spec)
specifications: vendor-neutral contracts that the Kubernetes ecosystem relies
on for running container images, laying out their filesystems, and pushing and
pulling them.

More recently, **v1.1.0** of the
[_image_](https://github.com/opencontainers/image-spec/releases/tag/v1.1.0) and
[_distribution_](https://github.com/opencontainers/distribution-spec/releases/tag/v1.1.0)
specs added support for pushing arbitrary artifacts, along with support for
relationships between artifacts.

## OCI v1.1.0 Conformant Registries

The highlights of OCI artifact registries are as follows.

- Container images: these represent workloads and have been the traditional use case for an OCI conformant registry.

- Artifacts: these represent arbitrary data (ML model data or additional
metadata in this context) that can also be pushed to and pulled from an OCI
conformant registry.

- Content-addressable: all data is organized as a Merkle DAG of SHA256-hashed
blobs. This bodes well for reproducibility.

- Versioning: in addition to its SHA256 hash, any data can be tagged with a human-readable version.

- Annotations: arbitrary annotations can be attached to any artifact.

- References: an artifact can be pushed along with a reference to another
artifact (via the `subject` field of its manifest), which can be leveraged to
address the data lineage use case.

- Provenance: each artifact can be cryptographically signed, with the signature
stored as its own separate artifact "referring" to the signed artifact.

- Ecosystem tooling: OCI v1.1.0 conformant registries and clients are already
available and can be leveraged.

- Infrastructure reuse: a container image registry is already a critical piece of
infrastructure, and it can now be reused.
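
The content-addressable, annotation and reference highlights above all surface as fields of an OCI image manifest. The following is a minimal local sketch, not what any particular client emits: it hashes a stand-in blob, writes a manifest for a model artifact, and then writes a second manifest whose `subject` points at the first. The stand-in model bytes, the `application/octet-stream` layer media type and the `application/vnd.example.lineage` artifact type are assumptions chosen for illustration, and nothing is pushed to a registry.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print the OCI-style digest ("sha256:<hex>") of a file.
oci_digest() {
  printf 'sha256:%s\n' "$(sha256sum "$1" | cut -d' ' -f1)"
}

# Print the size of a file in bytes.
byte_size() {
  wc -c < "$1" | tr -d ' '
}

workdir=$(mktemp -d)

# Stand-in model bytes (hypothetical; a real flow would use the model file).
printf 'hello' > "$workdir/model.bin"
# The OCI "empty" blob is the two-byte JSON document {}.
printf '{}' > "$workdir/empty.json"

model_digest=$(oci_digest "$workdir/model.bin")
model_size=$(byte_size "$workdir/model.bin")
empty_digest=$(oci_digest "$workdir/empty.json")

# Manifest for the model artifact: artifact type, one layer, annotations.
cat > "$workdir/model-manifest.json" <<EOF
{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.manifest.v1+json",
  "artifactType": "application/vnd.model.type",
  "config": {
    "mediaType": "application/vnd.oci.empty.v1+json",
    "digest": "$empty_digest",
    "size": 2
  },
  "layers": [
    {
      "mediaType": "application/octet-stream",
      "digest": "$model_digest",
      "size": $model_size
    }
  ],
  "annotations": {
    "model_format_name": "onnx",
    "model_format_version": "1"
  }
}
EOF

model_manifest_digest=$(oci_digest "$workdir/model-manifest.json")
model_manifest_size=$(byte_size "$workdir/model-manifest.json")

# Referrer manifest: its "subject" descriptor names the model manifest by digest.
# The artifact type below is a hypothetical example value.
cat > "$workdir/referrer-manifest.json" <<EOF
{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.manifest.v1+json",
  "artifactType": "application/vnd.example.lineage",
  "config": {
    "mediaType": "application/vnd.oci.empty.v1+json",
    "digest": "$empty_digest",
    "size": 2
  },
  "layers": [
    {
      "mediaType": "application/vnd.oci.empty.v1+json",
      "digest": "$empty_digest",
      "size": 2
    }
  ],
  "subject": {
    "mediaType": "application/vnd.oci.image.manifest.v1+json",
    "digest": "$model_manifest_digest",
    "size": $model_manifest_size
  }
}
EOF

echo "model blob:     $model_digest"
echo "model manifest: $model_manifest_digest"
echo "manifests written to $workdir"
```

Because the referrer names the model manifest by digest, changing a single byte of the model or its annotations yields a different manifest digest, which is what makes the lineage link tamper-evident.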

## References

_TBD_

# Appendix

The following section demonstrates the supported workflow.

NOTE: This section is not an endorsement of any of the tools used; it is merely
a demonstration. Readers are free to pick any tool they see fit, provided that
the choice is OCI v1.1.0 conformant.

This demonstration uses [`zot`](https://zotregistry.dev) as the registry and
[`regctl`](https://github.com/regclient/regclient) as the client.

## Start a registry

```bash
podman run -p 5000:5000 ghcr.io/project-zot/zot-linux-amd64:latest
```

## Download model data

```bash
curl -v -L https://github.com/tarilabs/demo20231212/raw/main/v1.nb20231206162408/mnist.onnx -o mnist.onnx
```

## Upload model data with annotations

```bash
regctl artifact put \
  --annotation description="used for demo purposes" \
  --annotation model_format_name="onnx" \
  --annotation model_format_version="1" \
  --artifact-type "application/vnd.model.type" \
  localhost:5000/models/my-model-from-gh:v1 \
  -f mnist.onnx
```

## List all artifacts

```bash
regctl artifact list localhost:5000/models/my-model-from-gh:v1 --format '{{jsonPretty .}}'
```

## Filter by artifact type

```bash
regctl artifact list --filter-artifact-type "application/vnd.model.type" localhost:5000/models/my-model-from-gh:v1 --format '{{jsonPretty .}}'
```

## Download model data from the registry

```bash
regctl artifact get localhost:5000/models/my-model-from-gh:v1 > mnist.onnx.copy
```
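
Since the registry is content-addressable, a simple way to check the round trip is to compare SHA256 digests of the original file and the copy fetched from the registry. The sketch below assumes that the fetched copy is expected to be byte-identical to the uploaded file and that `sha256sum` is available. The helper names `file_digest` and `same_content` are hypothetical, and the comparison only runs if both files from this walkthrough exist in the current directory.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print the OCI-style digest ("sha256:<hex>") of a file.
file_digest() {
  printf 'sha256:%s\n' "$(sha256sum "$1" | cut -d' ' -f1)"
}

# Succeed only if both files have the same SHA256 digest.
same_content() {
  [ "$(file_digest "$1")" = "$(file_digest "$2")" ]
}

# Compare the two files from this walkthrough when both are present.
if [ -f mnist.onnx ] && [ -f mnist.onnx.copy ]; then
  if same_content mnist.onnx mnist.onnx.copy; then
    echo "round trip OK: $(file_digest mnist.onnx)"
  else
    echo "digest mismatch between mnist.onnx and mnist.onnx.copy" >&2
    exit 1
  fi
fi
```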
