Commit 4f9c89d (feat: add adr for support binary data), 2 files changed
# ADR-0001 - Support binary data

Date: 2023-06-19

## Status

_Proposed | Accepted | Deprecated | Under consideration | WIP_

WIP
## Context

* We need to be able to upload and store binary data in DMSS, via its API, from the frontend, the DM CLI, or another service.
* We need to be able to retrieve binary data via the DMSS API.
## Decision

The blueprint attribute should have its `attributeType` set to `binary`.

```json
{
  "type": "BlueprintAttribute",
  "name": "<the attribute name>",
  "attributeType": "binary"
}
```
### Model contained and storage non-contained

This is when the blueprint attribute is specified as non-contained in a storage recipe.

```json
{
  "type": "dmss://system/SIMOS/StorageRecipe",
  "name": "DefaultStorageRecipe",
  "description": "",
  "attributes": [
    {
      "name": "<the attribute name>",
      "type": "dmss://system/SIMOS/StorageAttribute",
      "contained": false
    }
  ]
}
```
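For illustration, containment can be read straight out of such a recipe. This is a sketch with assumed names (`is_storage_contained` is not a real DMSS function), and it assumes attributes missing from the recipe default to contained:

```python
# Hypothetical helper (not part of DMSS) that reads a storage recipe
# like the one above and reports whether an attribute is storage contained.
def is_storage_contained(recipe: dict, attribute_name: str) -> bool:
    for attribute in recipe.get("attributes", []):
        if attribute["name"] == attribute_name:
            return attribute.get("contained", True)
    return True  # assumption: attributes not listed default to contained

recipe = {
    "type": "dmss://system/SIMOS/StorageRecipe",
    "name": "DefaultStorageRecipe",
    "attributes": [
        {
            "name": "data",
            "type": "dmss://system/SIMOS/StorageAttribute",
            "contained": False,
        }
    ],
}
```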
#### From disk (using DM CLI)

```mermaid
flowchart
file -- "serialize" --> dm-cli
dm-cli -- "call API" --> dmss
dmss -- "insert" --> DB
```
On disk, the reference points to a binary file using an alias and a path. The reference type is set to "Storage", which indicates that the binary data is model contained but not storage contained.

```json
{
  "type": "Reference",
  "address": "ALIAS:/package/binaryFile.something",
  "referenceType": "Storage"
}
```
The binary file will be uploaded to DMSS using the DM CLI, and upon upload the DM CLI will rewrite the address:

```json
{
  "type": "Reference",
  "address": "dmss://data-source-1/$1234",
  "referenceType": "Storage"
}
```
Note that the reference does not depend on the storage medium: the reference on the file system and the corresponding reference after upload look exactly the same, apart from the rewritten address.

Any binary file on disk needs to be serialized before it is uploaded to DMSS.
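The address rewrite described above could look roughly like the sketch below. All names here (`rewrite_reference`, `upload_blob`, the alias map) are assumptions for illustration, not the actual DM CLI implementation:

```python
# Sketch only: how the DM CLI could turn an alias-based address into a
# dmss:// address when uploading. Names are assumptions, not real DM CLI API.
def rewrite_reference(reference: dict, alias_map: dict, upload_blob) -> dict:
    """Upload the file behind an "ALIAS:/path" address and return a
    reference whose address points at the stored blob instead."""
    alias, _, path = reference["address"].partition(":/")
    data_source = alias_map[alias]  # e.g. "ALIAS" -> "data-source-1"
    with open(path, "rb") as f:  # serialize the binary file from disk
        blob_id = upload_blob(data_source, f.read())
    return {
        "type": reference["type"],
        "address": f"dmss://{data_source}/${blob_id}",
        "referenceType": reference["referenceType"],
    }
```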
#### From another service

```mermaid
flowchart
service -- "call API" --> dmss
dmss -- "insert" --> DB
```

The binary data needs to be uploaded first, and a reference pointing to it needs to be inserted into the document, similar to what the DM CLI does:

```json
{
  "type": "Reference",
  "address": "dmss://data-source-1/$1234",
  "referenceType": "Storage"
}
```
#### From frontend

```mermaid
flowchart
frontend -- "call API" --> dmss
dmss -- "insert" --> DB
```

As with uploads from another service, the binary data needs to be uploaded first, and a reference pointing to it needs to be inserted into the document:

```json
{
  "type": "Reference",
  "address": "dmss://data-source-1/$1234",
  "referenceType": "Storage"
}
```
#### Reference

Each uploaded binary file gets its own data source ID, and this ID is used inside the address that points to the binary file.

The job of the data source is to determine which storage medium (repository) is used to store the data. The `$1234` is the data source ID for the given binary file inside the data source `data-source-1`. The data source ID points to an entry in the lookup table for the given data source.

```json
{
  "_id": "data-source-1",
  "repositories": {
    "repository_a": {
      "type": "mongo-db"
    }
  },
  "documentLookUp": {
    "1234": {
      "lookup_id": "1234",
      "repository": "repository_a",
      "database_id": "4568",
      "acl": {}
    }
  }
}
```
The lookup entry holds the information about where the content is stored. Inside each lookup entry there is a `repository` field, which points to one of the repositories defined in the same data source. Each repository entry contains the information needed to connect to the storage medium.

```mermaid
flowchart LR
$data-source-id --> A
A["data-source-1"] --> L["lookup table"]
L --> B["MongoRepository"]
L --> C["AzureBlobStorageRepository"]
```
To be able to tell that a data source ID points to binary data, we want to add a field to the lookup entry that records which data type the entry is for, i.e. whether it is binary or not. This is needed because the repository implementation has separate methods for getting and saving binary data (currently called `update_blob`, `delete_blob` and `get_blob`). Each repository also has a `data_types` field that specifies whether the repository supports other data types, such as blobs.
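Put together, resolving a binary address could look like the sketch below. The `data_type` field name and the repository method table are assumptions; the ADR only proposes that some such field exists:

```python
# Sketch (assumed names) of resolving "dmss://<data-source>/$<id>" through
# the lookup table, picking the blob method when the entry is binary.
def resolve(address: str, data_sources: dict) -> bytes:
    rest = address.removeprefix("dmss://")
    ds_name, _, doc_id = rest.partition("/$")
    data_source = data_sources[ds_name]
    entry = data_source["documentLookUp"][doc_id]
    repository = data_source["repositories"][entry["repository"]]
    if entry.get("data_type") == "blob":  # assumed field proposed by this ADR
        return repository["get_blob"](entry["database_id"])
    return repository["get"](entry["database_id"])
```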
### Model contained and storage contained

This is when the model contains the binary data along with other types of attributes.

```json
{
  "type": "Blueprint",
  "attributes": [
    {
      "type": "BlueprintAttribute",
      "name": "data",
      "attributeType": "binary"
    }
  ]
}
```
On disk, the data will contain the binary content (the `b"..."` notation below is illustrative, not valid JSON):

```json
{
  "type": "Blueprint",
  "data": b"verylongbinarystring"
}
```

After uploading, the data will also contain the binary content:

```json
{
  "type": "Blueprint",
  "data": b"verylongbinarystring"
}
```
TODO: The binary data has to be stored inside the storage medium after upload, and for MongoDB there are several ways of [storing binary inside a document](https://sparkbyexamples.com/mongodb/store-images-in-the-mongodb-database/).
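One of the in-document options the TODO refers to is base64-encoding the bytes into a text field (GridFS being the usual MongoDB alternative for large files). A minimal sketch, independent of any MongoDB driver:

```python
import base64

# Store: embed the binary content as base64 text inside the document,
# so it survives JSON serialization on the way to the storage medium.
binary = b"verylongbinarystring"
document = {
    "type": "Blueprint",
    "data": base64.b64encode(binary).decode("ascii"),
}

# Load: decode the field back into the original bytes.
restored = base64.b64decode(document["data"])
assert restored == binary
```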
### Endpoints

Given a blueprint that contains a `data` attribute that is binary:

```json
{
  "type": "Blueprint",
  "attributes": [
    {
      "type": "BlueprintAttribute",
      "name": "data",
      "attributeType": "binary"
    }
  ]
}
```
There exists a custom `/blob/` endpoint to handle:

* GET - Returns the actual blob data.
* Upload - Uploads blob data by a given ID.

The `/documents/` endpoint also handles binary data, and what it returns depends on storage.

For storage non-contained (here the `data` attribute is storage non-contained):
* GET - Since `data` is non-contained, a reference will be returned.
* POST - Adds a document, together with files.
* PUT - Changes a document, together with files.

For model contained (here the `data` attribute is model contained):
* GET - Since the binary data is contained, the actual data will be returned inline.
* POST - Adds a document, together with files.
* PUT - Changes a document, together with files.
Files are sent as form data to the API endpoints, and as per the [FastAPI documentation](https://fastapi.tiangolo.com/tutorial/request-files/#request-files), `python-multipart` needs to be installed. Form data is normally encoded using the media type `application/x-www-form-urlencoded` when it doesn't include files, but when the form includes files it is encoded as `multipart/form-data`, which allows forms to be submitted with files.
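To illustrate the difference between the two encodings (field names and boundary are arbitrary here, and a real client would let the HTTP library build this):

```python
import json
import urllib.parse

document = {"type": "MyBlueprint"}

# Without files: the form body is urlencoded key=value pairs.
urlencoded_body = urllib.parse.urlencode({"document": json.dumps(document)})

# With files: each part sits between boundary markers and carries its
# own headers, so raw bytes can travel next to ordinary text fields.
boundary = "boundary123"
multipart_body = (
    (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="document"\r\n\r\n'
        f"{json.dumps(document)}\r\n"
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="files"; filename="binaryFile.something"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode()
    + b"verylongbinarystring"
    + f"\r\n--{boundary}--\r\n".encode()
)
```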
## Upload model contained and storage non-contained

```python
# assumes a `dmss_api` client object is available
document = {
    "type": "MyBlueprint",
    "data": {
        "type": "Reference",
        "address": "dmss://data-source-1/$1234",
        "referenceType": "Storage"
    }
}
files = [("files", open("binaryFile.something", "rb"))]
dmss_api.post(
    "dmss://data-source-1/root-package", data=document, files=files
)
```
TODO: Is this correct?
## Upload model contained and storage contained

```python
# assumes a `dmss_api` client object is available
document = {
    "type": "MyBlueprint",
    "data": b"verylongbinarystring"
}
dmss_api.post(
    "dmss://data-source-1/root-package", data=document
)
```

TODO: Is this correct?
## Other considered options

Create a wrapper upon import by using the DM CLI. The three `data` entries below are alternative shapes for the same field, not one valid document:

```json
{
  "_id": "1234",
  "type": "File",
  "name": "file.pdf",
  "author": "someone",
  "date": "01/01/20",
  "size": "1G",
  "filetype": "pdf",
  "data": {
    "type": "Reference",
    "address": "DS/$ID",
    "referenceType": "Storage"
  },
  "data": {
    "type": "AzureBlob",
    "address": "$ID"
  },
  "data": {
    "type": "DTO",
    "data": b"binarystuff"
  }
}
```
## Consequences

- [ ] Add `binary` to the blueprint attribute as a possible attribute type
Lines changed: 2 additions & 0 deletions

label: 'Architecture Decision Records'
collapsed: true