The example above shows that the job `TUAzUGI8` completed successfully and produced a data object `8b41e90cf9a22f5b25b8f7f3eac6102b20f0e5beaacd9bb41be6696b99af9619`.

## SDK

Developers can use the SaaS Middleware SDK to create applications that interact with a SaaS node, and to create their own SaaS adapters.

### Keystore

Before interacting with SaaS nodes, the user is __required__ to create a Keystore. It is used throughout the SaaS system for authentication/authorisation purposes as well as for managing ownership and access rights to data objects.

The keystore contains the Identity of a user, which consists of an ID, name, email and a set of key pairs (public/private pairs used for cryptographic operations). Besides an identity, the keystore can also store information in the form of assets. The contents of the keystore are encrypted with the help of a password provided by the user.

This package provides a module `saas.core.keystore` with a `Keystore` class to create such a keystore. Examples of assets that the keystore supports can be found in `saas.core.assets`. The way SaaS handles key pairs and cryptographic operations (encryption and hashing) can be found in the `KeyPair` classes `saas.core.eckeypair.ECKeyPair` and `saas.core.rsakeypair.RSAKeyPair`.

Example of creating a Keystore:
```python
from saas.core.keystore import Keystore
from saas.core.schemas import GithubCredentials

# Create the keystore in the provided path (defaults to $HOME/.keystore).
# Note: the arguments shown for Keystore.create are indicative; refer to
# saas.core.keystore for the exact signature.
keystore = Keystore.create(keystore_path, name, email, password)

# Add a Github credential asset (useful for accessing private Github repositories when deploying adapters)
keystore.github_credentials.update(
    url,
    GithubCredentials(
        login=login,
        personal_access_token=personal_access_token
    )
)
```

### API

A SaaS node uses REST as its main form of communication via an HTTP API interface.

Sending HTTP requests to SaaS nodes is abstracted into functions. These functions are grouped into classes based on the three main services a SaaS node provides, i.e. __NodeDB__ (Node Database), __DOR__ (Data Object Repository) and __RTI__ (Runtime Infrastructure), and they can be found in `saas.nodedb.proxy`, `saas.dor.proxy` and `saas.rti.proxy` respectively.

In general, these services can be briefly described as follows:

The __NodeDB__ is a node database for storing persistent records needed by the node to function. In this context, it exposes and provides the user with information regarding nodes in the network. Additionally, it also provides information about all Identities that are known in the network. Users are able to upload and update their Identity here.

The Data Object Repository (__DOR__) stores and provides access to data objects. Data objects are used as inputs for jobs and are created as outputs after jobs are finished. Besides data objects used for jobs, it also stores information on how to deploy adapters in the form of Git-Processor-Pointers (GPP). The user is able to upload, download, modify access to and retrieve information about such data objects from the nodes.

The Runtime Infrastructure (__RTI__) deploys/undeploys adapters and helps execute jobs. It sets up the environment needed for adapters to execute jobs submitted by users (e.g. install dependencies, prepare data objects from the DOR, store output into the DOR). All of this is handled automatically for the user based on the descriptor of the adapter. The user is only required to specify the adapter (with its configuration) they want to run and ensure the input files exist in the DOR. Users are also able to retrieve the status of submitted jobs.

Example of sending requests to a node:
```python
from saas.dor.proxy import DORProxy
from saas.nodedb.proxy import NodeDBProxy
from saas.rti.proxy import RTIProxy

# Create the proxy objects with the IP address of the SaaS node you want to interact with.
db_proxy = NodeDBProxy(node_address)
dor_proxy = DORProxy(node_address)
rti_proxy = RTIProxy(node_address)

# Interact with the node by calling methods from the proxy objects
node_info = db_proxy.get_node()
data_objects = dor_proxy.search()
deployed = rti_proxy.get_deployed()
```
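
Building on this, a user would typically also publish the identity from their keystore to the node so it becomes known in the network. A minimal sketch, assuming the `keystore` and `db_proxy` objects from the examples above and that `NodeDBProxy` exposes identity calls along these lines (check `saas.nodedb.proxy` for the exact interface):

```python
# Publish the identity stored in the keystore to the node (method names are
# indicative; refer to saas.nodedb.proxy for the exact interface).
db_proxy.update_identity(keystore.identity)

# List all identities known to the node.
identities = db_proxy.get_identities()
```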

To see the list of functions and the parameters required for each of them, refer to the code found in their respective modules.

A full example of a simple application using this module can be found [here](./examples/applications/example_api.py).

## SaaS Adapters

A SaaS Adapter provides a wrapper interface around an application (e.g. a program or script) with a clearly defined specification of inputs/outputs and instructions on how to install/execute it. Adapters can then be deployed using the RTI, allowing users to execute computational jobs.

### Adapter Structure

A valid adapter should follow the folder structure shown below and contain these files (with the exact file names, all in the same directory):

```
saas_processor/
├── descriptor.json
├── execute.sh
├── install.sh
├── processor.py
└── requirements.txt
```

The diagram below shows the lifecycle of an adapter and the stages where each of these files is used.

#### Install Script (`install.sh`)

The Install script specifies how to set up the proper environment for the adapter during the deployment stage. This can include installing software, compiling binaries and downloading external dependencies. It runs every time an instance of the adapter is deployed.

Example of Install Script:
```bash
#!/bin/bash

if [ "$1" == "default" ]; then
  echo "Run default configuration"
  python3 -m pip install -r ./requirements.txt
  exit 0

elif [ "$1" == "nscc" ]; then
  echo "Run nscc configuration"
  python3 -m pip install -r ./requirements.txt
  exit 0

else
  exit 1
fi
```

When the script is executed, an argument (the value of the chosen `configuration`) is passed to the script, which specifies how the adapter should be deployed. From the example above, the adapter accepts either `default` or `nscc` as valid configurations, and runs the respective code based on the given argument.

This is also where Python dependencies for the adapter can be installed using the `requirements.txt` file. This file follows the [format](https://pip.pypa.io/en/stable/reference/requirements-file-format/#requirements-file-format) that pip uses. Note that the RTI does not use this file automatically, so the user has to invoke it manually in the `install.sh` script. If no Python dependencies are required, this file can be omitted.

#### Adapter Descriptor (`descriptor.json`)

An Adapter Descriptor specifies the name, input/output interfaces and configurations of an adapter.

It is in the form of a JSON file and is structured as follows:

```json
{
  "name": ...,
  "input": [
    ...
  ],
  "output": [
    ...
  ],
  "configurations": [
    ...
  ]
}
```

The input/output interfaces (`input` and `output`) are lists of items that specify the input data consumed and output data produced by the adapter, respectively. This information is used before and after job execution to verify that the correct data objects are submitted and created respectively.

Structure of Input/Output Item:
```json
{
  "name": ...,
  "data_type": ...,
  "data_format": ...
}
```
An item has a name, a data type and a data format. `data_type` provides the context of how the data is used (e.g. `AHProfile` is for an anthropogenic heat profile). `data_format` is how the data is formatted/encoded (e.g. `csv`).

The `configurations` property is a list of user-defined strings that describe the runtime configurations supported by this adapter. They are mainly used in the `install.sh` and `execute.sh` scripts, and affect how the adapter will be deployed and executed.
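
For illustration, a complete descriptor for a hypothetical adapter with two inputs (`a`, `b`), one output (`c`) and the two configurations used in the install script above might look as follows. The adapter name, data types and data formats here are made up for this example and are not part of the SaaS specification:

```json
{
  "name": "example-processor",
  "input": [
    {
      "name": "a",
      "data_type": "JSONObject",
      "data_format": "json"
    },
    {
      "name": "b",
      "data_type": "JSONObject",
      "data_format": "json"
    }
  ],
  "output": [
    {
      "name": "c",
      "data_type": "JSONObject",
      "data_format": "json"
    }
  ],
  "configurations": ["default", "nscc"]
}
```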

#### Execution Script (`execute.sh`)

The Execution script specifies how the adapter should run a given job during the execution stage. It runs every time the adapter executes a job.

Example of Execution Script:
```bash
#!/bin/bash

if [ "$1" == "default" ]; then
  echo "Run processor.py with default configuration on $2"
  python3 processor.py $2

elif [ "$1" == "nscc" ]; then
  echo "Run processor.py with nscc configuration on $2"
  python3 processor.py $2

else
  exit 1
fi
```

When the script is executed, it is passed two arguments: first, the configuration of the adapter (the same value passed to the `install.sh` script) and second, the path to the working directory. The working directory is where the input files the job needs will be found and where the output files of the job should be written to. In the example above, the path is passed as an argument to `processor.py` during execution.

The `processor.py` file is where most of the execution logic is written. Note that since the `execute.sh` script is a bash file that the RTI runs during execution, it could actually be used to run anything (e.g. run a compiled binary, launch an external program, run simple bash commands, etc.) and does not have to use the `processor.py` file at all. Using `processor.py` is mostly a convention for creating SaaS adapters with Python. As long as output files are created in the working directory (as provided by the second argument of the `execute.sh` script) and the required triggers are provided (explained in the following section), the job should finish successfully.
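
For illustration, a minimal `processor.py` that satisfies these requirements could look like the sketch below. It is only a sketch: the input/output names (`a`, `b`, `c`) and their JSON content (a single field `v`) are assumptions matching the hypothetical descriptor shown earlier, not part of the SaaS SDK.

```python
import json
import os
import sys


def main(working_directory: str):
    # The working directory (second argument of execute.sh) contains the job's
    # input files; output files must be written back into it.
    with open(os.path.join(working_directory, 'a')) as f:
        a = json.load(f)
    with open(os.path.join(working_directory, 'b')) as f:
        b = json.load(f)

    # Report progress to the RTI as a trigger line on stdout.
    print('trigger:progress:50', flush=True)

    # Write the (hypothetical) output data object 'c' into the working directory...
    with open(os.path.join(working_directory, 'c'), 'w') as f:
        json.dump({'v': a['v'] + b['v']}, f)

    # ...and tell the RTI that output 'c' has been produced.
    print('trigger:output:c', flush=True)
    print('trigger:progress:100', flush=True)


if __name__ == '__main__':
    # execute.sh invokes: python3 processor.py <path to working directory>
    main(sys.argv[1])
```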

### Adapter Functions

During the execution stage, the status of the job must be communicated to the RTI so that the RTI can decide if the job has completed successfully. This is done in the form of __triggers__, which are lines sent to the system output (`stdout`) in the form `trigger:{type}:{value}`. The RTI monitors the system output of the execution script for such triggers and reacts according to the type of trigger received.

Currently, the RTI accepts two kinds of triggers that can be sent during execution of the job: `progress` and `output`.

The `progress` trigger is mainly used for monitoring purposes for the user, as it only shows the progress of the execution script in the form of a percentage number (e.g. `80`). The RTI does not do anything when receiving this trigger, other than forwarding its contents to its output.

Example of `progress` trigger:
```
trigger:progress:80
```

The `output` trigger is used to track the creation of output files of the adapters. This trigger is required to be present for each output file that an adapter produces (as stated in the Adapter Descriptor) for a job to be considered successful.

Example of `output` trigger:
```
trigger:output:c
```

These triggers can be found as helper functions in the module `saas.sdk.adapter` and can be used in the `processor.py`.