Ocean Template for C2D with decentralized compute providers #182

Open

wants to merge 106 commits into base: main

Commits (106)
4c2c1d8
Initialized ocean template
smejak May 28, 2022
3f9b2d1
init jinja files
smejak May 28, 2022
0278b14
WIP: added ocean option to run command
smejak May 28, 2022
4e90dae
WIP: removed deploy & aml parts of render
smejak May 28, 2022
c1764dc
WIP: root.jinja
smejak May 29, 2022
f9c8826
WIP: step.jinja
smejak May 29, 2022
4cf6684
WIP: added ocean to backends
smejak May 29, 2022
7ea3839
WIP: removed unused jinja code
smejak May 29, 2022
b732ac9
WIP: root.jinja ocean working template generation
smejak May 29, 2022
fa11eae
WIP: ocean template with ML dependencies
smejak May 29, 2022
5a9ad53
WIP: removed unused template code
smejak May 30, 2022
ba014a4
Added nbconvert python template
smejak May 31, 2022
7b22160
WIP: double tag template
smejak Jun 1, 2022
d4e0e03
Correct code indentation
smejak Jun 1, 2022
726c73e
Added dcgan example
smejak Jun 1, 2022
8f71439
Update README.md
smejak Jun 1, 2022
26836ec
Update README.md
smejak Jun 1, 2022
f348977
Update README.md
smejak Jun 1, 2022
df3ee79
refactor to python_ocean
smejak Jun 2, 2022
02f9071
Merge branch 'main' of https://github.com/AlgoveraAI/same-project int…
smejak Jun 2, 2022
00e1006
WIP: different python_ocean templates
smejak Jun 8, 2022
2f8b253
Merge branch 'SAME-Project:main' into main
smejak Jun 8, 2022
8c3dc75
WIP: cleaning up python_ocean
smejak Jun 8, 2022
c32d174
WIP: starting from clean slate
smejak Jun 8, 2022
6cbacf3
WIP: initialized ocean
smejak Jun 8, 2022
045d0db
Merge branch 'SAME-Project:main' into develop
smejak Jun 8, 2022
1a63967
WIP: added ocean to backends.py
smejak Jun 8, 2022
6f31c14
WIP: init render & deploy methods
smejak Jun 9, 2022
b0696f3
WIP: boilerplate ocean c2d script
smejak Jun 16, 2022
115d4b5
WIP: simplest jinja template
smejak Jun 16, 2022
5b415c9
WIP: render function (without build)
smejak Jun 16, 2022
d3c1f10
Added ocean deploy
smejak Jun 16, 2022
963c9ac
WIP: rendering encoded script from notebook
smejak Jun 16, 2022
48f5a0a
WIP: removed encoding for render
smejak Jun 16, 2022
d2a1e10
WIP: working printing deploy
smejak Jun 17, 2022
943661a
WIP: ocean c2d deploy
smejak Jun 17, 2022
9ad0dbd
WIP: added ocean runtime options
smejak Jun 23, 2022
c6103cf
Added ocean runtime options to init
smejak Jun 23, 2022
e5e4c14
WIP: changed ocean config in deploy.py
smejak Jun 23, 2022
d9d0007
WIP: deploy with config params
smejak Jun 23, 2022
67e0d98
Merge branch 'SAME-Project:main' into develop
smejak Jun 23, 2022
804dd55
WIP: debugging options
smejak Jun 23, 2022
3108875
Merge branch 'develop' of https://github.com/AlgoveraAI/same-project …
smejak Jun 23, 2022
ea17fb4
Config params working in deploy
smejak Jun 23, 2022
d457a7c
WIP: added rawcode to algorithm metadata
smejak Jun 24, 2022
e0a7064
FIX: refactored for ocean v3
smejak Jun 29, 2022
b9dd073
WIP: added ocean to conftest
smejak Jun 29, 2022
9b5bc46
WIP: added algo_url option
smejak Jun 29, 2022
d560224
WIP: added logging statement
smejak Jun 29, 2022
ad8b4d2
WIP: ocean template test
smejak Jun 29, 2022
3457073
FIX: working, modular deploy, passed test
smejak Jun 30, 2022
6b984f0
WIP: added boolean options
smejak Jun 30, 2022
00114d8
Added Ocean-SAME docs
smejak Jun 30, 2022
2a0b970
WIP: added algo-pushed requirement
smejak Jun 30, 2022
eb09d2b
WIP: correct render option
smejak Jun 30, 2022
ef1afc6
WIP: added new runtime options
smejak Jun 30, 2022
b242a8f
WIP: added runtime options
smejak Jun 30, 2022
d41f6a7
WIP: removed publishing from ocean template
Jul 20, 2022
90c288a
WIP: remove ocean_publish
smejak Jul 20, 2022
d67741e
WIP: added algo_did runtime option
smejak Jul 21, 2022
71a2c07
WIP: removed algorithm publishing
smejak Sep 14, 2022
e194eb2
WIP: removed algorithm publishing
smejak Sep 14, 2022
9dd0508
WIP: refactoring ocean deploy
smejak Oct 7, 2022
eabdee6
WIP: fixed wrong nb name
smejak Oct 9, 2022
f9f397b
WIP: refactored render
smejak Oct 9, 2022
42f1f33
WIP: refactored template for c2d
smejak Oct 9, 2022
5aab6ef
WIP: generating the correct python script at the correct location
smejak Oct 9, 2022
754664d
WIP: removed print statements
smejak Oct 9, 2022
56403a2
WIP: removing notebook after creating script
smejak Oct 9, 2022
c1fdab2
WIP: started dockerfile for operator engine
smejak Oct 9, 2022
164af5a
WIP: disabled interactivity in dockerfile
smejak Oct 9, 2022
9fc070d
WIP: removed deploy
smejak Oct 10, 2022
375aacc
WIP: refactoring ocean to aws
smejak Oct 12, 2022
102f6eb
WIP: refactoring to boto3
smejak Oct 12, 2022
b6b91bf
WIP: added create_job from operator engine
smejak Oct 12, 2022
3fa3f81
WIP: removed unused ocean deploy
smejak Oct 31, 2022
1720ebc
WIP: added ocean_c2d for same-ocean integration
smejak Oct 31, 2022
517ea1f
WIP: added python & bash scripts for ocean c2d
smejak Nov 6, 2022
e72cf48
WIP: changed ocean.sh
smejak Nov 6, 2022
7e77a40
WIP: changed render
smejak Nov 6, 2022
4079926
Update ocean.sh
smejak Nov 7, 2022
a7eb252
WIP: changed dockerfile
smejak Nov 7, 2022
590e197
WIP: updated dockerfile & bash script with nbconvert
smejak Nov 8, 2022
adc3359
WIP: updated dockerfile & bash script with nbconvert
smejak Nov 8, 2022
33b85b8
WIP: updated dockerfile & bash script with nbconvert
smejak Nov 8, 2022
31c60da
WIP: removed user input
smejak Nov 11, 2022
64cc284
removed click.prompt
smejak Nov 11, 2022
d153dd4
added click.prompt
smejak Nov 11, 2022
9ed0299
WIP: using same in bash script
smejak Nov 11, 2022
02fab25
WIP: with run
smejak Nov 15, 2022
7d126c9
WIP: back to no deploy
smejak Nov 15, 2022
783e9a9
WIP: updated ocean.sh with correct same run and nbconvert
smejak Nov 16, 2022
9dbe0bb
WIP: correct algorithm name in ocean.sh
smejak Nov 16, 2022
52b1aae
WIP: ocean.sh for 0.2
smejak Nov 17, 2022
89a6eb7
WIP: ocean.sh for 0.3
smejak Nov 17, 2022
dee1ae7
WIP: ocean.sh for 0.4
smejak Nov 17, 2022
7fcaabc
WIP: removed line 51 from root.jinja, added port 8888
smejak Nov 17, 2022
50fc69a
WIP: added empty config.yaml
smejak Nov 17, 2022
4042ab9
WIP: hardcoding host url
smejak Nov 22, 2022
291b161
WIP: hardcoding host url
smejak Nov 22, 2022
73f6fab
WIP: hardcoding host url
smejak Nov 22, 2022
0b78087
WIP: hardcoding host url
smejak Nov 22, 2022
251cfab
WIP: hardcoding host url
smejak Nov 22, 2022
0bae01b
WIP: trying BaseOp
smejak Nov 27, 2022
3c914e0
WIP: trying to mount dataset from same.yaml in init.py
smejak Nov 27, 2022
de500b3
WIP: trying different algo docker image
smejak Nov 28, 2022
1 change: 1 addition & 0 deletions .gitignore
@@ -5,6 +5,7 @@ artifacts/
__pycache__/
*.py[cod]
*$py.class
**.DS_Store

# C extensions
*.so
40 changes: 40 additions & 0 deletions Dockerfile
@@ -0,0 +1,40 @@
FROM python:3.8

# Basic toolchain
RUN apt-get update && apt-get install -y \
    apt-utils \
    build-essential \
    git \
    wget \
    unzip \
    yasm \
    pkg-config \
    libcurl4-openssl-dev \
    zlib1g-dev \
    htop \
    cmake \
    vim \
    nano \
    python3-pip \
    python3-dev \
    python3-tk \
    libx264-dev \
    gcc \
    # python-pytest \
    && cd /usr/local/bin \
    && pip3 install --upgrade pip \
    && apt-get autoremove -y

RUN git clone -b develop https://github.com/AlgoveraAI/same-project.git

WORKDIR /same-project

ARG DEBIAN_FRONTEND=noninteractive

RUN pip3 install .

RUN python3.8 -m pip install jupyter
RUN python3.8 -m pip install nbconvert
ENV KF_PIPELINES_ENDPOINT_ENV='ml_pipeline.kubeflow.svc.cluster.local:8888'

RUN chmod +x ./ocean.sh
Empty file added config.yaml
Empty file.
@@ -0,0 +1,66 @@
# Developing and training AI models in the decentralized web

## Ocean Protocol and Decentralized AI

The SAME Project allows data scientists to easily turn their Jupyter notebooks into executable scripts that can automatically be sent to any compute pipeline.

Ocean Protocol builds tools for the decentralized data economy. One of its core features, Compute-to-Data (C2D), is the ability to train your models on private data.

In C2D, the data scientist first searches the Ocean Market for data they want to train their algorithm on. Once they have found a dataset they like, they buy access to it using Ocean Protocol's datatokens, which act as tickets denoting who can access a dataset and under what conditions. The data scientist then publishes their model on the Ocean Market as well and executes a series of steps to train the algorithm on the dataset on a separate compute provider. More details on C2D can be found [here](https://blog.oceanprotocol.com/v2-ocean-compute-to-data-guide-9a3491034b64).
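
The sketch below summarizes that flow in code. It is purely illustrative: every function is a hypothetical placeholder standing in for an Ocean Market or compute-provider interaction, not part of ocean.py or of the SAME-Ocean template.
```
# Illustrative outline of the C2D flow described above.
# All functions are hypothetical placeholders; only the order of steps matters.

def find_dataset(query: str) -> str:
    """Search the Ocean Market and return the chosen dataset's DID."""
    return "did:op:<dataset>"

def buy_dataset_access(dataset_did: str, max_price_ocean: int) -> None:
    """Buy a datatoken, the 'ticket' that grants access to the dataset."""

def publish_algorithm(algo_url: str) -> str:
    """Publish the model on the Ocean Market and return its DID."""
    return "did:op:<algorithm>"

def start_compute_job(dataset_did: str, algo_did: str, provider_address: str) -> str:
    """Ask the compute provider to train the algorithm on the private dataset."""
    return "<job-id>"

dataset_did = find_dataset("data to train on")
buy_dataset_access(dataset_did, max_price_ocean=50)
algo_did = publish_algorithm("https://raw.githubusercontent.com/<user>/<repo>/main/model.py")
job_id = start_compute_job(dataset_did, algo_did, provider_address="0x...")
```
The SAME-Ocean template wraps these steps behind a single `same run -t ocean` call, driven by the runtime options described in the quickstart below.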

In short, Ocean C2D is a perfect fit for the SAME Project: it lets data scientists focus on model development rather than on learning the ins and outs of Ocean Protocol's libraries.

## SAME-Ocean Template Quickstart

This short guide assumes you've already installed the SAME Project in your local environment; if not, [here](https://sameproject.ml/getting-started/installing/) is a guide to get you started.

While most of the Ocean deployment code is abstracted away in the SAME-Ocean template, there are some config parameters you need to fill in to interact with the Ocean Market. In particular, you'll need a [Web3 wallet](https://metamask.io/) and its private key. For security, never expose your wallet private key anywhere outside your local environment. To run C2D, export your wallet private key as a local environment variable:
```
export WALLET_PRIVATE_KEY='YOUR_PRIVATE_KEY'
```
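
A minimal sketch of how downstream code might read the key back from the environment (the variable name comes from the export above; everything else is illustrative):
```
import os
import sys

# Pick up the wallet key exported above; never hard-code it in a notebook or config file.
private_key = os.getenv("WALLET_PRIVATE_KEY")
if private_key is None:
    sys.exit("WALLET_PRIVATE_KEY is not set; export it before running C2D.")
```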

When you're ready to run C2D, navigate to the directory containing your working Jupyter notebook and run in your terminal:
```
same run -t ocean
```
Note that you'll need to append the runtime options listed below to this command, in the form `--option-name=value`; a complete example follows the list.
### SAME-Ocean Runtime Options

* `algo-verified`: bool - whether the algorithm has been verified by the data provider for C2D
* `algo-pushed`: bool - whether the algorithm has been pushed to GitHub (currently required; planned to be removed)
* `network`: str - URL of the network on which to access the Ocean Market
* `provider-address`: str - address of the compute provider
* `wallet-private-key`: str - private key used to pay for transactions in the pipeline
* `dt-did`: str - Decentralized Identifier (DID) of the dataset (found through the Ocean Market)
* `dt-pool`: str - address of the dataset's liquidity pool (applicable if the dataset has dynamic pricing)
* `algo-tag`: str - tag used to refer to the model
* `algo-version`: str - version number of the published model
* `algo-url`: str - GitHub URL to the raw model code
* `algo-name`: str - name of the model
* `author`: str - name of the model author
* `licence`: str - model licence
* `max-dt-price`: int - maximum price you are willing to pay for the dataset (in OCEAN)
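
Putting it together, an invocation might look like the following. All values are placeholders; substitute your own network, DIDs, addresses, and URLs:
```
same run -t ocean \
  --network=<network-url> \
  --provider-address=<provider-address> \
  --wallet-private-key=$WALLET_PRIVATE_KEY \
  --dt-did=<dataset-did> \
  --algo-url=https://raw.githubusercontent.com/<user>/<repo>/main/model.py \
  --algo-name=my-model \
  --algo-tag=my-model \
  --algo-version=0.1.0 \
  --author=<your-name> \
  --licence=MIT \
  --max-dt-price=50
```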


## The SAME Community

SAME is entirely open-source and non-commercial. We plan on donating it to a foundation as soon as we can identify one that matches our project's goals.

What can you do? Please join our community!

### Public web content

* [Website](https://sameproject.ml)
* [Google Group](https://groups.google.com/u/2/g/same-project)
* [Slack](https://join.slack.com/t/sameproject/shared_invite/zt-lq9rk2g6-Jyfv3AXu_qnX9LqWCmV7HA)

### Come join our repo

* [GitHub Organization](https://github.com/SAME-Project) / [GitHub Project](https://github.com/SAME-Project/same-project)
* Try it out (build instructions included)
* Complain about missing features
* EXPERTS ONLY: Add your own

Regardless, we are very open to taking your feedback. Thank you so much - onward!

-- The Co-founders of the SAME Project ([David Aronchick](https://twitter.com/aronchick) & [Luke Marsden](https://twitter.com/lmarsden))
17 changes: 17 additions & 0 deletions ocean.sh
@@ -0,0 +1,17 @@
#!/bin/bash

# Work inside the directory where the C2D job mounts its inputs.
cd /data/transformations/

# Rename the algorithm file provided by the compute job so it can be treated as a notebook.
mv algorithm hello.ipynb

same init

# Kubeflow Pipelines endpoint used by `same run` inside the cluster.
export KF_PIPELINES_ENDPOINT_ENV='ml_pipeline.kubeflow.svc.cluster.local:8888'

echo "$KF_PIPELINES_ENDPOINT_ENV"

same run

# Convert the notebook to a plain Python script and execute it.
jupyter nbconvert hello.ipynb --to python

python3.8 hello.py
210 changes: 210 additions & 0 deletions ocean_c2d/render_ocean.py
@@ -0,0 +1,210 @@
from jinja2 import Environment, FileSystemLoader
from base64 import urlsafe_b64encode
from pathlib import Path
from typing import Tuple
from uuid import uuid4
import jupytext
import logging
import os

from sameproject.ops.code import get_magic_lines, remove_magic_lines, get_installable_packages
from sameproject.data.config import SameConfig
from sameproject.data.step import Step
from sameproject.ops import helpers
import sameproject.ops.explode


def compile(config: SameConfig, target: str) -> Tuple[Path, str]:
    notebook = read_notebook(config.notebook.path)
    all_steps = get_steps(notebook, config)

    return render(
        compile_path=target,
        steps=all_steps,
        same_config=config,
    )


def read_notebook(notebook_path) -> dict:
    logging.info(f"Using notebook from here: {notebook_path}")
    try:
        notebook_file_handle = Path(notebook_path)
        ntbk_dict = jupytext.read(str(notebook_file_handle))
    except FileNotFoundError:
        logging.fatal(f"No notebook found at {notebook_path}")
        exit(1)

    return ntbk_dict


def get_steps(notebook: dict, config: SameConfig) -> dict:
    """Parses the code in a notebook into a series of SAME execution steps."""

    steps = {}
    all_code = ""
    code_buffer = []
    this_step_index = 0
    this_step_name = "same_step_000"
    this_step_code = ""
    this_step_cache_value = "P0D"
    this_step_environment_name = "default"
    this_step_tags = []

    def save_step():
        steps[this_step_name] = Step(
            name=this_step_name,
            code=remove_magic_lines(this_step_code),
            index=this_step_index,
            cache_value=this_step_cache_value,
            environment_name=this_step_environment_name,
            tags=this_step_tags,
            parameters=[],
            packages_to_install=[],
            frozen_box=False,  # TODO: make immutable
        )

        # Inject pip requirements file if configured:
        if "requirements" in config.notebook:
            with open(config.notebook.requirements, "r") as file:
                steps[this_step_name].requirements_file = file.read()

    for num, cell in enumerate(notebook["cells"]):
        if "metadata" not in cell:  # sanity check
            continue

        if len(cell["metadata"]) > 0 and "tags" in cell["metadata"] and len(cell["metadata"]["tags"]) > 0:
            for tag in cell["metadata"]["tags"]:
                if tag.startswith("same_step_"):
                    if num > 0:  # don't create empty step
                        this_step_code = "\n".join(code_buffer)
                        all_code += "\n" + this_step_code
                        save_step()

                    code_buffer = []
                    step_tag_num = int(tag.split("same_step_")[1])
                    this_step_index = step_tag_num
                    this_step_name = f"same_step_{step_tag_num:03}"
                    this_step_code = ""
                    this_step_cache_value = "P0D"
                    this_step_environment_name = "default"
                    this_step_tags = []

                elif str.startswith(tag, "cache="):
                    this_step_cache_value = str.split(tag, "=")[1]
                elif str.startswith(tag, "environment="):
                    this_step_environment_name = str.split(tag, "=")[1]
                else:
                    this_step_tags.append(tag)

        if cell["cell_type"] == "code":  # might be a markdown cell
            code_buffer.append("\n".join(jupytext.cell_to_text.LightScriptCellExporter(cell, "py").source))

    this_step_code = "\n".join(code_buffer)
    all_code += "\n" + this_step_code
    save_step()

    magic_lines = get_magic_lines(all_code)
    if len(magic_lines) > 0:
        magic_lines_string = "\n".join(magic_lines)
        logging.warning(f"""Notebook contains magic lines, which will be ignored:\n{magic_lines_string}""")

        # Remove magic lines from code so that we can continue:
        all_code = remove_magic_lines(all_code)

    for k in steps:
        steps[k].packages_to_install = get_installable_packages(all_code)

    return steps


def get_sorted_list_of_steps(notebook: dict, config: SameConfig) -> list:
    """
    Given a notebook (as a dict), get a list of Step objects, sorted by their
    index in the notebook.
    """
    steps_dict = get_steps(notebook, config)
    steps = list(steps_dict.values())
    steps_sorted_by_index = sorted(steps, key=lambda x: x.index)
    return steps_sorted_by_index


def get_code(notebook: dict) -> str:
    """Combines and returns all python code in the given notebook."""
    if "cells" not in notebook:
        return ""

    code = []
    for cell in notebook["cells"]:
        if cell["cell_type"] != "code":
            continue

        code.append("\n".join(
            jupytext.cell_to_text.LightScriptCellExporter(cell, "py").source
        ))

    return "\n".join(code)


ocean_step_template = "step.jinja"


def render(compile_path: str, steps: dict, same_config: SameConfig) -> Tuple[Path, str]:
    """Renders the notebook into a root file and a series of step files according to the target requirements. Returns an absolute path to the root file for deployment."""

    template_dir = os.path.dirname(os.path.abspath(__file__))
    template_loader = FileSystemLoader(template_dir)
    env = Environment(trim_blocks=True, loader=template_loader)

    root_file_string = _build_step_file(env, next(iter(steps.values())), same_config)
    root_pipeline_name = f"root_pipeline_{uuid4().hex.lower()}"
    root_path = Path(compile_path) / f"{root_pipeline_name}.py"
    helpers.write_file(root_path, root_file_string)

    # For storing in the docker image: write the rendered script next to the
    # notebook (swap the "ipynb" suffix for "py") and remove the original notebook.
    docker_path = same_config['notebook']['path'][:-5] + 'py'
    helpers.write_file(docker_path, root_file_string)
    os.remove(same_config['notebook']['path'])
    return (compile_path, root_file_string)  # note: root_file_string replaced root_pipeline_name

def _build_step_file(env: Environment, step: Step, same_config) -> str:
    with open(sameproject.ops.explode.__file__, "r") as f:
        explode_code = f.read()

    requirements_file = None
    if "requirements_file" in step:
        requirements_file = urlsafe_b64encode(bytes(step.requirements_file, "utf-8")).decode()

    memory_limit = same_config.runtime_options.get(
        "serialisation_memory_limit",
        512 * 1024 * 1024,  # 512MB
    )

    same_env = same_config.runtime_options.get(
        "same_env",
        "default",
    )

    step_contract = {
        "name": step.name,
        "same_env": same_env,
        "memory_limit": memory_limit,
        "unique_name": step.unique_name,
        "requirements_file": requirements_file,
        "user_code": step.code,
        "explode_code": urlsafe_b64encode(bytes(explode_code, "utf-8")).decode(),
        "same_yaml": urlsafe_b64encode(bytes(same_config.to_yaml(), "utf-8")).decode(),
    }

    return env.get_template(ocean_step_template).render(step_contract)

if __name__ == "__main__":
    compile("same.yaml", os.environ["AlGO"])
2 changes: 2 additions & 0 deletions ocean_c2d/requirements.txt
@@ -0,0 +1,2 @@
# Dependencies for /Users/jakub/Development/Algovera/Core/same-project/demo/test.ipynb:

14 changes: 14 additions & 0 deletions ocean_c2d/same.yaml
@@ -0,0 +1,14 @@
apiVersion: sameproject.ml/v1alpha1
environments:
  default:
    image_tag: combinatorml/jupyterlab-tensorflow-opencv:0.9
metadata:
  labels: []
  name: default_config
  version: 0.0.0
notebook:
  name: test
  path: /data/transformation/notebook.ipynb
  requirements: /same-project/ocean_c2d/requirements.txt
run:
  name: default_config run
1 change: 1 addition & 0 deletions sameproject/__init__.py
@@ -20,3 +20,4 @@
import sameproject.ops.aml.options
import sameproject.ops.functions.options
import sameproject.ops.kubeflow.options
import sameproject.ops.ocean.options