Description
GitHub Codespaces makes it easy to start development environments from GitHub repositories. Lots of cog.yaml files are popping up in GitHub repositories, defining the development environment for ML models. These two things should be combined somehow to make one-click development environments for ML models!
How it works right now
If you boot up a Codespace from a repository that has a cog.yaml, it boots up into the default Codespaces environment. If you install Cog, Docker is available and you can run Cog commands as you might expect.
This mostly works except:
- Cog is not installed in the Codespace by default.
- Python dependencies aren't available for code completion, debugging, etc.
- You can't run a Python script in the terminal because none of the dependencies will be installed.
- You can't run a Jupyter notebook in VSCode because none of the dependencies will be installed.
Architecture
At a high level, we have a few options:
1. Run Cog inside the Codespaces environment
This is along the lines of what works at the moment. We then make a best-effort attempt to install all of the dependencies inside the default Codespaces environment.
Advantages:
- Simplicity. cog commands just work as you expect.
Disadvantages:
- The environment in Codespaces won't be quite right. The dependencies might not be installed exactly the same as they are inside Cog, so things will break for weird, hard-to-debug reasons. The CUDA versions might not be set up correctly so things will break in horrible CUDAry ways. If you just run python train.py or open a Jupyter notebook it'll work 80% of the time, but won't work 20% of the time.
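One way to make this kind of mismatch visible is to compare the pinned versions against what the Codespace actually has installed. The following is only a minimal sketch: `parse_pins` and `find_drift` are hypothetical helper names, and it assumes dependencies are written as plain `name==version` lines. It does not check CUDA or system packages.

```python
from importlib import metadata


def parse_pins(lines):
    """Return {name: version} for lines of the form 'name==version'."""
    pins = {}
    for raw in lines:
        # Drop trailing comments and environment markers, then whitespace.
        line = raw.split("#", 1)[0].split(";", 1)[0].strip()
        if "==" not in line:
            continue  # unpinned or unsupported lines are ignored in this sketch
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


def _installed_version(name):
    """Look up the installed version of a distribution, or None if missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_drift(pins, installed_version=None):
    """Return {name: (wanted, found)} for every pin the environment does not match."""
    lookup = installed_version or _installed_version
    drift = {}
    for name, wanted in pins.items():
        found = lookup(name)
        if found != wanted:
            drift[name] = (wanted, found)  # found is None when not installed
    return drift
```

Running `find_drift(parse_pins(open("requirements.txt")))` inside the Codespace would list the packages most likely to cause the hard-to-debug failures described above.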
2. Run Codespaces inside the Cog environment
We use cog.yaml to build the Docker image the Codespaces environment uses.
Advantages:
- Beautiful perfect reproducible environment. Everything in the Codespaces environment is just so.
Disadvantages:
- The right hooks to build the Docker images from cog.yaml aren't in Codespaces, which makes this complicated.
- You can't run cog predict, cog push, etc. in the terminal.
3. Both??
Maybe we could build the Codespaces environment using cog.yaml, and make Cog available inside that environment? Would this work?
Strawman first step
This is all very complicated. Maybe as a simple first step we just get the Python dependencies working in the current setup we have.
We're intending to support requirements.txt in cog.yaml and probably make it the recommended way to define Python dependencies. The default Codespaces environment installs Python dependencies from requirements.txt.
If both systems use requirements.txt, this should all just work!
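Until requirements.txt support lands, a repository that still lists its dependencies in cog.yaml could generate a requirements.txt for Codespaces to pick up. This is a rough sketch, not part of Cog: `requirements_from_cog_config` is a hypothetical helper, and it assumes the parsed cog.yaml is a mapping with the packages under `build.python_packages` as pip-style specifiers. Loading the YAML itself is left out.

```python
def requirements_from_cog_config(config):
    """Render a requirements.txt body from a parsed cog.yaml mapping.

    Assumes (not confirmed here) that packages live under
    config["build"]["python_packages"] as pip-style specifiers.
    """
    build = config.get("build") or {}
    packages = build.get("python_packages") or []
    # One specifier per line, with a trailing newline after each.
    return "".join(f"{spec}\n" for spec in packages)


# Example input standing in for a parsed cog.yaml (illustrative values only).
example_config = {
    "build": {
        "python_version": "3.8",
        "python_packages": ["torch==1.8.0", "pillow==8.2.0"],
    }
}
requirements_text = requirements_from_cog_config(example_config)
```

Writing `requirements_text` to requirements.txt at the repository root would let both systems read the same dependency list.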
Strawman user journey
Given this implementation, this could work...
I am an ML researcher and I've come across an ML model on GitHub that I want to fine-tune and tinker with. Here's what I do:
- There is a cog.yaml in a GitHub repository.
- I click Code -> Create Codespace.
- The readme tells me to download some pretrained weights: script/download-weights
- The readme says I can run predictions by running cog predict -i @input.jpg. It does what I expect.
- I run cog run python finetune.py mydata/ to fine-tune the model on my own data.
- When I go to edit the code in the editor, it has all the completions and useful things I expect from VSCode.
- I run cog push r8.im/bfirsh/superbnet to push up my fine-tuned model to Replicate.
The notable things that don't work so well:
- If I open a notebook, it mostly works, but some stuff might be janky because the dependencies are different.
- If I run python finetune.py without the cog run prefix by accident, it mostly works, but some stuff might not work right.
Next steps
- We implement requirements.txt support and make it the recommended method of defining Python dependencies. (Switch from python_packages to python_requirements #157)
- Figure out some way of getting Cog installed in the Codespace. With another configuration? Get it in the Codespaces base image?
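For the "another configuration" route, one possibility is a devcontainer.json whose postCreateCommand installs Cog once the Codespace has been created. The sketch below only generates such a file. The install command is passed in rather than hard-coded, and `./script/install-cog` in the usage line is a hypothetical placeholder, not an existing script.

```python
import json


def devcontainer_config(install_cog_command):
    """Build a devcontainer.json mapping that installs Cog after creation."""
    return {
        "name": "cog",
        # postCreateCommand runs once, after the container has been created.
        "postCreateCommand": (
            f"{install_cog_command} && pip install -r requirements.txt"
        ),
    }


def render(install_cog_command):
    """Serialize the config as the JSON text for .devcontainer/devcontainer.json."""
    return json.dumps(devcontainer_config(install_cog_command), indent=2)


# Hypothetical install script; substitute whatever actually installs Cog.
devcontainer_json = render("./script/install-cog")
```

Getting Cog into the Codespaces base image would avoid this per-repository file, at the cost of depending on a change outside the repository.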