Cog + Codespaces #650

Open
@bfirsh

GitHub Codespaces makes it easy to start development environments from GitHub repositories. Lots of cog.yamls are popping up in GitHub repositories, defining the development environment for ML models. These two things should be combined somehow to make one-click development environments for ML models!

How it works right now

If you open a Codespace on a repository that contains a cog.yaml, it boots into the default Codespaces environment. Docker is available there, so if you install Cog yourself, Cog commands work as you might expect.

This mostly works except:

  • Cog is not installed in the Codespace by default.
  • Python dependencies aren't available for code completion, debugging, etc.
  • You can't run a Python script in the terminal because none of the dependencies will be installed.
  • You can't run a Jupyter notebook in VSCode because none of the dependencies will be installed.
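
To see the gap concretely, a small diagnostic could compare a requirements.txt against whatever is installed in the Codespace's default Python. This is a minimal sketch: missing_requirements is a hypothetical helper, and it only understands bare `name` and `name==version` lines, skipping anything fancier (ranges, extras, markers, pip options).

```python
import re
from collections.abc import Iterable
from importlib.metadata import PackageNotFoundError, version

# Hypothetical helper: only bare "name" or "name==version" lines are checked.
_SIMPLE_REQ = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)\s*(?:==\s*([A-Za-z0-9._+!-]+))?$")


def missing_requirements(lines: Iterable[str]) -> list[str]:
    """Report requirements that are absent or installed at a different pinned version."""
    problems = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop inline comments and whitespace
        match = _SIMPLE_REQ.match(line)
        if not match:
            continue  # blank line, pip option, or a specifier this sketch doesn't parse
        name, pinned = match.groups()
        try:
            installed = version(name)
        except PackageNotFoundError:
            problems.append(f"{name}: not installed")
            continue
        if pinned and installed != pinned:
            problems.append(f"{name}: installed {installed}, wanted {pinned}")
    return problems
```

In a fresh Codespace, calling it on Path("requirements.txt").read_text().splitlines() would list every dependency the editor, terminal and notebooks can't see.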

Architecture

At a high level, we have a few options:

1. Run Cog inside the Codespaces environment

This is along the lines of what works at the moment. We then make a best-effort attempt to install all of the dependencies inside the default Codespaces environment.

Advantages:

  • Simplicity.
  • cog commands just work as you expect.

Disadvantages:

  • The environment in Codespaces won't be quite right. The dependencies might not be installed exactly as they are inside Cog, so things will break for weird, hard-to-debug reasons. The CUDA versions might not be set up correctly, so things will break in horrible CUDAry ways. If you just run python train.py or open a Jupyter notebook, it'll work maybe 80% of the time and break the other 20%.
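
One way to make option 1's "best effort" fail loudly instead of weirdly: check the Codespace against the build section of cog.yaml and report mismatches. A sketch, assuming cog.yaml has already been parsed into a dict (for example with PyYAML's yaml.safe_load); environment_mismatches is hypothetical, and a missing nvidia-smi is only a rough proxy for "no GPU", not a CUDA version check.

```python
import platform
import shutil
from typing import Optional


def environment_mismatches(
    build: dict,
    python_version: Optional[str] = None,
    has_nvidia_smi: Optional[bool] = None,
) -> list[str]:
    """Compare a parsed cog.yaml `build` section with the current (non-Cog) environment."""
    if python_version is None:
        python_version = platform.python_version()  # e.g. "3.10.12"
    if has_nvidia_smi is None:
        has_nvidia_smi = shutil.which("nvidia-smi") is not None

    problems = []
    wanted = build.get("python_version")
    if wanted is not None:
        # Quote this in YAML: an unquoted 3.10 parses as the float 3.1.
        wanted_parts = str(wanted).split(".")
        # Compare only as many components as cog.yaml pins (usually major.minor).
        if python_version.split(".")[: len(wanted_parts)] != wanted_parts:
            problems.append(f"Python {python_version} does not match cog.yaml python_version {wanted}")
    if build.get("gpu") and not has_nvidia_smi:
        problems.append("cog.yaml sets gpu: true but nvidia-smi is not on PATH")
    return problems
```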

2. Run Codespaces inside the Cog environment

We use cog.yaml to build the Docker image the Codespaces environment uses.

Advantages:

  • Beautiful perfect reproducible environment. Everything in the Codespaces environment is just so.

Disadvantages:

  • Codespaces doesn't have the right hooks for building its Docker image from cog.yaml, which makes this complicated.
  • You can't run cog predict, cog push, etc. in the terminal.
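
Codespaces reads its environment from .devcontainer/devcontainer.json, so one way to sidestep the missing hook in option 2 is to build the image from cog.yaml ahead of time and point the dev container at it. A sketch under that assumption: devcontainer_for_cog_image and the image name are hypothetical, and prebuilding (for example with cog build -t) and pushing to a registry Codespaces can pull from is left outside the sketch.

```python
import json


def devcontainer_for_cog_image(image: str, name: str = "cog") -> dict:
    """Minimal devcontainer.json contents that start a Codespace from a prebuilt Cog image.

    Assumes `image` was built from cog.yaml beforehand and pushed somewhere
    Codespaces can pull it from.
    """
    return {
        "name": name,
        "image": image,
        # Python extension for completions and debugging inside the Cog image.
        "customizations": {"vscode": {"extensions": ["ms-python.python"]}},
    }


if __name__ == "__main__":
    # Hypothetical image name; the output would go in .devcontainer/devcontainer.json.
    config = devcontainer_for_cog_image("ghcr.io/example/superbnet:latest")
    print(json.dumps(config, indent=2))
```

This keeps the "beautiful perfect reproducible environment" but does nothing about the second disadvantage: there is still no Docker daemon for cog to talk to.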

3. Both??

Maybe we could build the Codespaces environment using cog.yaml, and make Cog available inside that environment? Would this work?
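
A rough sketch of what option 3 might look like as a dev container config: the image built from cog.yaml, plus the Docker-in-Docker dev container feature so cog has a daemon to talk to. This is untested; devcontainer_with_cog_inside and the image name are hypothetical, and whether GPUs reach the nested containers is exactly the "would this work?" question, which the sketch does not answer.

```python
import json
from typing import Optional

# Dev container feature that runs a Docker daemon inside the container.
DOCKER_IN_DOCKER = "ghcr.io/devcontainers/features/docker-in-docker:2"


def devcontainer_with_cog_inside(image: str, install_cog: Optional[str] = None) -> dict:
    """devcontainer.json contents for a Cog-built image that can also run `cog` commands.

    `image` is assumed to be built from cog.yaml beforehand. GPU passthrough to
    nested containers is not handled here.
    """
    config = {
        "image": image,
        "features": {DOCKER_IN_DOCKER: {}},
    }
    if install_cog:
        # Shell command that installs the Cog CLI once the container is created.
        config["postCreateCommand"] = install_cog
    return config


if __name__ == "__main__":
    print(json.dumps(devcontainer_with_cog_inside("ghcr.io/example/superbnet:latest"), indent=2))
```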

Strawman first step

This is all very complicated. Maybe as a simple first step we just get the Python dependencies working in the current setup we have.

We intend to support requirements.txt in cog.yaml and will probably make it the recommended way to define Python dependencies. The default Codespaces environment installs Python dependencies from requirements.txt.

If both systems use requirements.txt, this should all just work!
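
Moving an existing cog.yaml over to a shared requirements.txt is mostly mechanical. A sketch, assuming the python_requirements key from #157 and a cog.yaml already parsed into a dict; migrate_to_requirements is hypothetical, and writing the YAML and the requirements file back to disk is left to the caller.

```python
def migrate_to_requirements(build: dict, path: str = "requirements.txt") -> tuple[dict, str]:
    """Move cog.yaml build.python_packages into requirements.txt contents.

    Returns the updated build section (now pointing at `path` via python_requirements)
    and the text to write to `path`. The input dict is left unmodified.
    """
    packages = build.get("python_packages") or []
    new_build = {key: value for key, value in build.items() if key != "python_packages"}
    new_build["python_requirements"] = path
    body = "".join(f"{package}\n" for package in packages)  # one requirement per line
    return new_build, body
```

After this, pip install -r requirements.txt in the default Codespaces environment and the Cog build read the same list, which is the whole point of the strawman.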

Strawman user journey

With this implementation, a user journey could look like this:

I am an ML researcher and I've come across an ML model on GitHub that I want to fine-tune and tinker with. Here's what I do:

  • I find a GitHub repository with a cog.yaml in it.
  • I click Code -> Create Codespace.
  • The README tells me to download some pretrained weights by running script/download-weights.
  • The README says I can run predictions with cog predict -i @input.jpg. It does what I expect.
  • I run cog run python finetune.py mydata/ to fine-tune the model on my own data.
  • When I go to edit the code in the editor, it has all the completions and useful things I expect from VSCode.
  • I run cog push r8.im/bfirsh/superbnet to push up my fine-tuned model to Replicate.

The notable things that don't work so well:

  • If I open a notebook, it mostly works, but some stuff might be janky because the dependencies are different.
  • If I run python finetune.py without the cog run prefix by accident it mostly works, but some stuff might not work right.

Next steps

  • We implement requirements.txt support and make it the recommended way of defining Python dependencies (#157: Switch from python_packages to python_requirements).
  • Figure out some way of getting Cog installed in the Codespace. With another configuration? Get it in the Codespaces base image?
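
Whichever configuration wins, the install step itself could be a one-line command run when the Codespace is created (for example as a devcontainer.json postCreateCommand, or as a build step in a base image). A sketch that builds that command, assuming Cog's release assets keep the cog_<OS>_<arch> naming used in its install instructions; install_cog_command is hypothetical and /usr/local/bin/cog is just a common target path.

```python
import platform
from typing import Optional

# Assumed release-asset naming, mirroring cog_$(uname -s)_$(uname -m) in Cog's install docs.
COG_RELEASE_BASE = "https://github.com/replicate/cog/releases/latest/download"


def cog_download_url(system: Optional[str] = None, machine: Optional[str] = None) -> str:
    """URL of the prebuilt Cog binary for this platform, or for the one given."""
    system = system or platform.system()     # "Linux" in a Codespace, like `uname -s`
    machine = machine or platform.machine()  # e.g. "x86_64", like `uname -m`
    return f"{COG_RELEASE_BASE}/cog_{system}_{machine}"


def install_cog_command(
    dest: str = "/usr/local/bin/cog",
    system: Optional[str] = None,
    machine: Optional[str] = None,
) -> str:
    """Shell command that downloads Cog to `dest` and marks it executable."""
    url = cog_download_url(system, machine)
    return f"sudo curl -o {dest} -L {url} && sudo chmod +x {dest}"
```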
