Metaflow dev stack #144

Merged: 3 commits, Mar 4, 2025
81 changes: 81 additions & 0 deletions docs/getting-started/devstack.md
@@ -0,0 +1,81 @@
import ReactPlayer from 'react-player'

# Setting Up the Dev Stack

You can start writing and running flows just by installing Metaflow locally with
`pip install metaflow`. However, its true power lies in its integration with underlying
infrastructure, which allows you to

- [run tasks in the cloud at any scale](/scaling/remote-tasks/introduction),
- [visualize and observe them in a UI](/metaflow/visualizing-results),
- [deploy them in a highly available production orchestrator](/production/introduction),
- and compose reactive systems with [event-triggered flows](/production/event-triggering).

All of these features require an infrastructure stack that needs to be configured to work
with Metaflow. In production settings, this infrastructure runs in your cloud account,
as [described on this page](/getting-started/infrastructure), but you may want to test the
full stack locally first.

Metaflow comes with a one-click script, `metaflow-dev`, which sets up a complete
development stack for you locally on top of [Minikube](https://minikube.sigs.k8s.io/docs/),
including a local metadata service, a database, and the [Metaflow UI](https://github.com/Netflix/metaflow-ui).
The stack allows you to test [scaling with `@kubernetes`](/scaling/remote-tasks/kubernetes),
[deployment on Argo Workflows](/production/scheduling-metaflow-flows/scheduling-with-argo-workflows),
and [event-triggering](/production/event-triggering).

## When to use `metaflow-dev`

The `metaflow-dev` stack comes in handy in a few scenarios:

1. It allows you to **test the full functionality of Metaflow** before [deploying it in your cloud account](/getting-started/infrastructure).

2. You can use it **in your CI/CD workflows to test flows** in a fully isolated, ephemeral environment.

3. If you want to **contribute extensions for Metaflow** or make changes to core Metaflow,
   the stack provides a complete development and testing environment.

## How to set up the dev stack

Setting up the stack is straightforward:

1. Install Metaflow with `pip install metaflow`.
2. Ensure that [you have Docker installed](https://docs.docker.com/desktop/).
3. Run `metaflow-dev up`.
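
Before running step 3, it can help to confirm that the prerequisites from steps 1 and 2 are in
place. The following is a minimal sketch, assuming the Docker CLI is available as `docker` and
that `pip install metaflow` has put `metaflow-dev` on your `PATH`. The helper name
`missing_tools` is hypothetical and not part of Metaflow. The check only looks for the
executables; it does not verify that the Docker daemon is running.

```python
import shutil

# Executables the setup steps rely on (assumed names): `docker` from your
# Docker installation and `metaflow-dev` installed alongside Metaflow.
REQUIRED_TOOLS = ["docker", "metaflow-dev"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing from PATH: " + ", ".join(missing))
    else:
        print("Prerequisites found; you can run `metaflow-dev up`.")
```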

The `metaflow-dev` command downloads and installs Minikube. It then uses
[Tilt](https://tilt.dev/) to deploy and expose
[all components required by Metaflow](/internals/technical-overview) inside Minikube.

After the deployment completes, leave the shell running `metaflow-dev up` open, as it hosts the
necessary port forwards. Then open a new shell and execute `metaflow-dev shell`. This opens a
session with a Metaflow configuration pointing at the local stack. You can now use the shell to
develop, run, and deploy Metaflow flows!
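
As an illustration of running a flow against the stack, the sketch below launches an existing
flow file with Metaflow's `--with kubernetes` run option, which attaches `@kubernetes` to every
step at run time. The file name `hello_flow.py` and both helper functions are hypothetical, and
the script assumes it is started from inside the `metaflow-dev shell` session so that the
configuration points at the local stack.

```python
import subprocess
import sys


def kubernetes_run_command(flow_file):
    """Build the command that runs `flow_file` with every step on Kubernetes."""
    # `--with kubernetes` applies the @kubernetes decorator to all steps.
    return [sys.executable, flow_file, "run", "--with", "kubernetes"]


def run_on_dev_stack(flow_file):
    """Run the flow and return its exit code (0 means the run succeeded)."""
    return subprocess.run(kubernetes_run_command(flow_file)).returncode
```

For example, `run_on_dev_stack("hello_flow.py")` would start the run and return once it finishes.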

You can navigate to the Tilt UI, linked in the console output, to find links to the Metaflow
and Argo Workflows UIs. You can find direct links to the UI in the Metaflow output as well.

### The dev stack in action

Watch this short video (no sound) for a quick setup-to-usage walkthrough:

<ReactPlayer controls url="https://www.youtube.com/watch?v=nPtqj72hfKU" />
<br/>

The video covers:

- Setting up the dev stack
- Observing the stack through the Tilt UI
- Using the stack to run and monitor runs
- Running at scale with `@kubernetes`
- Inspecting results in a notebook, accessing metadata
- Deploying to Argo Workflows
- Tearing down the stack

## Using the dev stack in a CI/CD pipeline

The dev stack is lightweight enough to run on small CI/CD worker nodes, including those provided
by GitHub Actions. You can use the stack to run integration tests for flows in a fully isolated,
ephemeral environment.

Take a look at [this example repository](https://github.com/outerbounds/gha-metaflow/) and
[a GitHub Actions config](https://github.com/outerbounds/gha-metaflow/blob/main/.github/workflows/metaflow.yml)
for a template that you can easily apply in your own setup.
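
Independently of the linked template, an integration-test step could be driven by a small
script like the sketch below. It is an assumption-laden illustration rather than the approach
used in the example repository: the `flows` directory, the function names, and the convention
that a zero exit code means success are all hypothetical choices. It assumes the dev stack is
already up and that the script runs with a Metaflow configuration pointing at it.

```python
import subprocess
import sys
from pathlib import Path


def run_flow(flow_file):
    """Run one flow with Metaflow's `run` command; True if it exited with 0."""
    result = subprocess.run([sys.executable, str(flow_file), "run"])
    return result.returncode == 0


def failed_flows(flow_dir, runner=run_flow):
    """Run every `*.py` file in `flow_dir` and return the names that failed."""
    flows = sorted(Path(flow_dir).glob("*.py"))
    return [flow.name for flow in flows if not runner(flow)]


def main(flow_dir="flows"):
    """Return a process exit code: 0 if all flows passed, 1 otherwise."""
    failures = failed_flows(flow_dir)
    for name in failures:
        print(f"FAILED: {name}")
    return 1 if failures else 0
```

Calling `sys.exit(main())` at the end of the script lets the CI job fail when any flow fails.
The `runner` parameter exists so that the directory scan can be tested without starting real runs.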

23 changes: 4 additions & 19 deletions docs/getting-started/infrastructure.md
@@ -1,13 +1,10 @@

# Deploying Infrastructure for Metaflow

While you can [get started with Metaflow easily](/getting-started/install) on your
laptop, the main benefits of Metaflow lie in its ability to [scale out to external
compute clusters](/scaling/introduction) and to [deploy to production-grade workflow
orchestrators](/production/introduction). To benefit from these features, you need to
configure Metaflow and the infrastructure behind it appropriately. A separate guide,
[Metaflow Resources for Engineers](https://docs.outerbounds.com/engineering/welcome/) covers
everything related to such deployments. This page provides a quick overview.
Use [the local dev stack](/getting-started/devstack) to explore how Metaflow integrates
with underlying infrastructure. When you are ready for a production deployment, you will need
to set up infrastructure in your own cloud account, as detailed on this page. For further
information, see [Metaflow Resources for Engineers](https://docs.outerbounds.com/engineering/welcome/).

## Supported infrastructure components

@@ -16,13 +13,6 @@ Since modern data science / ML applications are powered by a number of interconn
illustrated below ([Why? See here](/introduction/why-metaflow)). You can see logos of
all supported systems which you can use to enable each layer.

Consider this illustration as a menu that allows you to build your own pizza: You get to
customize your own crust, sauce, toppings, and cheese. You can make the choices based on
your existing business infrastructure and the requirements and preferences of your
organization. Fortunately, Metaflow provides a consistent API for all these
combinations, so you can even change the choices later without having to rewrite your
flows.

<object style={{width: 700}} type="image/svg+xml"
data="/assets/infra-stack.svg"></object>

@@ -193,8 +183,3 @@ This stack incurs a typical maintenance overhead of an GKE-based Kubernetes clus
which shouldn't add much burden if your organization uses GKE already.


---

If you are unsure about the stacks, just run `pip install metaflow` to install the local
stack and move on to [the tutorials](/getting-started/tutorials). Flows you create will
work without changes on any of these stacks.
5 changes: 3 additions & 2 deletions docs/getting-started/install.md
@@ -16,8 +16,9 @@ Sandbox](https://docs.outerbounds.com/sandbox/).

:::


Now you are ready to get your hands dirty with the [Tutorials](tutorials/).
Now you are ready to get your hands dirty with the [Tutorials](tutorials/). Or, if you want
to take a step further and test the full power of Metaflow, you can [easily set up a
Minikube-based dev stack](/getting-started/devstack) locally.

## Upgrading Metaflow

1 change: 1 addition & 0 deletions docs/index.md
@@ -21,6 +21,7 @@ Metaflow makes it easy to build and manage real-life data science, AI, and ML pr
## Getting Started

- [Installing Metaflow locally](getting-started/install)
- [Setting Up the Dev Stack](getting-started/devstack) ✨*New*✨
- [Deploying Infrastructure for Metaflow](getting-started/infrastructure)
- [Quickstart Tutorial](getting-started/tutorials/)

1 change: 1 addition & 0 deletions sidebars.js
@@ -30,6 +30,7 @@ const sidebars = {
label: "Getting Started",
items: [
"getting-started/install",
"getting-started/devstack",
"getting-started/infrastructure",
{
type: "category",