---
title: "Migrating from Spark Step Launchers to Dagster Pipes"
description: "Learn how to migrate from Spark step launchers to Dagster Pipes."
sidebar_position: 80
---

In this guide, we'll show you how to migrate from using step launchers to using [Dagster Pipes](index.md) in Dagster.

While step launchers were intended to support various runtime environments, in practice they have only been implemented for Spark. Therefore, we will focus on Spark-related examples.

## Considerations

When deciding to migrate from step launchers to Dagster Pipes, consider the following:

- **Step launchers** have been superseded by Dagster Pipes. While they are still available (and there are no plans to remove them), they are no longer the recommended way to launch external code from Dagster ops and assets, and they won't receive new features or active development.
- **Dagster Pipes** is a more lightweight and flexible framework, but it does come with a few drawbacks:
  - The Spark runtime and the code it executes will no longer be managed by Dagster for you.
  - Dagster Pipes is not compatible with resources and I/O managers. If you rely heavily on these features, you might want to keep using step launchers.

## Steps

To migrate from step launchers to Dagster Pipes, you will have to perform the following steps.

### **1. Implement new CI/CD pipelines to prepare your Spark runtime environment**

Alternatively, this can be done from Dagster jobs, but either way, you will need to manage the Spark runtime yourself.

When running PySpark jobs, consider the following changes to your Python dependencies:

- drop `dagster`
- add `dagster-pipes`

You can learn more about packaging Python dependencies for PySpark in the [PySpark documentation](https://spark.apache.org/docs/latest/api/python/user_guide/python_packaging.html#python-package-management) or in the [AWS EMR Pipes](/guides/build/external-pipelines/aws/aws-emr-pipeline) guide.
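
For example, one approach described in the PySpark documentation is to pack a virtual environment containing `dagster-pipes` with a tool such as `venv-pack` and ship it to the cluster. A minimal sketch of this approach; the bucket, archive name, and app name are hypothetical:

```python
from pyspark.sql import SparkSession

# A sketch, not a drop-in implementation: assumes a virtualenv containing
# `dagster-pipes` (but not `dagster`) was packed with `venv-pack` and uploaded
# to a hypothetical S3 location. These configs are typically supplied via
# `spark-submit`; they are shown inline here for brevity.
spark = (
    SparkSession.builder.appName("upstream_asset")
    # Distribute the packed environment to the driver and executors.
    .config("spark.archives", "s3://my-bucket/pyspark_env.tar.gz#environment")
    # Point PySpark at the interpreter inside the unpacked archive.
    .config("spark.pyspark.python", "./environment/bin/python")
    .getOrCreate()
)
```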

The process of packaging the Python dependencies and scripts should be automated with a CI/CD pipeline and run before deploying the Dagster code location.

It's also possible to run Java or Scala Spark jobs with Dagster Pipes, but there is currently no official Pipes implementation for these languages. Therefore, forwarding Dagster events from these jobs is not yet officially supported, although it can be done with some custom code, as sketched below.
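
For illustration, the Pipes protocol itself is language-agnostic: each message is a single JSON line written to a messages channel (a file, stdout, and so on). The sketch below, written in Python for brevity, shows roughly what such custom JVM code would need to emit. The field layout follows the `dagster-pipes` protocol at the time of writing and may change between versions; the asset key and metadata are hypothetical:

```python
import json

# A raw Pipes message: one JSON object per line on the messages channel.
# A Java or Scala job could write an equivalent line from its own code.
message = {
    "__dagster_pipes_version": "0.1",
    "method": "report_asset_materialization",
    "params": {
        "asset_key": "upstream",
        "data_version": None,
        # Metadata values are wrapped with an explicit type tag.
        "metadata": {"row_count": {"raw_value": 42, "type": "int"}},
    },
}
print(json.dumps(message))
```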

### **2. Update your Dagster code**

The goal is to keep the same observability and orchestration features while moving compute to an external script. Suppose you have existing code using step launchers similar to this:

<CodeExample path="docs_snippets/docs_snippets/guides/migrations/from_step_launchers_to_pipes/old_code.py" />

The corresponding Pipes code will instead have two components: the Dagster asset definition and the external PySpark job.

Let's start with the PySpark job. The upstream asset will invoke the following script:

<CodeExample path="docs_snippets/docs_snippets/guides/migrations/from_step_launchers_to_pipes/upstream_asset_script.py" />

Now, we have to run this script from Dagster. First, let's factor the boilerplate EMR config into a reusable function:

<CodeExample path="docs_snippets/docs_snippets/guides/migrations/from_step_launchers_to_pipes/utils.py" startAfter="start_emr_config_marker" endBefore="end_emr_config_marker" />

Now, the asset body will be as follows:

<CodeExample path="docs_snippets/docs_snippets/guides/migrations/from_step_launchers_to_pipes/new_code.py" endBefore="after_upstream_marker" />

Since the asset now returns the Parquet file path, it will be saved by the `IOManager`, and the downstream asset will be able to access it.
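
In other words, the handoff between the two assets relies on Dagster's standard I/O manager machinery. Below is a stripped-down sketch of the pattern, with hypothetical asset names and path (in the real code above, the upstream asset launches the job via Pipes before returning):

```python
import dagster as dg

@dg.asset
def upstream() -> str:
    # ...launch the external Spark job via Pipes, then return the location of
    # its output; the I/O manager persists this string for downstream assets.
    return "s3://my-bucket/upstream.parquet"  # hypothetical path

@dg.asset
def downstream(upstream: str) -> None:
    # The I/O manager loads the string returned by `upstream`.
    print(f"downstream reads from {upstream}")
```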

Let's move on to migrating the second asset, `downstream`.

Since we can't use I/O managers in scripts launched by Pipes, we have to either add a CLI argument parser or use the handy `extras` feature provided by Pipes to pass the `"path"` value to the job. We will demonstrate the latter approach. The `downstream` asset turns into:
| 64 | + |
| 65 | +<CodeExample path="docs_snippets/docs_snippets/guides/migrations/from_step_launchers_to_pipes/new_code.py" startAfter="after_upstream_marker" endBefore="after_downstream_marker" /> |
| 66 | + |
| 67 | +Now, let's access the `path` value in the PySpark job: |
| 68 | + |
| 69 | +<CodeExample path="docs_snippets/docs_snippets/guides/migrations/from_step_launchers_to_pipes/downstream_asset_script.py" /> |
| 70 | + |
| 71 | +Finally, provide the required resources to `Definitions`: |
| 72 | + |
| 73 | +<CodeExample path="docs_snippets/docs_snippets/guides/migrations/from_step_launchers_to_pipes/new_code.py" startAfter="after_downstream_marker" /> |

## Conclusion

In this guide, we have demonstrated how to migrate from using step launchers to using Dagster Pipes. We have shown how to launch PySpark jobs on AWS EMR using `PipesEMRClient` and how to pass small pieces of data between assets using Dagster's metadata and Pipes extras.

## Supplementary

- [Dagster Pipes](index.md)
- [GitHub discussion](https://github.com/dagster-io/dagster/discussions/25685) on the topic
- [Dagster + Spark](/integrations/libraries/spark) - an up-to-date list of Pipes clients for various Spark providers
- [AWS EMR Pipes tutorial](/guides/build/external-pipelines/aws/aws-emr-pipeline)
- [PipesEMRClient API docs](/api/python-api/libraries/dagster-aws#dagster_aws.pipes.PipesEMRClient)

:::note

**Heads up!** As an alternative to storing paths with an `IOManager`, the following utility function can be used to retrieve logged metadata values from upstream assets:

<CodeExample path="docs_snippets/docs_snippets/guides/migrations/from_step_launchers_to_pipes/utils.py" startAfter="start_metadata_marker" endBefore="end_metadata_marker" />

:::