
Commit 593691b

neverett authored and LoHertel committed
DOC-775 Address feedback from new docs rollout (dagster-io#27629)
## Summary & Motivation

- Re-add "Migrating from Step Launchers to Dagster Pipes"
- Move Dagster Pipes + AWS docs to /aws subsection
- Update "Dagster Cloud" to "Dagster+"

## How I Tested These Changes

Local build.

Signed-off-by: nikki everett <[email protected]>
1 parent a4185dc · commit 593691b

18 files changed · +164 −30 lines changed

Diff for: docs/docs/guides/build/external-pipelines/aws-ecs-pipeline.md renamed to docs/docs/guides/build/external-pipelines/aws/aws-ecs-pipeline.md

+2 −2

@@ -1,7 +1,7 @@
 ---
 title: Build pipelines with AWS ECS
 description: "Learn to integrate Dagster Pipes with AWS ECS to launch external code from Dagster assets."
-sidebar_position: 200
+sidebar_position: 100
 ---

 This article covers how to use [Dagster Pipes](/guides/build/external-pipelines/) with [AWS ECS](https://aws.amazon.com/ecs/).

@@ -52,7 +52,7 @@ Call `open_dagster_pipes` in the ECS task script to create a context that can be
 :::tip

-The metadata format shown above (`{"raw_value": value, "type": type}`) is part of Dagster Pipes' special syntax for specifying rich Dagster metadata. For a complete reference of all supported metadata types and their formats, see the [Dagster Pipes metadata reference](using-dagster-pipes/reference#passing-rich-metadata-to-dagster).
+The metadata format shown above (`{"raw_value": value, "type": type}`) is part of Dagster Pipes' special syntax for specifying rich Dagster metadata. For a complete reference of all supported metadata types and their formats, see the [Dagster Pipes metadata reference](/guides/build/external-pipelines/using-dagster-pipes/reference#passing-rich-metadata-to-dagster).

 :::
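To make the tip in this diff concrete: a minimal sketch of a Pipes task script reporting rich metadata in that format (illustrative names and values; assumes `dagster-pipes` is installed in the task's environment):

```python
from dagster_pipes import open_dagster_pipes

with open_dagster_pipes() as pipes:
    # ... task business logic ...

    pipes.report_asset_materialization(
        metadata={
            # {"raw_value": ..., "type": ...} is Pipes' rich-metadata syntax
            "row_count": {"raw_value": 42, "type": "int"},
            "output_uri": {"raw_value": "s3://my-bucket/out.parquet", "type": "url"},
        }
    )
```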

Diff for: docs/docs/guides/build/external-pipelines/aws-emr-containers-pipeline.md renamed to docs/docs/guides/build/external-pipelines/aws/aws-emr-containers-pipeline.md

+2 −2

@@ -1,10 +1,10 @@
 ---
 title: "Build pipelines with AWS EMR on EKS"
 description: "Learn to integrate Dagster Pipes with AWS EMR Containers to launch external code from Dagster assets."
-sidebar_position: 300
+sidebar_position: 200
 ---

-import Preview from '../../../partials/\_Preview.md';
+import Preview from '../../../../partials/\_Preview.md';

 <Preview />

Diff for: docs/docs/guides/build/external-pipelines/aws-emr-pipeline.md renamed to docs/docs/guides/build/external-pipelines/aws/aws-emr-pipeline.md

+1 −1

@@ -94,7 +94,7 @@ Call `open_dagster_pipes` in the EMR script to create a context that can be used
 :::tip

-The metadata format shown above (`{"raw_value": value, "type": type}`) is part of Dagster Pipes' special syntax for specifying rich Dagster metadata. For a complete reference of all supported metadata types and their formats, see the [Dagster Pipes metadata reference](using-dagster-pipes/reference#passing-rich-metadata-to-dagster).
+The metadata format shown above (`{"raw_value": value, "type": type}`) is part of Dagster Pipes' special syntax for specifying rich Dagster metadata. For a complete reference of all supported metadata types and their formats, see the [Dagster Pipes metadata reference](/guides/build/external-pipelines/using-dagster-pipes/reference#passing-rich-metadata-to-dagster).

 :::

Diff for: docs/docs/guides/build/external-pipelines/aws-emr-serverless-pipeline.md renamed to docs/docs/guides/build/external-pipelines/aws/aws-emr-serverless-pipeline.md

+2 −2

@@ -1,7 +1,7 @@
 ---
 title: Build pipelines with AWS EMR Serverless
 description: "Learn to integrate Dagster Pipes with AWS EMR Serverless to launch external code from Dagster assets."
-sidebar_position: 300
+sidebar_position: 400
 ---

 This article covers how to use [Dagster Pipes](/guides/build/external-pipelines/) with [AWS EMR Serverless](https://aws.amazon.com/emr-serverless/).

@@ -67,7 +67,7 @@ Call `open_dagster_pipes` in the EMR Serverless script to create a context that
 :::tip

-The metadata format shown above (`{"raw_value": value, "type": type}`) is part of Dagster Pipes' special syntax for specifying rich Dagster metadata. For a complete reference of all supported metadata types and their formats, see the [Dagster Pipes metadata reference](using-dagster-pipes/reference#passing-rich-metadata-to-dagster).
+The metadata format shown above (`{"raw_value": value, "type": type}`) is part of Dagster Pipes' special syntax for specifying rich Dagster metadata. For a complete reference of all supported metadata types and their formats, see the [Dagster Pipes metadata reference](/guides/build/external-pipelines/using-dagster-pipes/reference#passing-rich-metadata-to-dagster).

 :::

Diff for: docs/docs/guides/build/external-pipelines/aws-glue-pipeline.md renamed to docs/docs/guides/build/external-pipelines/aws/aws-glue-pipeline.md

+2 −2

@@ -1,7 +1,7 @@
 ---
 title: Build pipelines with AWS Glue
 description: "Learn to integrate Dagster Pipes with AWS Glue to launch external code from Dagster assets."
-sidebar_position: 400
+sidebar_position: 500
 ---

 # AWS Glue & Dagster Pipes

@@ -44,7 +44,7 @@ Call `open_dagster_pipes` in the Glue job script to create a context that can be
 :::tip

-The metadata format shown above (`{"raw_value": value, "type": type}`) is part of Dagster Pipes' special syntax for specifying rich Dagster metadata. For a complete reference of all supported metadata types and their formats, see the [Dagster Pipes metadata reference](using-dagster-pipes/reference#passing-rich-metadata-to-dagster).
+The metadata format shown above (`{"raw_value": value, "type": type}`) is part of Dagster Pipes' special syntax for specifying rich Dagster metadata. For a complete reference of all supported metadata types and their formats, see the [Dagster Pipes metadata reference](/guides/build/external-pipelines/using-dagster-pipes/reference#passing-rich-metadata-to-dagster).

 :::

Diff for: docs/docs/guides/build/external-pipelines/aws-lambda-pipeline.md renamed to docs/docs/guides/build/external-pipelines/aws/aws-lambda-pipeline.md

+2 −2

@@ -1,7 +1,7 @@
 ---
 title: Build pipelines with AWS Lambda
 description: "Learn to integrate Dagster Pipes with AWS Lambda to launch external code from Dagster assets."
-sidebar_position: 500
+sidebar_position: 600
 ---

 :::note

@@ -92,7 +92,7 @@ In this step, you'll add the code you want to execute to the function. Create an
 :::tip

-The metadata format shown above (`{"raw_value": value, "type": type}`) is part of Dagster Pipes' special syntax for specifying rich Dagster metadata. For a complete reference of all supported metadata types and their formats, see the [Dagster Pipes metadata reference](using-dagster-pipes/reference#passing-rich-metadata-to-dagster).
+The metadata format shown above (`{"raw_value": value, "type": type}`) is part of Dagster Pipes' special syntax for specifying rich Dagster metadata. For a complete reference of all supported metadata types and their formats, see the [Dagster Pipes metadata reference](/guides/build/external-pipelines/using-dagster-pipes/reference#passing-rich-metadata-to-dagster).

 :::

New file: docs/docs/guides/build/external-pipelines/aws/index.md (+8 lines)

@@ -0,0 +1,8 @@
+---
+title: "Build pipelines with AWS"
+sidebar_position: 30
+---
+
+import DocCardList from '@theme/DocCardList';
+
+<DocCardList />

Diff for: docs/docs/guides/build/external-pipelines/dagster-pipes-details-and-customization.md

+1 −1

@@ -1,7 +1,7 @@
 ---
 title: "Dagster Pipes details and customization"
 description: "Learn about Dagster Pipes APIs and how to compose them to create a custom solution for your data platform."
-sidebar_position: 1000
+sidebar_position: 90
 ---

 [Dagster Pipes](/guides/build/external-pipelines) is a toolkit for integrating Dagster with an arbitrary external compute environment. While many users will be well-served by the simplified interface offered by Pipes client objects (e.g. <PyObject section="pipes" module="dagster" object="PipesSubprocessClient" />, <PyObject section="libraries" object="PipesDatabricksClient" module="dagster_databricks"/>), others will need a greater level of control over Pipes. This is particularly the case for users seeking to connect large existing codebases to Dagster.

Diff for: docs/docs/guides/build/external-pipelines/databricks-pipeline.md

+1 −1

@@ -1,7 +1,7 @@
 ---
 title: Build pipelines with Databricks
 description: "Learn to integrate Dagster Pipes with Databricks to launch external code from Dagster assets."
-sidebar_position: 600
+sidebar_position: 50
 ---

 This article covers how to use [Dagster Pipes](/guides/build/external-pipelines/) with Dagster's [Databricks integration](/integrations/libraries/databricks) to launch Databricks jobs.

Diff for: docs/docs/guides/build/external-pipelines/gcp-dataproc-pipeline.md

+1 −1

@@ -1,7 +1,7 @@
 ---
 title: Build pipelines with GCP Dataproc
 description: "Learn to integrate Dagster Pipes with GCP Dataproc to launch external code from Dagster assets."
-sidebar_position: 300
+sidebar_position: 40
 ---

 This article covers how to use [Dagster Pipes](/guides/build/external-pipelines/) to [submit jobs](https://cloud.google.com/dataproc/docs/guides/submit-job) to [GCP Dataproc](https://cloud.google.com/dataproc).

Diff for: docs/docs/guides/build/external-pipelines/index.md

+1 −1

@@ -1,5 +1,5 @@
 ---
-title: External pipelines
+title: External pipelines (Dagster Pipes)
 sidebar_position: 60
 ---

Diff for: docs/docs/guides/build/external-pipelines/javascript-pipeline.md

+1 −1

@@ -1,6 +1,6 @@
 ---
 title: "Build pipelines in JavaScript"
-sidebar_position: 100
+sidebar_position: 20
 ---

 This guide covers how to run JavaScript with Dagster using Pipes, however, the same principle will apply to other languages.

Diff for: docs/docs/guides/build/external-pipelines/kubernetes-pipeline.md

+1 −1

@@ -1,7 +1,7 @@
 ---
 title: Build pipelines with Kubernetes
 description: "Learn to integrate Dagster Pipes with Kubernetes to launch external code from Dagster assets."
-sidebar_position: 700
+sidebar_position: 60
 ---

 :::note
New file: docs/docs/guides/build/external-pipelines/migrating-from-step-launchers-to-pipes.md (+94 lines)

---
title: "Migrating from Spark Step Launchers to Dagster Pipes"
description: "Learn how to migrate from Spark step launchers to Dagster Pipes."
sidebar_position: 80
---

In this guide, we'll show you how to migrate from using step launchers to using [Dagster Pipes](index.md) in Dagster.

While step launchers were intended to support various runtime environments, in practice they have only been implemented for Spark. Therefore, we will focus on Spark-related examples.

## Considerations

When deciding whether to migrate from step launchers to Dagster Pipes, consider the following:

- **Step launchers** are superseded by Dagster Pipes. While they are still available (and there are no plans for their removal), they are no longer the recommended method for launching external code from Dagster ops and assets, and they will not receive new features or active development.
- **Dagster Pipes** is a more lightweight and flexible framework, but it does come with a few drawbacks:
  * The Spark runtime and the executed code will no longer be managed by Dagster for you.
  * Dagster Pipes is not compatible with resources and I/O managers. If you rely heavily on these features, you might want to keep using step launchers.

## Steps

To migrate from step launchers to Dagster Pipes, perform the following steps.

### **1. Implement new CI/CD pipelines to prepare your Spark runtime environment**

Alternatively, this can be done from Dagster jobs, but either way, you will need to manage the Spark runtime yourself.

When running PySpark jobs, consider the following changes to your Python dependencies:

- drop `dagster`
- add `dagster-pipes`

You can learn more about packaging Python dependencies for PySpark in the [PySpark documentation](https://spark.apache.org/docs/latest/api/python/user_guide/python_packaging.html#python-package-management) or in the [AWS EMR Pipes](/guides/build/external-pipelines/aws/aws-emr-pipeline) guide.

The process of packaging the Python dependencies and scripts should be automated with a CI/CD pipeline and run before deploying the Dagster code location.

It's also possible to run Java or Scala Spark jobs with Dagster Pipes, but there is currently no official Pipes implementation for these languages. Therefore, forwarding Dagster events from these jobs is not yet officially supported (although it can be done with some custom code).

### **2. Update your Dagster code**

The goal is to keep the same observability and orchestration features while moving compute to an external script. Suppose you have existing code using step launchers similar to this:

<CodeExample path="docs_snippets/docs_snippets/guides/migrations/from_step_launchers_to_pipes/old_code.py" />

The corresponding Pipes code will instead have two components: the Dagster asset definition and the external PySpark job.

Let's start with the PySpark job. The upstream asset will invoke the following script:

<CodeExample path="docs_snippets/docs_snippets/guides/migrations/from_step_launchers_to_pipes/upstream_asset_script.py" />
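The referenced snippet isn't reproduced here, but as a minimal sketch (assuming `dagster-pipes` is available on the cluster; the bucket and paths are illustrative), such a script opens a Pipes session, does its Spark work, and reports the output path back as materialization metadata:

```python
from dagster_pipes import open_dagster_pipes
from pyspark.sql import SparkSession

with open_dagster_pipes() as pipes:
    spark = SparkSession.builder.getOrCreate()

    # Spark business logic: produce a DataFrame and write it out
    df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])
    path = "s3://my-bucket/upstream_asset.parquet"  # illustrative
    df.write.mode("overwrite").parquet(path)

    # Report the output path back to Dagster as rich metadata
    pipes.report_asset_materialization(
        metadata={"path": {"raw_value": path, "type": "path"}}
    )
```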
Now, we have to run this script from Dagster. First, let's factor the boilerplate EMR config into a reusable function:

<CodeExample path="docs_snippets/docs_snippets/guides/migrations/from_step_launchers_to_pipes/utils.py" startAfter="start_emr_config_marker" endBefore="end_emr_config_marker" />
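As a rough illustration of what such a helper can return — this is a sketch assuming boto3's `run_job_flow` parameter shape, with illustrative instance types and roles, not the snippet's exact contents:

```python
def make_emr_run_job_flow_params(script_s3_path: str) -> dict:
    # Boilerplate EMR cluster + spark-submit step config, factored out for reuse
    return {
        "Name": "dagster-pipes-job",  # illustrative
        "ReleaseLabel": "emr-7.0.0",
        "Instances": {
            "MasterInstanceType": "m5.xlarge",
            "SlaveInstanceType": "m5.xlarge",
            "InstanceCount": 3,
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "Steps": [
            {
                "Name": "run-pyspark-script",
                "ActionOnFailure": "TERMINATE_CLUSTER",
                "HadoopJarStep": {
                    "Jar": "command-runner.jar",
                    "Args": ["spark-submit", "--deploy-mode", "cluster", script_s3_path],
                },
            }
        ],
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }
```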
Now, the asset body will be as follows:

<CodeExample path="docs_snippets/docs_snippets/guides/migrations/from_step_launchers_to_pipes/new_code.py" endBefore="after_upstream_marker" />
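In sketch form (again, not the snippet's exact code), the asset hands those parameters to `PipesEMRClient` and returns the path the script reported, assuming a helper like the `make_emr_run_job_flow_params` sketch above:

```python
import dagster as dg
from dagster_aws.pipes import PipesEMRClient


@dg.asset
def upstream(context: dg.AssetExecutionContext, pipes_emr_client: PipesEMRClient):
    # Launch the PySpark script on EMR; Pipes streams logs and events back
    result = pipes_emr_client.run(
        context=context,
        run_job_flow_params=make_emr_run_job_flow_params(
            "s3://my-bucket/upstream_asset_script.py"  # illustrative
        ),
    ).get_materialize_result()
    # Return the path the script reported, so the IO manager stores it
    return result.metadata["path"].value
```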
Since the asset now returns the Parquet file path, it will be saved by the `IOManager`, and the downstream asset will be able to access it.

Let's continue with migrating the second asset, `downstream`.

Since we can't use I/O managers in scripts launched by Pipes, we have to either add a CLI argument parser or use the handy `extras` feature provided by Pipes to pass the `"path"` value to the job. We will demonstrate the latter approach. The `downstream` asset turns into:

<CodeExample path="docs_snippets/docs_snippets/guides/migrations/from_step_launchers_to_pipes/new_code.py" startAfter="after_upstream_marker" endBefore="after_downstream_marker" />
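Sketched out, the `extras` dict simply rides along with the Pipes session (this assumes `run` forwards `extras` the way other Pipes clients do; names are illustrative):

```python
@dg.asset
def downstream(
    context: dg.AssetExecutionContext,
    upstream: str,  # the Parquet path loaded by the IO manager
    pipes_emr_client: PipesEMRClient,
):
    return pipes_emr_client.run(
        context=context,
        run_job_flow_params=make_emr_run_job_flow_params(
            "s3://my-bucket/downstream_asset_script.py"  # illustrative
        ),
        extras={"path": upstream},  # delivered to the external script by Pipes
    ).get_materialize_result()
```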
Now, let's access the `path` value in the PySpark job:

<CodeExample path="docs_snippets/docs_snippets/guides/migrations/from_step_launchers_to_pipes/downstream_asset_script.py" />
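On the script side, the value is available from the Pipes context; a minimal sketch:

```python
from dagster_pipes import open_dagster_pipes
from pyspark.sql import SparkSession

with open_dagster_pipes() as pipes:
    spark = SparkSession.builder.getOrCreate()

    # Read the value passed from the Dagster side via Pipes extras
    path = pipes.get_extra("path")
    df = spark.read.parquet(path)
    # ... downstream business logic ...
```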
Finally, provide the required resources to `Definitions`:

<CodeExample path="docs_snippets/docs_snippets/guides/migrations/from_step_launchers_to_pipes/new_code.py" startAfter="after_downstream_marker" />
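A sketch of the wiring, assuming an S3-based message reader (the bucket name is illustrative):

```python
import boto3
import dagster as dg
from dagster_aws.pipes import PipesEMRClient, PipesS3MessageReader

defs = dg.Definitions(
    assets=[upstream, downstream],
    resources={
        "pipes_emr_client": PipesEMRClient(
            message_reader=PipesS3MessageReader(
                client=boto3.client("s3"),
                bucket="my-bucket",  # illustrative; used for Pipes message passing
            )
        )
    },
)
```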
## Conclusion

In this guide, we have demonstrated how to migrate from using step launchers to using Dagster Pipes. We have shown how to launch PySpark jobs on AWS EMR using `PipesEMRClient` and how to pass small pieces of data between assets using Dagster's metadata and Pipes extras.

## Supplementary

- [Dagster Pipes](index.md)
- [GitHub discussion](https://github.com/dagster-io/dagster/discussions/25685) on the topic
- [Dagster + Spark](/integrations/libraries/spark) - an up-to-date list of Pipes Clients for various Spark providers
- [AWS EMR Pipes tutorial](/guides/build/external-pipelines/aws/aws-emr-pipeline)
- [PipesEMRClient API docs](/api/python-api/libraries/dagster-aws#dagster_aws.pipes.PipesEMRClient)

:::note

**Heads up!** As an alternative to storing paths with an `IOManager`, the following utility function can be used to retrieve logged metadata values from upstream assets:

<CodeExample path="docs_snippets/docs_snippets/guides/migrations/from_step_launchers_to_pipes/utils.py" startAfter="start_metadata_marker" endBefore="end_metadata_marker" />

:::
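For illustration, such a utility might be sketched with the standard `DagsterInstance` event-log API (a hypothetical helper, not the snippet's actual code):

```python
import dagster as dg


def get_latest_metadata_value(
    context: dg.AssetExecutionContext, asset_key: dg.AssetKey, key: str
):
    """Fetch a metadata value logged by the latest materialization of an asset."""
    event = context.instance.get_latest_materialization_event(asset_key)
    assert event is not None and event.asset_materialization is not None
    return event.asset_materialization.metadata[key].value
```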

Diff for: docs/docs/guides/build/external-pipelines/pyspark-pipeline.md

+1 −1

@@ -1,7 +1,7 @@
 ---
 title: "Build pipelines with PySpark"
 description: "Learn to integrate Dagster Pipes with PySpark to orchestrate PySpark jobs in a Dagster pipeline."
-sidebar_position: 900
+sidebar_position: 70
 ---

 This tutorial is focused on using Dagster Pipes to launch & monitor general PySpark jobs. The [Spark integration page](/integrations/libraries/spark) provides more information on using Pipes with specific Spark providers, such as AWS EMR or Databricks.

Diff for: docs/docs/guides/deploy/code-locations/workspace-yaml.md

+3 −1

@@ -4,7 +4,9 @@ sidebar_position: 200
 ---

 :::info
-This reference is only applicable to Dagster OSS. For Dagster Cloud see [the Dagster Cloud Code Locations documentation](/dagster-plus/deployment/code-locations)
+
+This reference is only applicable to Dagster OSS. For Dagster+, see [the Dagster+ code locations documentation](/dagster-plus/deployment/code-locations).
+
 :::

 The `workspace.yaml` file is used to configure code locations in Dagster. It tells Dagster where to find your code and how to load it.
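For reference, a minimal `workspace.yaml` might look like the following (a sketch; the file path is illustrative):

```yaml
load_from:
  # Load a code location from a local Python file
  - python_file: my_project/definitions.py
```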

Diff for: docs/docs/integrations/libraries/spark.md

+4 −4

@@ -36,10 +36,10 @@ With Pipes, the code inside the asset or op definition submits a Spark job to an
 You can either use one of the available Pipes Clients or make your own. The available Pipes Clients for popular Spark providers are:

 - [Databricks](/guides/build/external-pipelines/databricks-pipeline)
-- [AWS Glue](/guides/build/external-pipelines/aws-glue-pipeline)
-- [AWS EMR](/guides/build/external-pipelines/aws-emr-pipeline)
-- [AWS EMR on EKS](/guides/build/external-pipelines/aws-emr-containers-pipeline)
-- [AWS EMR Serverless](/guides/build/external-pipelines/aws-emr-serverless-pipeline)
+- [AWS Glue](/guides/build/external-pipelines/aws/aws-glue-pipeline)
+- [AWS EMR](/guides/build/external-pipelines/aws/aws-emr-pipeline)
+- [AWS EMR on EKS](/guides/build/external-pipelines/aws/aws-emr-containers-pipeline)
+- [AWS EMR Serverless](/guides/build/external-pipelines/aws/aws-emr-serverless-pipeline)

 Existing Spark jobs can be used with Pipes without any modifications. In this case, Dagster will be receiving logs from the job, but not events like asset checks or attached metadata.

Diff for: docs/vercel.json

+37 −7

@@ -777,32 +777,62 @@
 },
 {
   "source": "/concepts/dagster-pipes/aws-ecs",
-  "destination": "/guides/build/external-pipelines/aws-ecs-pipeline",
+  "destination": "/guides/build/external-pipelines/aws/aws-ecs-pipeline",
   "permanent": false
 },
 {
   "source": "/concepts/dagster-pipes/aws-glue",
-  "destination": "/guides/build/external-pipelines/aws-glue-pipeline",
+  "destination": "/guides/build/external-pipelines/aws/aws-glue-pipeline",
   "permanent": false
 },
 {
   "source": "/concepts/dagster-pipes/aws-emr",
-  "destination": "/guides/build/external-pipelines/aws-emr-pipeline",
+  "destination": "/guides/build/external-pipelines/aws/aws-emr-pipeline",
   "permanent": false
 },
 {
   "source": "/concepts/dagster-pipes/aws-emr-containers",
-  "destination": "guides/build/external-pipelines/aws-emr-pipeline",
+  "destination": "/guides/build/external-pipelines/aws/aws-emr-containers-pipeline",
   "permanent": false
 },
 {
   "source": "/concepts/dagster-pipes/aws-emr-serverless",
-  "destination": "/guides/build/external-pipelines/aws-emr-serverless-pipeline",
+  "destination": "/guides/build/external-pipelines/aws/aws-emr-serverless-pipeline",
   "permanent": false
 },
 {
   "source": "/concepts/dagster-pipes/aws-lambda",
-  "destination": "/guides/build/external-pipelines/aws-lambda-pipeline",
+  "destination": "/guides/build/external-pipelines/aws/aws-lambda-pipeline",
+  "permanent": false
+},
+{
+  "source": "/guides/build/external-pipelines/aws-ecs-pipeline",
+  "destination": "/guides/build/external-pipelines/aws/aws-ecs-pipeline",
+  "permanent": false
+},
+{
+  "source": "/guides/build/external-pipelines/aws-glue-pipeline",
+  "destination": "/guides/build/external-pipelines/aws/aws-glue-pipeline",
+  "permanent": false
+},
+{
+  "source": "/guides/build/external-pipelines/aws-emr-pipeline",
+  "destination": "/guides/build/external-pipelines/aws/aws-emr-pipeline",
+  "permanent": false
+},
+{
+  "source": "/guides/build/external-pipelines/aws-emr-containers-pipeline",
+  "destination": "/guides/build/external-pipelines/aws/aws-emr-containers-pipeline",
+  "permanent": false
+},
+{
+  "source": "/guides/build/external-pipelines/aws-emr-serverless-pipeline",
+  "destination": "/guides/build/external-pipelines/aws/aws-emr-serverless-pipeline",
+  "permanent": false
+},
+{
+  "source": "/guides/build/external-pipelines/aws-lambda-pipeline",
+  "destination": "/guides/build/external-pipelines/aws/aws-lambda-pipeline",
   "permanent": false
 },
 {

@@ -817,7 +847,7 @@
 },
 {
   "source": "/guides/migrations/from-step-launchers-to-pipes",
-  "destination": "/guides/build/external-pipelines/",
+  "destination": "/guides/build/external-pipelines/migrating-from-step-launchers-to-pipes",
   "permanent": false
 },
 {
