Add AWS SageMaker Unified Studio Workflow Operator #45726

Draft: agupta01 wants to merge 18 commits into main
Conversation

@agupta01 commented Jan 16, 2025

Description

Adds an operator used for executing Jupyter Notebooks, Querybooks, and Visual ETL jobs within the context of a SageMaker Unified Studio project.

SageMaker Unified Studio (SUS) supports development of Airflow DAGs (called "workflows" within the product) that are run on an MWAA cluster managed by the project. These workflows have the ability to orchestrate the execution of Unified Studio artifacts that can connect to data assets stored in a SUS project.

Implementation-wise, these notebooks are executed on a SageMaker Training Job running a SageMaker Distribution environment within the context of a SUS project.

Components

  • SageMakerNotebookOperator: this operator allows users to execute Unified Studio artifacts within the context of their project.
  • SageMakerNotebookHook: this hook provides a wrapper around the notebook execution.
  • SageMakerNotebookSensor: this sensor waits on status updates from the notebook execution.
  • SageMakerNotebookJobTrigger: this trigger activates when the notebook execution completes.

Usage

Note that this operator introduces a dependency on the SageMaker Studio SDK (https://www.pypi.org/project/sagemaker-studio).

from airflow import DAG
# Operator import path follows the module added by this PR under the Amazon provider.
from airflow.providers.amazon.aws.operators.sagemaker_unified_studio import (
    SageMakerNotebookOperator,
)

with DAG(...) as dag:
    ...
    run_notebook = SageMakerNotebookOperator(
        task_id="initial",
        input_config={"input_path": <notebook_path_in_s3>, "input_params": {}},
        output_config={"output_formats": ["NOTEBOOK"]},
        wait_for_completion=True,
        poll_interval=5,
    )
    ...

Testing

MWAA uses Python 3.11 and a Postgres backend, so we set those values for all tests.

Unit tests

breeze testing core-tests -p 3.11 -b postgres providers/tests/amazon/aws/*/test_sagemaker_unified_studio.py

System tests

Ensure a properly configured SageMaker Unified Studio domain and project, as indicated in the example_sagemaker_unified_studio.py file. Also ensure AWS credentials are populated and up to date. Then populate DOMAIN_ID, PROJECT_ID, ENVIRONMENT_ID, and S3_PATH in files/airflow-breeze-config/variables.env and run:

breeze testing system-tests -p 3.11 -b postgres --forward-credentials --test-timeout 600 providers/tests/system/amazon/aws/example_sagemaker_unified_studio.py
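
For reference, the corresponding entries in files/airflow-breeze-config/variables.env might look like the sketch below; all values are placeholders to be replaced with the IDs from your own SageMaker Unified Studio project.

# Placeholder values; replace with the IDs of your SageMaker Unified Studio setup.
DOMAIN_ID=<your-domain-id>
PROJECT_ID=<your-project-id>
ENVIRONMENT_ID=<your-environment-id>
S3_PATH=<s3-path-used-by-the-project>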


boring-cyborg bot added the area:providers and provider:amazon (AWS/Amazon - related issues) labels Jan 16, 2025

boring-cyborg bot commented Jan 16, 2025

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contributors' Guide (https://github.com/apache/airflow/blob/main/contributing-docs/README.rst)
Here are some useful points:

  • Pay attention to the quality of your code (ruff, mypy and type annotations). Our pre-commits will help you with that.
  • In case of a new feature, add useful documentation (in docstrings or in the docs/ directory). Adding a new operator? Check this short guide. Consider adding an example DAG that shows how users should use it.
  • Consider using the Breeze environment for testing locally; it's a heavy Docker setup, but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
  • Please follow the ASF Code of Conduct for all communication, including (but not limited to) comments on Pull Requests, the mailing list, and Slack.
  • Be sure to read the Airflow Coding style.
  • Always keep your Pull Requests rebased, otherwise your build might fail due to changes not related to your commits.

Apache Airflow is a community-driven project and together we are making it better 🚀.
In case of doubts contact the developers at:
Mailing List: [email protected]
Slack: https://s.apache.org/airflow-slack

:param execution_name: The name of the notebook job to be executed, this is same as task_id.
:param input_config: Configuration for the input file.
Example: {'input_path': 'folder/input/notebook.ipynb', 'input_params': {'param1': 'value1'}}
:param output_config: Configuration for the output format. It should include an output_formats parameter to control

Contributor commented:
This sentence seems to just tail off in the middle?

Example: {'input_path': 'folder/input/notebook.ipynb', 'input_params': {'param1': 'value1'}}
:param output_config: Configuration for the output format. It should include an output_formats parameter to control
Example: {'output_formats': ['NOTEBOOK']}
:param compute: compute configuration to use for the notebook execution. This is an required attribute

Contributor suggested change:
- :param compute: compute configuration to use for the notebook execution. This is an required attribute
+ :param compute: compute configuration to use for the notebook execution. This is a required attribute


def _format_start_execution_output_config(self):
output_formats = (
self.output_config.get("output_formats") if self.output_config else ["NOTEBOOK"]

Contributor commented:

This ternary is unnecessary, right? There is a default value provided in the constructor, so output_config can't be empty?
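
If output_config is indeed always a dict (as the default in the constructor implies), a minimal simplification could look like this sketch:

def _format_start_execution_output_config(self):
    # .get() with a fallback covers a missing "output_formats" key, so the
    # conditional over self.output_config is no longer needed.
    output_formats = self.output_config.get("output_formats", ["NOTEBOOK"])
    ...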

:param task_id: A unique, meaningful id for the task.
:param input_config: Configuration for the input file. Input path should be specified as a relative path.
The provided relative path will be automatically resolved to an absolute path within
the context of the user's home directory in the IDE. Input parms should be a dict

Contributor suggested change:
- the context of the user's home directory in the IDE. Input parms should be a dict
+ the context of the user's home directory in the IDE. Input params should be a dict

return self._sagemaker_studio.execution_client.start_execution(request)

def wait_for_execution_completion(self, execution_id, context):

Contributor suggested change: remove the blank line after the function signature.

from airflow.utils.context import Context


class SageMakerNotebookSensor(BaseSensorOperator):

Contributor commented:
Use AwsBaseSensor?
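
For reference, the AwsBaseSensor pattern being suggested looks roughly like the sketch below. It assumes SageMakerNotebookHook is (or becomes) a subclass of the provider's AWS base hook, which is not established in this diff; the hook import path and the "execution_id" template field are placeholders.

from typing import Sequence

from airflow.providers.amazon.aws.sensors.base_aws import AwsBaseSensor
from airflow.providers.amazon.aws.utils.mixins import aws_template_fields

# Hypothetical: only valid if SageMakerNotebookHook derives from the AWS base hook.
from airflow.providers.amazon.aws.hooks.sagemaker_unified_studio import SageMakerNotebookHook


class SageMakerNotebookSensor(AwsBaseSensor[SageMakerNotebookHook]):
    aws_hook_class = SageMakerNotebookHook
    # "execution_id" is a placeholder template field for illustration.
    template_fields: Sequence[str] = aws_template_fields("execution_id")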

Comment on lines +36 to +37
Username: airflowTestUser
Password: airflowSystemTestP@ssword1!

Contributor commented:

I don't see these credentials used anywhere in the test. Are they really mandatory? If yes, we should make them configurable and provide them through the test context builder.
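
For context, the test context builder pattern referenced here looks roughly like the sketch below; the import path depends on the test-suite layout at this point in the repo's history, and the variable names are placeholders.

# Import path varies with the system-test layout; this mirrors how other
# Amazon system tests obtain externally provided values.
from providers.tests.system.amazon.aws.utils import SystemTestContextBuilder

sys_test_context_task = (
    SystemTestContextBuilder()
    .add_variable("SUS_TEST_USERNAME")  # placeholder variable names
    .add_variable("SUS_TEST_PASSWORD")
    .build()
)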

Comment on lines +69 to +72
@task
def emulate_mwaa_environment(
domain_id: str, project_id: str, environment_id: str, s3_path: str
):

Contributor commented:

If run using a container-based executor (like ECS or K8s) this will have no effect: a container will spin up, export these envs, then it will get torn down, and the next task will run in a new container. So this test will fail to run on our ECS executor test suite.

Any other way around this? Otherwise we'll need to create an image for container-based tests, or at least provide these env vars through executor_config.
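
One possible workaround, sketched under the assumption that the test runs on the AWS ECS executor (which accepts per-task container overrides through executor_config): the container name and environment variable names below are placeholders, and notebook_path, domain_id, and project_id are assumed to come from the surrounding test DAG.

run_notebook = SageMakerNotebookOperator(
    task_id="run_notebook",
    input_config={"input_path": notebook_path, "input_params": {}},
    output_config={"output_formats": ["NOTEBOOK"]},
    wait_for_completion=True,
    poll_interval=5,
    # Per-task override so the variables reach the container that actually
    # runs this task, instead of being exported by a previous task.
    executor_config={
        "overrides": {
            "containerOverrides": [
                {
                    "name": "<executor-container-name>",  # placeholder
                    "environment": [
                        {"name": "DOMAIN_ID", "value": domain_id},  # placeholder names
                        {"name": "PROJECT_ID", "value": project_id},
                    ],
                }
            ]
        }
    },
)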

notebook_path = "test_notebook.ipynb" # This should be the path to your .ipynb, .sqlnb, or .vetl file in your project.

run_notebook = SageMakerNotebookOperator(
task_id="initial",

Contributor commented:
Somewhat strange task_id

Comment on lines +136 to +137
run_notebook,
)

Contributor commented:

No teardown needed? What if things break, get stuck, or time out? Any way to manually stop a notebook execution?
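
One option, sketched below: a teardown-style task that always runs, with the actual stop/cancel call left as a placeholder, since this diff only shows start_execution on the SDK's execution client. The task would then be chained after run_notebook in the system test.

from airflow.decorators import task
from airflow.utils.trigger_rule import TriggerRule


@task(trigger_rule=TriggerRule.ALL_DONE)
def teardown_notebook_execution():
    # Placeholder body: a real teardown would need a stop/cancel API for the
    # notebook execution, which is not exposed by the operator in this PR.
    ...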

agupta01 changed the title from "Add AWS SageMaker Unified Studio Notebook Operator" to "Add AWS SageMaker Unified Studio Workflow Operator" on Jan 29, 2025