[DOCS] GX Core to Cloud Migration Guide #11730

Open
r34ctor wants to merge 10 commits into develop from core_to_cloud_migration

r34ctor commented Mar 17, 2026

This provides high-level guidance for migrating from Core to Cloud. In this initial pass, the goal is to show how you can create the same objects in Cloud that you have in Core using the same code, and how to run validations. This documentation can be updated later as needed to cover other use cases.

  • Description of PR changes above includes a link to an existing GitHub issue
  • PR title is prefixed with one of: [BUGFIX], [FEATURE], [DOCS], [MAINTENANCE], [CONTRIB], [MINORBUMP]
  • Code is linted - run invoke lint (uses ruff format + ruff check)
  • Appropriate tests and docs have been updated

For more information about contributing, visit our community resources.

After you submit your PR, keep the page open and monitor the statuses of the various checks made by our continuous integration process at the bottom of the page. Please fix any issues that come up and reach out on Slack if you need help. Thanks for contributing!

netlify Bot commented Mar 17, 2026

Deploy Preview for niobium-lead-7998 ready!

Name Link
🔨 Latest commit 05d4def
🔍 Latest deploy log https://app.netlify.com/projects/niobium-lead-7998/deploys/69d3d6fca9341b000807ef36
😎 Deploy Preview https://deploy-preview-11730.docs.greatexpectations.io

codecov Bot commented Mar 17, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 84.66%. Comparing base (073dcd1) to head (05d4def).
⚠️ Report is 72 commits behind head on develop.
✅ All tests successful. No failed tests found.

Additional details and impacted files
@@           Coverage Diff            @@
##           develop   #11730   +/-   ##
========================================
  Coverage    84.66%   84.66%           
========================================
  Files          471      471           
  Lines        39170    39170           
========================================
  Hits         33165    33165           
  Misses        6005     6005           
| Flag | Coverage Δ |
|---|---|
| 3.10 | 73.56% <ø> (ø) |
| 3.10 athena | ? |
| 3.10 aws_deps | ? |
| 3.10 big | ? |
| 3.10 clickhouse | ? |
| 3.10 filesystem | ? |
| 3.10 mysql | ? |
| 3.10 openpyxl or pyarrow or project or sqlite or aws_creds | ? |
| 3.10 postgresql | ? |
| 3.10 spark | ? |
| 3.10 spark_connect | ? |
| 3.10 sql_server | ? |
| 3.10 trino | ? |
| 3.11 | 73.59% <ø> (-0.02%) ⬇️ |
| 3.11 athena | ? |
| 3.11 aws_deps | ? |
| 3.11 big | ? |
| 3.11 clickhouse | ? |
| 3.11 filesystem | ? |
| 3.11 mysql | ? |
| 3.11 openpyxl or pyarrow or project or sqlite or aws_creds | ? |
| 3.11 postgresql | ? |
| 3.11 spark | ? |
| 3.11 spark_connect | ? |
| 3.11 sql_server | ? |
| 3.11 trino | ? |
| 3.12 | 73.61% <ø> (+0.01%) ⬆️ |
| 3.12 athena | ? |
| 3.12 aws_deps | ? |
| 3.12 big | ? |
| 3.12 filesystem | ? |
| 3.12 mysql | ? |
| 3.12 openpyxl or pyarrow or project or sqlite or aws_creds | ? |
| 3.12 postgresql | ? |
| 3.12 spark | ? |
| 3.12 spark_connect | ? |
| 3.12 sql_server | ? |
| 3.12 trino | ? |
| 3.13 | 73.61% <ø> (ø) |
| 3.13 athena | 41.93% <ø> (ø) |
| 3.13 aws_deps | 45.18% <ø> (ø) |
| 3.13 big | 55.27% <ø> (ø) |
| 3.13 bigquery | 51.25% <ø> (ø) |
| 3.13 clickhouse | 41.94% <ø> (ø) |
| 3.13 databricks | 53.06% <ø> (ø) |
| 3.13 filesystem | 64.37% <ø> (ø) |
| 3.13 gx-redshift | 51.41% <ø> (ø) |
| 3.13 mysql | 51.81% <ø> (ø) |
| 3.13 openpyxl or pyarrow or project or sqlite or aws_creds | 59.97% <ø> (ø) |
| 3.13 postgresql | 55.22% <ø> (ø) |
| 3.13 snowflake | 53.90% <ø> (ø) |
| 3.13 spark | 55.92% <ø> (ø) |
| 3.13 spark_connect | 46.85% <ø> (ø) |
| 3.13 sql_server | 53.23% <ø> (ø) |
| 3.13 trino | 48.75% <ø> (ø) |
| cloud | 0.00% <ø> (ø) |
| docs-basic | 59.52% <ø> (ø) |
| docs-creds-needed | 58.11% <ø> (ø) |
| docs-spark | 57.57% <ø> (ø) |

Flags with carried forward coverage won't be shown. Click here to find out more.

r34ctor commented Mar 18, 2026

I couldn't figure out how to add these snippets to test while including the lines for setting up the environment variables.

r34ctor marked this pull request as ready for review March 19, 2026 17:51
r34ctor requested review from billdirks and klavavej March 19, 2026 17:51
klavavej left a comment

Heads up that while I included a few fine-grained notes, I didn’t proofread closely for grammar / tone / clarity yet because I’m asking for such major changes to the framing, contents, and flow of the page. I will look at grammar and other details more closely in the next round of review


This guide will enable you to migrate your GX Core configuration to a GX Cloud organization. Since GX Cloud is built on top of GX Core, the code you used to originally set up your GX Core configuration can be reused for setting up your GX Cloud organization.

The key difference between using GX Core and GX Cloud is the Data Context. By setting the mode of your Data Context to `cloud` and then providing the appropriate credentials, you will be able to connect to your GX Cloud organization. Once you have created a Cloud Data Context, the rest of the code you have already written to configure your GX entities, such as Data Sources, Data Assets, and Expectations, can be re-run to migrate your existing configuration into GX Cloud. Similarly, any code that you have written to run validations, including Custom Actions, can also be reused.

re: "The key difference between using GX Core and GX Cloud is the Data Context."

This is focused on technical details of how Core and Cloud work from a code perspective and obfuscates the differences in the business value they provide.

This page seems to assume that people already know they want to migrate. That might be the case for users/prospects that we send this article to. But folks who come across the page organically would benefit from some high-level info about why someone might want to migrate from Core to Cloud in the first place.

The value proposition for Cloud on this page doesn't have to be super in-depth. I plan on eventually adding a dedicated Cloud vs. Core value comparison where we can provide all the details. But I think it would be good to mention broadly the kind of business value folks can get from Cloud but not Core and/or highlight a few specific features that are available in Cloud but not Core.

title: "GX Core to GX Cloud Migration Guide"
---

## Overview

It's unusual to have an "Overview" h2 right after the page title h1. As far as I can tell, no other pages in the docs currently do this. I suggest removing this header for consistency and concision.

Suggested change: remove the line "## Overview".

---
id: core_to_cloud
sidebar_label: 'GX Core to GX Cloud Migration Guide'
title: "GX Core to GX Cloud Migration Guide"

This page should have a `description:` field in the frontmatter for SEO.

### Configuration Setup
In the example below, a File Data Context has been created, along with a Postgres Data Source and Data Asset.

```python

All code blocks should have titles / language identifiers applied with title=""

This standard exists in part because of this bug

Comment on lines +94 to +95
## Limitations
Some common limitations of migrating from GX Core to GX Cloud are listed below. Refer to the [compatibility reference](/help/compatibility_reference.md) for a comprehensive list of limitations.

The section intro says "Some common limitations of migrating from GX Core to GX Cloud are listed below." but the items in the list are limitations of Cloud itself rather than limitations on the act of migrating. There probably are limitations on the act of migrating that should be communicated in this page. The main one that comes to mind is that historical results from validations run with Core cannot be migrated to be visible in the Cloud UI.

The framing of "limitations" here makes me nervous because it seems to present Core as more fully-featured than Cloud and obfuscates the additional value that's available in Cloud but not Core. rhetorical question - Why would a user want to move from Core to Cloud if they lose the three items below and gain nothing?

The three items below where Core does support some options that are not available in Cloud should still be communicated, but I think they should be framed in a different way. Perhaps just changing from "limitations" language to language more like "Some options that are available in GX Core are not available in GX Cloud" with more of an explanation of why the option isn't available / doesn't make sense with Cloud would be enough to present this info in a more positive light.

And to further highlight the value of Cloud / the reason why someone should take the time to complete the steps in this guide, I think it would be good to add a "next steps" section that highlights Cloud-only features the user should explore after migrating their Data Sources and Data Assets (for example, ExpectAI and alerts)

This guide will enable you to migrate your GX Core configuration to a GX Cloud organization. Since GX Cloud is built on top of GX Core, the code you used to originally set up your GX Core configuration can be reused for setting up your GX Cloud organization.

The key difference between using GX Core and GX Cloud is the Data Context. By setting the mode of your Data Context to `cloud` and then providing the appropriate credentials, you will be able to connect to your GX Cloud organization. Once you have created a Cloud Data Context, the rest of the code you have already written to configure your GX entities, such as Data Sources, Data Assets, and Expectations, can be re-run to migrate your existing configuration into GX Cloud. Similarly, any code that you have written to run validations, including Custom Actions, can also be reused.

I think this page should mention a prerequisite of having a Cloud account with workspace Editor permissions or greater.

Folks who have been using Core have likely already fulfilled the other prereqs we typically mention for Cloud API workflows (e.g., Python version 3.10 to 3.13), so I feel OK about omitting those.

name="FULL_TABLE"
)

suite = context.suites.get("my_suite")

If someone is migrating from Core to a totally fresh Cloud organization, they won't have any Expectation Suites to get. When I first skimmed the code samples, I had the impression that I'd be able to essentially copy/paste an Expectation Suite from Core to Cloud with all of my pre-existing Expectation configurations. But when I ran the code samples, this line gave me an error.

I see that I can change the Expectation Suite name to one that exists in my Cloud org. But users who are migrating for the first time won't have any Cloud Expectation Suites to get.

[Screenshot 2026-03-20 at 11:33:51 AM]

I asked ChatGPT and got the following code for copying Expectations from a Core Expectation Suite and replicating them in a Cloud Expectation Suite. I tried it and it seemed to work. Can we include a workflow like this in this migration guide for migrating configured Expectations from Core?

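import copy

import great_expectations as gx

# Load the existing suite from the Core (file) context.
context = gx.get_context(mode="file")
source_suite = context.suites.get("my_core_suite")

# Create an empty suite in the Cloud context.
context = gx.get_context(mode="cloud")
target_suite = context.suites.add(gx.ExpectationSuite(name="my_cloud_suite"))

for exp in source_suite.expectations:
    exp_copy = copy.copy(exp)
    exp_copy.id = None  # clear the Core ID so the Expectation is added as new
    target_suite.add_expectation(exp_copy)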

I see that the page intro says

Once you have created a Cloud Data Context, the rest of the code you have already written to configure your GX entities, such as Data Sources, Data Assets, and Expectations, can be re-run to migrate your existing configuration into GX Cloud.

But in the case of Expectations, I'm not sure how easy it will be for people to re-find the code they used to configure Expectations in the first place so that they can re-run the code in a Cloud context. An Expectation Suite can be added to over time, so the code that created all the Expectations in a given Suite may be spread out over various people and over time. It seems like it would remove a lot of friction for users if we could provide a way to copy the current state of a Core Expectation Suite and recreate it in Cloud.
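
A minimal sketch of the copy-based workflow proposed in this comment, written against duck-typed suite objects so that it does not depend on how the suites were obtained. It assumes, as in the snippet above, that an Expectation can be shallow-copied with `copy.copy`, that clearing `id` marks the copy as new, and that the target suite exposes `add_expectation`. The helper name `copy_expectations` is hypothetical.

```python
import copy


def copy_expectations(source_suite, target_suite):
    """Copy every Expectation from source_suite into target_suite.

    Returns the number of Expectations copied.
    """
    copied = 0
    for expectation in source_suite.expectations:
        # Work on a copy so the source suite is left untouched.
        expectation_copy = copy.copy(expectation)
        # Assumption: clearing the ID lets the target suite treat it as new.
        expectation_copy.id = None
        target_suite.add_expectation(expectation_copy)
        copied += 1
    return copied
```

With GX installed, `source_suite` would come from the file context (`suites.get("my_core_suite")`) and `target_suite` from adding a new, empty suite to the cloud context, as in the snippet above.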

demo video of the workflow I'm suggesting (too big for github so posted in slack)

https://greatexpectationslabs.slack.com/archives/C03B8DZCJ07/p1774033457491619

If the user has a file context and wants to create a cloud context, they should be able to copy the entities over from one to the other.
There are two ways to do this:

  • Loop over data sources, assets, suites, expectations, etc. and copy the entities over one by one. This is basically what @klavavej is proposing here. You could do the same for data sources, data assets, etc.
  • For most entities, I think you can copy the top-level one over:
file_context = gx.get_context(mode="file")
cloud_context = gx.get_context(mode="cloud")

# This will copy the suite, which includes all its expectations
cloud_context.suites.add(file_context.suites.get("my_suite"))

Unfortunately, this doesn't seem to work with data sources, so it might be confusing to document in general.

This would be a different strategy but could probably be scripted so a user could run it to move everything over.
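
A rough sketch of what such a script could look like for Expectation Suites only, since the comment above reports that the top-level copy does not work for data sources. It assumes that `context.suites.all()` returns the suites in a context and that `context.suites.add(suite)` accepts a suite taken from another context. The helper name `migrate_suites` and the choice to skip suites whose names already exist in Cloud are illustrative, not something this thread has settled.

```python
def migrate_suites(file_context, cloud_context):
    """Add each file-context suite to the cloud context, skipping name clashes.

    Returns a (migrated_names, skipped_names) tuple.
    """
    # Names already present in Cloud; these are left alone.
    existing = {suite.name for suite in cloud_context.suites.all()}
    migrated, skipped = [], []
    for suite in file_context.suites.all():
        if suite.name in existing:
            skipped.append(suite.name)
            continue
        # Assumption: adding the suite copies its Expectations along with it.
        cloud_context.suites.add(suite)
        migrated.append(suite.name)
    return migrated, skipped
```

Returning the skipped names gives the user a list of suites to reconcile by hand rather than overwriting anything in Cloud.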

Comment on lines +13 to +15
## Examples

### Configuration Setup

It's a little unusual for headings to butt up against each other like this with no content in between.

I suggest either

  • just removing the "Examples" header
  • or, adding a little content under the "Examples" header that introduces its subsections.

Comment on lines +13 to +15
## Examples

### Configuration Setup

For sections that provide instructions, I try to have the headers use imperative verbs. This helps readers know that there is a task to complete in the section.

This section of content is a bit unusual in that it both provides conceptual background info (about how you might have set things up in Core) and instructions (about how to replicate things in Cloud)

This might need some supporting edits to the phrasing of the content in the section, but what would you think about having this header be something like

Migrate entities

or

Re-create a Data Asset

or

Add a Data Asset

?


Running this script will now create the same Data Source, Data Asset, Batch Definition, and Validation Definition in your GX Cloud organization.

### Running Validations

similar to another note, I try to make headers for instructional sections use imperative verbs.

Again, this might require some rephrasing of the section contents, but what would you think about changing this header to something like

Run a Validation

or

Validate a migrated Data Asset

?

billdirks left a comment

Thanks for putting this together.
I've left a few comments. One I would call out is a response to Kristen's comment about using suites.get on the cloud context. That response lays out a different strategy from the one these docs put forward. These docs focus on manually re-running, against Cloud, the same script that created the entities in a file context. That is great if that script exists. If there isn't a script, one can migrate by examining the file context and copying the entities over from there, e.g.

for expectation in file_suite.expectations:
    expectation_copy = copy.copy(expectation)  # needs `import copy`
    expectation_copy.id = None  # assumed necessary so the copy is added as new
    cloud_suite.add_expectation(expectation_copy)

You could do something similar for data sources and assets by looking at the name and configuration information in the file context and using that to create ones in the cloud context.
Happy to discuss synchronously if that would be useful.

import great_expectations as gx

context = gx.get_context(mode="file")
ds = context.data_sources.add_sql(name="Postgres DB", connection_string="postgresql+psycopg2://username:passowrd@myhost.domain>:443>/sample_db")

I've suggested a typo fix and changing the port since 5432 is the standard postgresql port and ports 0-1023 are operationally special.

Suggested change
- ds = context.data_sources.add_sql(name="Postgres DB", connection_string="postgresql+psycopg2://username:passowrd@myhost.domain>:443>/sample_db")
+ ds = context.data_sources.add_sql(name="Postgres DB", connection_string="postgresql+psycopg2://username:password@myhost.domain>:5432>/sample_db")
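
For readers assembling a connection string of this shape themselves, a minimal standard-library sketch follows. It defaults to port 5432, as suggested above, and percent-encodes the credentials. The helper name `postgres_connection_string` and its parameters are hypothetical and not part of the PR.

```python
from urllib.parse import quote


def postgres_connection_string(user, password, host, database, port=5432):
    """Build a SQLAlchemy-style PostgreSQL URL for the psycopg2 driver."""
    # Percent-encode credentials so characters such as "@" or ":" do not
    # break URL parsing.
    safe_user = quote(user, safe="")
    safe_password = quote(password, safe="")
    return f"postgresql+psycopg2://{safe_user}:{safe_password}@{host}:{port}/{database}"
```

The returned string could then be passed as `connection_string` to `add_sql` in either the file or the cloud context.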

Comment on lines +41 to +45
import os

os.environ["GX_CLOUD_ORGANIZATION_ID"] = "<YOUR_GX_CLOUD_ORGANIZATION_ID>"
os.environ["GX_CLOUD_WORKSPACE_ID"] = "<YOUR_GX_CLOUD_WORKSPACE_ID>"
os.environ["GX_CLOUD_ACCESS_TOKEN"] = "<YOUR_GX_CLOUD_ACCESS_TOKEN>"

Yes, if you want snippets to run you will need to put them in a .py file and register them in examples_under_test.py. The backend_dependencies=[BackendDependencies.CLOUD] is required for cloud tests. In case people are curious, that will make the tests start a local cloud backend in a docker container. The tests will run using this docker container.
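
On the environment-variable side of that question, one option is for the documented script to read the three variables from the environment instead of assigning placeholder values, so a test run can inject real ones. The sketch below only checks that they are set. The variable names come from the quoted snippet, while the helper name `missing_cloud_vars` is hypothetical.

```python
import os

# The three variables the quoted snippet sets before creating a cloud context.
REQUIRED_CLOUD_VARS = (
    "GX_CLOUD_ORGANIZATION_ID",
    "GX_CLOUD_WORKSPACE_ID",
    "GX_CLOUD_ACCESS_TOKEN",
)


def missing_cloud_vars(environ=None):
    """Return the required GX Cloud variables that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in REQUIRED_CLOUD_VARS if not environ.get(name)]
```

A script could call `missing_cloud_vars()` before `gx.get_context(mode="cloud")` and stop with a clear message if the list is not empty.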

os.environ["GX_CLOUD_ACCESS_TOKEN"] = "<YOUR_GX_CLOUD_ACCESS_TOKEN>"

context = gx.get_context(mode="cloud")
data_source = context.data_sources.add_sql(name="Postgres DB", connection_string="postgresql+psycopg2://username:passowrd@myhost.domain>:443>/sample_db")

Suggested change
- data_source = context.data_sources.add_sql(name="Postgres DB", connection_string="postgresql+psycopg2://username:passowrd@myhost.domain>:443>/sample_db")
+ data_source = context.data_sources.add_sql(name="Postgres DB", connection_string="postgresql+psycopg2://username:password@myhost.domain>:5432>/sample_db")

name="FULL_TABLE"
)

suite = context.suites.get("my_suite")

We assume the suite exists? Everything else we are creating from scratch. We could create a suite and add expectations to it. That would resolve @klavavej's comment below since then we'd be creating it in both places.

validation_definition = gx.ValidationDefinition(
    data=bd, suite=suite, name="Validation Definition"
)
context.validation_definitions.add(validation_definition)

I think we also want to create a checkpoint? That is the final domain object that a user needs. In the running validations section below you do a get, which means it must exist.
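
A minimal sketch of the Checkpoint step suggested here, assuming the GX 1.x API in which `gx.Checkpoint` takes a `name` and a list of `validation_definitions`, and `context.checkpoints.add` registers it. The helper name `add_checkpoint` and the default Checkpoint name are hypothetical. `great_expectations` is imported inside the function, and the context is passed in, so the same helper can be used with either a file or a cloud context.

```python
def add_checkpoint(context, validation_definition, name="my_checkpoint"):
    """Create a Checkpoint for one Validation Definition and register it."""
    # Imported lazily; requires great_expectations to be installed at call time.
    import great_expectations as gx

    checkpoint = gx.Checkpoint(
        name=name,
        validation_definitions=[validation_definition],
    )
    # Register the Checkpoint with whichever context was passed in.
    return context.checkpoints.add(checkpoint)
```

With this in the setup script, a later `context.checkpoints.get(...)` in the validation section would have something to retrieve in both Core and Cloud.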


github-actions Bot commented May 7, 2026

Is this PR still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?

This issue has been automatically marked as stale because it has not had recent activity.

It will be closed if no further activity occurs. Thank you for your contributions 🙇

github-actions Bot added and then removed the stale label (Stale issues and PRs) May 7, 2026