99 changes: 99 additions & 0 deletions docs/docusaurus/docs/reference/learn/core_to_cloud.md
@@ -0,0 +1,99 @@
---
id: core_to_cloud
sidebar_label: 'GX Core to GX Cloud Migration Guide'
title: "GX Core to GX Cloud Migration Guide"
Contributor

This page should have a description: in the frontmatter for SEO

---

## Overview
Contributor

It's unusual to have an "Overview" h2 right after the page title h1. As far as I can tell, no other pages in the docs currently do this. I suggest removing this header for consistency and concision.

Suggested change
## Overview


This guide will enable you to migrate your GX Core configuration to a GX Cloud organization. Since GX Cloud is built on top of GX Core, the code you used to originally set up your GX Core configuration can be reused for setting up your GX Cloud organization.

The key difference between using GX Core and GX Cloud is the Data Context. By setting the mode of your Data Context to `cloud` and then providing the appropriate credentials, you will be able to connect to your GX Cloud organization. Once you have created a Cloud Data Context, the rest of the code you have already written to configure your GX entities, such as Data Sources, Data Assets, and Expectations, can be re-run to migrate your existing configuration into GX Cloud. Similarly, any code that you have written to run validations, including Custom Actions, can also be reused.
Contributor

re: "The key difference between using GX Core and GX Cloud is the Data Context."

This is focused on technical details of how Core and Cloud work from a code perspective and obfuscates the differences in the business value they provide.

This page seems to assume that people already know they want to migrate. That might be the case for users/prospects that we send this article to. But folks who come across the page organically would benefit from some high-level info about why someone might want to migrate from Core to Cloud in the first place.

The value proposition for Cloud on this page doesn't have to be super in-depth. I plan on eventually adding a dedicated Cloud vs. Core value comparison where we can provide all the details. But I think it would be good to mention broadly the kind of business value folks can get from Cloud but not Core and/or highlight a few specific features that are available in Cloud but not Core.


Contributor

I think this page should mention a prerequisite of having a Cloud account with workspace Editor permissions or greater.

Folks who have been using Core have likely already fulfilled the other prereqs we typically mention for Cloud API workflows (e.g., Python version 3.10 to 3.13), so I feel OK about omitting those.

## Examples

### Configuration Setup
Comment on lines +13 to +15
Contributor

It's a little unusual for headings to butt up against each other like this with no content in between.

I suggest either

  • just removing the "Examples" header
  • or, adding a little content under the "Examples" header that introduces its subsections.

Comment on lines +13 to +15
Contributor

For sections that provide instructions, I try to have the headers use imperative verbs. This helps readers know that there is a task to complete in the section.

This section of content is a bit unusual in that it both provides conceptual background info (about how you might have set things up in Core) and instructions (about how to replicate things in Cloud)

This might need some supporting edits to the phrasing of the content in the section, but what would you think about having this header be something like

Migrate entities

or

Re-create a Data Asset

or

Add a Data Asset

?

In the example below, a File Data Context has been created, along with a Postgres Data Source and Data Asset.

```python
Contributor

All code blocks should have titles / language identifiers applied with title=""

This standard exists in part because of this bug

import great_expectations as gx

context = gx.get_context(mode="file")
ds = context.data_sources.add_sql(name="Postgres DB", connection_string="postgresql+psycopg2://username:passowrd@myhost.domain>:443>/sample_db")
Contributor

I've suggested a typo fix and changing the port since 5432 is the standard postgresql port and ports 0-1023 are operationally special.

Suggested change
ds = context.data_sources.add_sql(name="Postgres DB", connection_string="postgresql+psycopg2://username:passowrd@myhost.domain>:443>/sample_db")
ds = context.data_sources.add_sql(name="Postgres DB", connection_string="postgresql+psycopg2://username:password@myhost.domain>:5432>/sample_db")

asset = ds.add_table_asset(table_name="sample_table", name="sample_table")

bd = asset.add_batch_definition_whole_table(
name="FULL_TABLE"
)

suite = context.suites.get("my_suite")
Contributor

We assume the suite exists? Everything else we are creating from scratch. We could create a suite and add expectations to it. That would resolve @klavavej's comment below since then we'd be creating it in both places.


validation_definition = gx.ValidationDefinition(
data=bd, suite=suite, name="Validation Definition"
)
context.validation_definitions.add(validation_definition)
Contributor

I think we also want to create a checkpoint? That is the final domain object that a user needs. In the running validations section below you do a get, which means it must exist.

```

To recreate these entities in GX Cloud, create a Cloud Data Context by setting the mode to `cloud` and providing your [GX Cloud Credentials](/cloud/connect/connect_python.md#get-your-credentials) in the `GX_CLOUD_ORGANIZATION_ID`, `GX_CLOUD_WORKSPACE_ID`, and `GX_CLOUD_ACCESS_TOKEN` environment variables. The rest of the code remains unchanged.

```python
import great_expectations as gx
import os

os.environ["GX_CLOUD_ORGANIZATION_ID"] = "<YOUR_GX_CLOUD_ORGANIZATION_ID>"
os.environ["GX_CLOUD_WORKSPACE_ID"] = "<YOUR_GX_CLOUD_WORKSPACE_ID>"
os.environ["GX_CLOUD_ACCESS_TOKEN"] = "<YOUR_GX_CLOUD_ACCESS_TOKEN>"
Comment on lines +41 to +45
Contributor

re: this comment

I couldn't figure out how to add these snippets to test while including the lines for setting up the environment variables.

You can refer to the recently added Fabric and SQL Server docs for how we handle this elsewhere

instead of demonstrating setting env vars in python code, we typically show setting them in terminal with a suggestion to add to .bashrc or .zshrc

  1. Save your GX Cloud and Microsoft Fabric credentials as environment variables by entering export ENV_VAR_NAME=env_var_value in the terminal or adding the command to your ~/.bashrc or ~/.zshrc file. For example:

    export GX_CLOUD_ACCESS_TOKEN=<user_access_token>
    export GX_CLOUD_ORGANIZATION_ID=<organization_id>
    export GX_CLOUD_WORKSPACE_ID=<workspace_id>
    export ENTRA_ID_TENANT=<tenant_id>
    export ENTRA_ID_CLIENT_ID=<client_id>
    export ENTRA_ID_CLIENT_SECRET=<client_secret>

Then, when you put the python code sample under test by registering it in examples_under_test.py I think including backend_dependencies=[BackendDependencies.CLOUD] makes cloud credentials available when the test runs. (this bit could use a fact check from @billdirks )

Contributor

Yes, if you want snippets to run you will need to put them in a .py file and register them in examples_under_test.py. The backend_dependencies=[BackendDependencies.CLOUD] is required for cloud tests. In case people are curious, that will make the tests start a local cloud backend in a docker container. The tests will run using this docker container.


context = gx.get_context(mode="cloud")
data_source = context.data_sources.add_sql(name="Postgres DB", connection_string="postgresql+psycopg2://username:passowrd@myhost.domain>:443>/sample_db")
Contributor

Suggested change
data_source = context.data_sources.add_sql(name="Postgres DB", connection_string="postgresql+psycopg2://username:passowrd@myhost.domain>:443>/sample_db")
data_source = context.data_sources.add_sql(name="Postgres DB", connection_string="postgresql+psycopg2://username:password@myhost.domain>:5432>/sample_db")

asset = data_source.add_table_asset(table_name="sample_table", name="sample_table")

batch_definition = asset.add_batch_definition_whole_table(
name="FULL_TABLE"
)

suite = context.suites.get("my_suite")
Contributor

If someone is migrating from Core to a totally fresh Cloud organization, they won't have any Expectation Suites to get. When I first skimmed the code samples, I had an impression that I'd be able to essentially copy/paste an Expectation Suite from Core to Cloud with all of my pre-existing Expectation configurations. But, then when I ran the code samples, this line gave me an error

I see that I can change the Expectation Suite name to one that exists in my cloud org. But, users who are migrating for the first time won't have any Cloud Expectation Suites to get

Screenshot 2026-03-20 at 11 33 51 AM

I asked chatgpt and got the following code for copying Expectations from a Core Expectation Suite and replicating them in a Cloud Expectation Suite. I tried it and it seemed to work. Can we include a workflow like this in this migration guide for migrating configured Expectations from Core?

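import copy

import great_expectations as gx

# Read the suite from the existing GX Core (file) context.
file_context = gx.get_context(mode="file")
source_suite = file_context.suites.get("my_core_suite")

# Create an empty suite in GX Cloud to receive the copies.
cloud_context = gx.get_context(mode="cloud")
target_suite = cloud_context.suites.add(gx.ExpectationSuite(name="my_cloud_suite"))

for exp in source_suite.expectations:
    exp_copy = copy.copy(exp)
    exp_copy.id = None  # drop the id carried over from the Core suite
    target_suite.add_expectation(exp_copy)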

I see that the page intro says

Once you have created a Cloud Data Context, the rest of the code you have already written to configure your GX entities, such as Data Sources, Data Assets, and Expectations, can be re-run to migrate your existing configuration into GX Cloud.

But in the case of Expectations, I'm not sure how easy it will be for people to re-find the code they used to configure Expectations in the first place so that they can re-run the code in a Cloud context. An Expectation Suite can be added to over time, so the code that created all the Expectations in a given Suite may be spread out over various people and over time. It seems like it would remove a lot of friction for users if we could provide a way to copy the current state of a Core Expectation Suite and recreate it in Cloud.

Contributor

demo video of the workflow I'm suggesting (too big for github so posted in slack)

https://greatexpectationslabs.slack.com/archives/C03B8DZCJ07/p1774033457491619

Contributor

If the user has a file context and wants to create a cloud context, they should be able to copy the entities over from one to the other.
There are 2 ways to do this:

  • Loop over data sources, assets, suites, expectations, etc and copy the entities over one by one. This is basically what @klavavej is proposing here. You could do the same for data sources, data assets, etc.
  • For most entities, I think you can copy the top level one over:
file_context = gx.get_context(mode="file")
cloud_context = gx.get_context(mode="cloud")

# This will copy the suite, which includes all its expectations
cloud_context.suites.add(file_context.suites.get("my_suite"))

Unfortunately, this doesn't seem to work with data sources so might be confusing to document in general.

This would be a different strategy but could probably be scripted so a user could run it to move everything over.


validation_definition = gx.ValidationDefinition(
data=batch_definition, suite=suite, name="Validation Definition"
)
context.validation_definitions.add(validation_definition)
```

Running this script will now create the same Data Source, Data Asset, Batch Definition, and Validation Definition in your GX Cloud organization.
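
The review discussion on `context.suites.get("my_suite")` suggests copying a suite object from the file context straight into the Cloud context. A minimal sketch of how that could be scripted for several suites is shown here. It relies only on the `suites.get` and `suites.add` calls that appear in the thread, and it assumes (as the reviewer reports, but this sketch does not verify) that `suites.add` accepts a suite fetched from another context. The helper name `copy_suites` is hypothetical and not part of GX.

```python
def copy_suites(source_context, target_context, suite_names):
    """Copy the named Expectation Suites from one context to another.

    Returns a tuple (copied, failed): `copied` is a list of suite names,
    `failed` is a list of (name, error message) pairs.
    """
    copied, failed = [], []
    for name in suite_names:
        try:
            # Fetch from the source (e.g. file) context ...
            suite = source_context.suites.get(name)
            # ... and register the same object in the target (e.g. cloud) context.
            target_context.suites.add(suite)
        except Exception as exc:
            # Record the failure and keep going with the remaining suites.
            failed.append((name, str(exc)))
        else:
            copied.append(name)
    return copied, failed
```

A call might look like `copy_suites(gx.get_context(mode="file"), gx.get_context(mode="cloud"), ["my_suite"])`. Per the same thread, Data Sources reportedly cannot be copied this way, so they would still need to be re-created with the original setup code.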

### Running Validations
Contributor

similar to another note, I try to make headers for instructional sections use imperative verbs.

Again, this might require some rephrasing of the section contents, but what would you think about changing this header to something like

Run a Validation

or

Validate a migrated Data Asset

?


The code snippet below runs a Checkpoint in an existing GX Core configuration.
Contributor

There's a disconnect here in that the "Configuration Setup" section shows how to migrate a Validation Definition but not how to migrate a Checkpoint. Then this "Running Validations" section shows how to run a Checkpoint. I worry that users will get tripped up by this - that they'll try the Cloud code sample for running a Checkpoint, have it fail, and not understand why.

To close the gap between the two sections (so users can copy/paste code samples, run them in order, and have them succeed) I think the "Configuration Setup" section should show how to migrate a Checkpoint OR the "Running Validations" section should show how to run a Validation Definition.


```python
import great_expectations as gx

context = gx.get_context("file")
Contributor

I got an error when I tried to run this line, looks like mode= is missing

Suggested change
context = gx.get_context("file")
context = gx.get_context(mode="file")

checkpoint = context.checkpoints.get("My Checkpoint")
checkpoint.run()
```

To run a Checkpoint in your GX Cloud organization, use the same code snippet. As in the previous example, set the mode of your Data Context to `cloud` and provide your [GX Cloud Credentials](/cloud/connect/connect_python.md#get-your-credentials) in the `GX_CLOUD_ORGANIZATION_ID`, `GX_CLOUD_WORKSPACE_ID`, and `GX_CLOUD_ACCESS_TOKEN` environment variables.

```python
import great_expectations as gx
import os

os.environ["GX_CLOUD_ORGANIZATION_ID"] = "<YOUR_GX_CLOUD_ORGANIZATION_ID>"
os.environ["GX_CLOUD_WORKSPACE_ID"] = "<YOUR_GX_CLOUD_WORKSPACE_ID>"
os.environ["GX_CLOUD_ACCESS_TOKEN"] = "<YOUR_GX_CLOUD_ACCESS_TOKEN>"

context = gx.get_context(mode="cloud")
checkpoint = context.checkpoints.get("My Checkpoint")
checkpoint.run()
```

Your GX Cloud Checkpoint can now be used to run validations wherever needed, such as within your data pipelines.
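
Both Cloud snippets assume the three credentials are available as environment variables. A minimal pre-flight sketch, using only the standard library, could report which of them are missing before `gx.get_context(mode="cloud")` is called. The variable names come from the guide text. The helper name `missing_gx_cloud_vars` is hypothetical and not part of GX.

```python
import os

# Variable names as listed in the guide.
REQUIRED_GX_CLOUD_VARS = (
    "GX_CLOUD_ORGANIZATION_ID",
    "GX_CLOUD_WORKSPACE_ID",
    "GX_CLOUD_ACCESS_TOKEN",
)


def missing_gx_cloud_vars(env=None):
    """Return the required GX Cloud variable names that are unset or empty.

    `env` defaults to the process environment; pass a dict to check
    another mapping (useful in tests).
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_GX_CLOUD_VARS if not env.get(name)]
```

Calling `missing_gx_cloud_vars()` at the top of a pipeline script and stopping when the list is non-empty gives a clearer failure than a connection error later on. Taking the mapping as a parameter keeps the check independent of how the variables were set (terminal `export`, `.bashrc`, or a CI secret store).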

## Limitations
Some common limitations of migrating from GX Core to GX Cloud are listed below. Refer to the [compatibility reference](/help/compatibility_reference.md) for a comprehensive list of limitations.
Comment on lines +94 to +95
Contributor

The section intro says "Some common limitations of migrating from GX Core to GX Cloud are listed below." but the items in the list are limitations of Cloud itself rather than limitations on the act of migrating. There probably are limitations on the act of migrating that should be communicated in this page. The main one that comes to mind is that historical results from validations run with Core cannot be migrated to be visible in the Cloud UI.

The framing of "limitations" here makes me nervous because it seems to present Core as more fully-featured than Cloud and obfuscates the additional value that's available in Cloud but not Core. rhetorical question - Why would a user want to move from Core to Cloud if they lose the three items below and gain nothing?

The three items below where Core does support some options that are not available in Cloud should still be communicated, but I think they should be framed in a different way. Perhaps just changing from "limitations" language to language more like "Some options that are available in GX Core are not available in GX Cloud" with more of an explanation of why the option isn't available / doesn't make sense with Cloud would be enough to present this info in a more positive light.

And to further highlight the value of Cloud / the reason why someone should take the time to complete the steps in this guide, I think it would be good to add a "next steps" section that highlights Cloud-only features the user should explore after migrating their Data Sources and Data Assets (for example, ExpectAI and alerts)


- Some Data Sources that are supported in GX Core may not be supported in GX Cloud.
Contributor

This sounds overly broad given that it's only SQLite that's not supported for Cloud. There's an old resolved comment in the Data Sources Painted Doors PRD that explains why SQLite isn't available for Cloud and why this shouldn't surprise any SQLite users

sqlite

I think the info about Data Source support should be explicit that it's only SQLite that's not supported (and maybe also include the reason why) so that we don't create undue fear/uncertainty/doubt that a user's key Data Sources they've been using with Core won't be supported in Cloud.

- Any Custom Expectations that you have created are not compatible with GX Cloud.
Contributor

What happens if someone tries to use a Custom Expectation with Cloud? If there are other non-custom Expectations in an Expectation Suite with a Custom Expectation, does the custom one just no-op while the rest are validated normally? Or does the whole validation run fail to execute?

I'm not trying to say that we need to explicitly document this behavior, I'm just wanting to know what happens so I can see if we should let people know about the lack of Cloud support for Custom Expectations earlier on the page.

- Credentials stored in the `config_variables.yml` file are not supported in GX Cloud.
Contributor

If folks were using config_variables.yml what should they do instead with their credentials when moving to Cloud? I think this info about what's not supported should come with a migration strategy about what the user can do instead to achieve a similar effect for Cloud. Is the path forward migrating any variables defined in config_variables.yml to instead be exported as env vars in the terminal / in .bashrc or .zshrc?
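
On the question above about `config_variables.yml`: if the answer turns out to be moving those values into environment variables, a rough sketch of a one-off converter might look like the following. It is an illustration under stated assumptions, not a documented GX migration path. It assumes a flat file of `key: value` string entries and uses only the standard library, so nested YAML, multi-line values, or inline comments would need a real YAML parser instead. The helper name `config_variables_to_exports` is hypothetical.

```python
import shlex


def config_variables_to_exports(text):
    """Turn flat `key: value` lines into shell `export` statements.

    Assumption: every entry is a single-line string value. Blank lines
    and lines starting with `#` are skipped.
    """
    exports = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first colon only, so URLs in the value stay intact.
        key, sep, value = line.partition(":")
        if not sep:
            continue
        # Remove optional surrounding YAML quotes, then quote for the shell.
        value = value.strip().strip("'\"")
        exports.append(f"export {key.strip()}={shlex.quote(value)}")
    return exports
```

The resulting lines could be pasted into a terminal or appended to `~/.bashrc` or `~/.zshrc`, matching the approach quoted earlier in this review for other Cloud docs. Variable names are kept as written in the file so that any existing `${name}` references would still match.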

2 changes: 2 additions & 0 deletions docs/docusaurus/docs/reference/learn/reference_overview.md
@@ -21,6 +21,8 @@ import OverviewCard from '@site/src/components/OverviewCard';
## Supplemental Information

<LinkCardGrid>
<LinkCard topIcon label="GX Core to Cloud Migration Guide" description="Learn how to use your GX Core code in GX Cloud" to="/reference/learn/core_to_cloud" icon="/img/convert_icon.svg" />

<LinkCard topIcon label="GX in your data pipeline" description="Learn where GX can be integrated into a data pipeline to manage and monitor data quality" to="/reference/learn/gx_in_your_data_pipeline/gx_in_your_data_pipeline_lp" icon="/img/workflow_icon.svg" />

<LinkCard topIcon label="Data quality use cases" description="Learn how to use GX to address key data quality scenarios" to="/reference/learn/data_quality_use_cases/dq_use_cases_lp" icon="/img/statistics_icon.svg" />
1 change: 1 addition & 0 deletions docs/docusaurus/sidebars.js
@@ -268,6 +268,7 @@ module.exports = {
}
],
learn: [
'reference/learn/core_to_cloud',
{
type: 'category',
label: 'GX in your data pipeline',