[DOCS] GX Core to Cloud Migration Guide #11730
base: develop
Changes from all commits
9284b86
22878e9
8c07b2e
4d08948
0620e01
0dd1930
8c7a13d
3364ebc
05a5bbd
05d4def
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
| @@ -0,0 +1,99 @@ | ||||||
| --- | ||||||
| id: core_to_cloud | ||||||
| sidebar_label: 'GX Core to GX Cloud Migration Guide' | ||||||
| title: "GX Core to GX Cloud Migration Guide" | ||||||
| --- | ||||||
|
|
||||||
| ## Overview | ||||||
|
Contributor
It's unusual to have an "Overview" h2 right after the page title h1. As far as I can tell, no other pages in the docs currently do this. I suggest removing this header for consistency and concision.
Suggested change
|
||||||
|
|
||||||
| This guide will enable you to migrate your GX Core configuration to a GX Cloud organization. Since GX Cloud is built on top of GX Core, the code you used to originally set up your GX Core configuration can be reused for setting up your GX Cloud organization. | ||||||
|
|
||||||
| The key difference between using GX Core and GX Cloud is the Data Context. By setting the mode of your Data Context to `cloud` and then providing the appropriate credentials, you will be able to connect to your GX Cloud organization. Once you have created a Cloud Data Context, the rest of the code you have already written to configure your GX entities, such as Data Sources, Data Assets, and Expectations, can be re-run to migrate your existing configuration into GX Cloud. Similarly, any code that you have written to run validations, including Custom Actions, can also be reused. | ||||||
|
Contributor
re: "The key difference between using GX Core and GX Cloud is the Data Context."

This focuses on the technical details of how Core and Cloud work from a code perspective and obscures the differences in the business value they provide. The page seems to assume that people already know they want to migrate. That might be the case for users and prospects that we send this article to, but folks who come across the page organically would benefit from some high-level info about why someone might want to migrate from Core to Cloud in the first place.

The value proposition for Cloud on this page doesn't have to be super in-depth. I plan on eventually adding a dedicated Cloud vs. Core value comparison where we can provide all the details. But I think it would be good to mention broadly the kind of business value folks can get from Cloud but not Core, and/or highlight a few specific features that are available in Cloud but not Core.
||||||
|
|
||||||
|
Contributor
I think this page should mention a prerequisite of having a Cloud account with workspace Editor permissions or greater. Folks who have been using Core have likely already fulfilled the other prereqs we typically mention for Cloud API workflows (e.g., Python version 3.10 to 3.13), so I feel OK about omitting those.
||||||
| ## Examples | ||||||
|
|
||||||
| ### Configuration Setup | ||||||
|
Comment on lines +13 to +15
Contributor
It's a little unusual for headings to butt up against each other like this with no content in between. I suggest either
Comment on lines +13 to +15
Contributor
For sections that provide instructions, I try to have the headers use imperative verbs. This helps readers know that there is a task to complete in the section.

This section is a bit unusual in that it provides both conceptual background (about how you might have set things up in Core) and instructions (about how to replicate things in Cloud). It might need some supporting edits to the phrasing of the content in the section, but what would you think about having this header be something like
or
or
? |
||||||
| In the example below, a File Data Context has been created, along with a Postgres Data Source and Data Asset. | ||||||
|
|
||||||
| ```python | ||||||
|
Contributor
All code blocks should have titles / language identifiers applied with This standard exists in part because of this bug
||||||
| import great_expectations as gx | ||||||
|
|
||||||
| context = gx.get_context(mode="file") | ||||||
| ds = context.data_sources.add_sql(name="Postgres DB", connection_string="postgresql+psycopg2://username:passowrd@myhost.domain>:443>/sample_db") | ||||||
|
Contributor
I've suggested a typo fix and a port change, since 5432 is the standard PostgreSQL port and ports 0-1023 are operationally special.
Suggested change
|
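To make the typo and port points above concrete, here is a minimal sketch of assembling the connection string from its parts instead of typing it by hand. `build_postgres_url` is a hypothetical helper, not part of GX, and the host, database, and credentials are placeholders; the only thing taken from the diff is the SQLAlchemy-style URL format.

```python
from urllib.parse import quote_plus

POSTGRES_DEFAULT_PORT = 5432  # standard PostgreSQL port


def build_postgres_url(user, password, host, database, port=POSTGRES_DEFAULT_PORT):
    """Assemble a SQLAlchemy-style PostgreSQL URL (hypothetical helper, not a GX API)."""
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    # Percent-encode user and password so characters such as "@" or "/"
    # cannot be mistaken for URL delimiters.
    return (
        f"postgresql+psycopg2://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


# Placeholder values for illustration only.
url = build_postgres_url("username", "p@ss/word", "myhost.domain", "sample_db")
print(url)
```

The resulting string could then be passed as the `connection_string` argument of `add_sql`, as in the diff.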
||||||
| asset = ds.add_table_asset(table_name="sample_table", name="sample_table") | ||||||
|
|
||||||
| bd = asset.add_batch_definition_whole_table( | ||||||
| name="FULL_TABLE" | ||||||
| ) | ||||||
|
|
||||||
| suite = context.suites.get("my_suite") | ||||||
|
Contributor
We assume the suite exists? Everything else we are creating from scratch. We could create a suite and add expectations to it. That would resolve @klavavej's comment below, since then we'd be creating it in both places.
||||||
|
|
||||||
| validation_definition = gx.ValidationDefinition( | ||||||
| data=bd, suite=suite, name="Validation Definition" | ||||||
| ) | ||||||
| context.validation_definitions.add(validation_definition) | ||||||
|
Contributor
I think we also want to create a checkpoint? That is the final domain object that a user needs. In the running validations section below you do a
||||||
| ``` | ||||||
|
|
||||||
| In order to recreate these same entities in GX Cloud, create a Cloud Data Context by setting the mode to `cloud` and providing your [GX Cloud Credentials](/cloud/connect/connect_python.md#get-your-credentials) for the `GX_CLOUD_ORGANIZATION_ID`, `GX_CLOUD_WORKSPACE_ID` and `GX_CLOUD_ACCESS_TOKEN` environment variables. The rest of the code remains unchanged. | ||||||
|
|
||||||
| ```python | ||||||
| import great_expectations as gx | ||||||
| import os | ||||||
|
|
||||||
| os.environ["GX_CLOUD_ORGANIZATION_ID"] = "<YOUR_GX_CLOUD_ORGANIZATION_ID>" | ||||||
| os.environ["GX_CLOUD_WORKSPACE_ID"] = "<YOUR_GX_CLOUD_WORKSPACE_ID>" | ||||||
| os.environ["GX_CLOUD_ACCESS_TOKEN"] = "<YOUR_GX_CLOUD_ACCESS_TOKEN>" | ||||||
|
Comment on lines +41 to +45
Contributor
re: this comment
You can refer to the recently added Fabric and SQL Server docs for how we handle this elsewhere. Instead of demonstrating setting env vars in Python code, we typically show setting them in the terminal, with a suggestion to add to
Then, when you put the Python code sample under test by registering it in
Contributor
Yes, if you want snippets to run you will need to put them in a
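One way to pair the terminal-based setup described in this thread with the Python sample is to check that the variables are present before creating the Cloud Data Context. This is a minimal sketch that assumes only the three variable names from the diff; `missing_cloud_credentials` is a hypothetical helper, not a GX function.

```python
import os

# Variable names taken from the diff under review.
REQUIRED_VARS = (
    "GX_CLOUD_ORGANIZATION_ID",
    "GX_CLOUD_WORKSPACE_ID",
    "GX_CLOUD_ACCESS_TOKEN",
)


def missing_cloud_credentials(env=None):
    """Return the required variable names that are unset or empty (hypothetical helper)."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_cloud_credentials()
if missing:
    print("Set these environment variables before connecting:", ", ".join(missing))
```

Failing early with a list of missing names keeps credentials out of the script itself and makes a misconfigured shell easier to diagnose.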
||||||
|
|
||||||
| context = gx.get_context(mode="cloud") | ||||||
| data_source = context.data_sources.add_sql(name="Postgres DB", connection_string="postgresql+psycopg2://username:passowrd@myhost.domain>:443>/sample_db") | ||||||
|
Contributor
Suggested change
|
||||||
| asset = data_source.add_table_asset(table_name="sample_table", name="sample_table") | ||||||
|
|
||||||
| batch_definition = asset.add_batch_definition_whole_table( | ||||||
| name="FULL_TABLE" | ||||||
| ) | ||||||
|
|
||||||
| suite = context.suites.get("my_suite") | ||||||
|
Contributor
If someone is migrating from Core to a totally fresh Cloud organization, they won't have any Expectation Suites to get.

When I first skimmed the code samples, I had the impression that I'd be able to essentially copy/paste an Expectation Suite from Core to Cloud with all of my pre-existing Expectation configurations. But when I ran the code samples, this line gave me an error.

I see that I can change the Expectation Suite name to one that exists in my Cloud org. But users who are migrating for the first time won't have any Cloud Expectation Suites to get.

I asked ChatGPT and got the following code for copying Expectations from a Core Expectation Suite and replicating them in a Cloud Expectation Suite. I tried it and it seemed to work. Can we include a workflow like this in this migration guide for migrating configured Expectations from Core? I see that the page intro says

But in the case of Expectations, I'm not sure how easy it will be for people to re-find the code they used to configure Expectations in the first place so that they can re-run it in a Cloud context. An Expectation Suite can be added to over time, so the code that created all the Expectations in a given Suite may be spread out across various people and over time. It seems like it would remove a lot of friction for users if we could provide a way to copy the current state of a Core Expectation Suite and recreate it in Cloud.
Contributor
Demo video of the workflow I'm suggesting (too big for GitHub, so posted in Slack): https://greatexpectationslabs.slack.com/archives/C03B8DZCJ07/p1774033457491619
Contributor
If the user has a file context and wants to create a cloud context, they should be able to copy the entities over from one to the other.
Unfortunately, this doesn't seem to work with data sources, so it might be confusing to document in general. This would be a different strategy, but it could probably be scripted so a user could run it to move everything over.
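The copy-over idea in this thread can be illustrated without committing to any particular GX call. The sketch below works on a plain dictionary and rests on an assumption: that an exported suite carries a `name`, an `id`, and a list of `expectations` that each have their own `id`. That shape, the sample values, and the `strip_ids` helper are all hypothetical; the point is only that identifiers assigned by the source context are dropped so the target context can assign its own.

```python
import copy


def strip_ids(suite):
    """Return a deep copy of a suite dict without suite or expectation ids.

    Hypothetical helper; the dict layout is an assumption, not a GX contract.
    """
    clean = copy.deepcopy(suite)
    clean.pop("id", None)
    for expectation in clean.get("expectations", []):
        expectation.pop("id", None)
    return clean


# Made-up sample data standing in for a suite exported from a file context.
core_suite = {
    "name": "my_suite",
    "id": "core-suite-id",
    "expectations": [
        {
            "id": "core-expectation-1",
            "type": "expect_column_values_to_not_be_null",
            "kwargs": {"column": "sample_column"},
        },
    ],
}

portable_suite = strip_ids(core_suite)
print(portable_suite)
```

Working on a deep copy leaves the source suite untouched, so the same script can be re-run if the Cloud side fails partway through.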
||||||
|
|
||||||
| validation_definition = gx.ValidationDefinition( | ||||||
| data=batch_definition, suite=suite, name="Validation Definition" | ||||||
| ) | ||||||
| context.validation_definitions.add(validation_definition) | ||||||
| ``` | ||||||
|
|
||||||
| Running this script will now create the same Data Source, Data Asset, Batch Definition, and Validation Definition in your GX Cloud organization. | ||||||
|
|
||||||
| ### Running Validations | ||||||
|
Contributor
Similar to another note, I try to make headers for instructional sections use imperative verbs. Again, this might require some rephrasing of the section contents, but what would you think about changing this header to something like
or
? |
||||||
|
|
||||||
| The code snippet below runs a Checkpoint in an existing GX Core configuration. | ||||||
|
Contributor
There's a disconnect here: the "Configuration Setup" section shows how to migrate a Validation Definition but not how to migrate a Checkpoint, and then this "Running Validations" section shows how to run a Checkpoint. I worry that users will get tripped up by this: they'll try the Cloud code sample for running a Checkpoint, have it fail, and not understand why.

To close the gap between the two sections (so users can copy/paste the code samples, run them in order, and have them succeed), I think the "Configuration Setup" section should show how to migrate a Checkpoint OR the "Running Validations" section should show how to run a Validation Definition.
||||||
|
|
||||||
| ```python | ||||||
| import great_expectations as gx | ||||||
|
|
||||||
| context = gx.get_context("file") | ||||||
|
Contributor
I got an error when I tried to run this line, looks like
Suggested change
|
||||||
| checkpoint = context.checkpoints.get("My Checkpoint") | ||||||
| checkpoint.run() | ||||||
| ``` | ||||||
|
|
||||||
| In order to execute a checkpoint within your GX Cloud organization, the same code snippet can be used. In the same way as the previous example, set the mode of your data context to `cloud` and provide your [GX Cloud Credentials](/cloud/connect/connect_python.md#get-your-credentials) for the `GX_CLOUD_ORGANIZATION_ID`, `GX_CLOUD_WORKSPACE_ID` and `GX_CLOUD_ACCESS_TOKEN` environment variables. | ||||||
|
|
||||||
| ```python | ||||||
| import great_expectations as gx | ||||||
| import os | ||||||
|
|
||||||
| os.environ["GX_CLOUD_ORGANIZATION_ID"] = "<YOUR_GX_CLOUD_ORGANIZATION_ID>" | ||||||
| os.environ["GX_CLOUD_WORKSPACE_ID"] = "<YOUR_GX_CLOUD_WORKSPACE_ID>" | ||||||
| os.environ["GX_CLOUD_ACCESS_TOKEN"] = "<YOUR_GX_CLOUD_ACCESS_TOKEN>" | ||||||
|
|
||||||
| context = gx.get_context(mode="cloud") | ||||||
| checkpoint = context.checkpoints.get("My Checkpoint") | ||||||
| checkpoint.run() | ||||||
| ``` | ||||||
|
|
||||||
| Your GX Cloud checkpoint can now be used to run validations wherever needed, such as within your data pipelines. | ||||||
|
|
||||||
| ## Limitations | ||||||
| Some common limitations of migrating from GX Core to GX Cloud are listed below. Refer to the [compatibility reference](/help/compatibility_reference.md) for a comprehensive list of limitations. | ||||||
|
Comment on lines +94 to +95
Contributor
The section intro says "Some common limitations of migrating from GX Core to GX Cloud are listed below." but the items in the list are limitations of Cloud itself rather than limitations on the act of migrating.

There probably are limitations on the act of migrating that should be communicated on this page. The main one that comes to mind is that historical results from validations run with Core cannot be migrated to be visible in the Cloud UI.

The framing of "limitations" here makes me nervous because it seems to present Core as more fully featured than Cloud and obscures the additional value that's available in Cloud but not Core. Rhetorical question: why would a user want to move from Core to Cloud if they lose the three items below and gain nothing?

The three items below, where Core supports some options that are not available in Cloud, should still be communicated, but I think they should be framed differently. Perhaps just changing from "limitations" language to something like "Some options that are available in GX Core are not available in GX Cloud", with more of an explanation of why each option isn't available or doesn't make sense with Cloud, would be enough to present this info in a more positive light.

And to further highlight the value of Cloud and the reason someone should take the time to complete the steps in this guide, I think it would be good to add a "next steps" section that highlights Cloud-only features the user should explore after migrating their Data Sources and Data Assets (for example, ExpectAI and alerts).
||||||
|
|
||||||
| - Some Data Sources that are supported in GX Core may not be supported in GX Cloud. | ||||||
|
Contributor
This sounds overly broad given that it's only SQLite that's not supported for Cloud. There's an old resolved comment in the Data Sources Painted Doors PRD that explains why SQLite isn't available for Cloud and why this shouldn't surprise any SQLite users
I think the info about Data Source support should state explicitly that only SQLite is unsupported (and maybe also include the reason why), so that we don't create undue fear, uncertainty, or doubt that the key Data Sources a user has been using with Core won't be supported in Cloud.
||||||
| - Any Custom Expectations that you have created are not compatible with GX Cloud. | ||||||
|
Contributor
What happens if someone tries to use a Custom Expectation with Cloud? If an Expectation Suite contains a Custom Expectation alongside non-custom Expectations, does the custom one just no-op while the rest are validated normally? Or does the whole validation run fail to execute?

I'm not saying that we need to explicitly document this behavior. I just want to know what happens so I can see whether we should let people know about the lack of Cloud support for Custom Expectations earlier on the page.
||||||
| - Credentials stored in the `config_variables.yml` file are not supported in GX Cloud. | ||||||
|
Contributor
If folks were using
||||||


This page should have a `description:` in the frontmatter for SEO.