[DOCS] GX Core to Cloud Migration Guide #11730
I couldn't figure out how to add these snippets to test while including the lines for setting up the environment variables.
klavavej
left a comment
Heads up that while I included a few fine-grained notes, I didn’t proofread closely for grammar / tone / clarity yet because I’m asking for such major changes to the framing, contents, and flow of the page. I will look at grammar and other details more closely in the next round of review
|
|
||
| This guide will enable you to migrate your GX Core configuration to a GX Cloud organization. Since GX Cloud is built on top of GX Core, the code you used to originally set up your GX Core configuration can be reused for setting up your GX Cloud organization. | ||
|
|
||
| The key difference between using GX Core and GX Cloud is the Data Context. By setting the mode of your Data Context to `cloud` and then providing the appropriate credentials, you will be able to connect to your GX Cloud organization. Once you have created a Cloud Data Context, the rest of the code you have already written to configure your GX entities, such as Data Sources, Data Assets, and Expectations, can be re-run to migrate your existing configuration into GX Cloud. Similarly, any code that you have written to run validations, including Custom Actions, can also be reused. |
re: "The key difference between using GX Core and GX Cloud is the Data Context."
This is focused on technical details of how Core and Cloud work from a code perspective and obfuscates the differences in the business value they provide.
This page seems to assume that people already know they want to migrate. That might be the case for users/prospects that we send this article to. But folks who come across the page organically would benefit from some high-level info about why someone might want to migrate from Core to Cloud in the first place.
The value proposition for Cloud on this page doesn't have to be super in-depth. I plan on eventually adding a dedicated Cloud vs. Core value comparison where we can provide all the details. But I think it would be good to mention broadly the kind of business value folks can get from Cloud but not Core and/or highlight a few specific features that are available in Cloud but not Core.
| title: "GX Core to GX Cloud Migration Guide" | ||
| --- | ||
|
|
||
| ## Overview |
It's unusual to have an "Overview" h2 right after the page title h1. As far as I can tell, no other pages in the docs currently do this. I suggest removing this header for consistency and concision.
| ## Overview |
| --- | ||
| id: core_to_cloud | ||
| sidebar_label: 'GX Core to GX Cloud Migration Guide' | ||
| title: "GX Core to GX Cloud Migration Guide" |
This page should have a `description:` field in the frontmatter for SEO.
| ### Configuration Setup | ||
| In the example below, a File Data Context has been created, along with a Postgres Data Source and Data Asset. | ||
|
|
||
| ```python |
All code blocks should have titles / language identifiers applied with title=""
This standard exists in part because of this bug
| ## Limitations | ||
| Some common limitations of migrating from GX Core to GX Cloud are listed below. Refer to the [compatibility reference](/help/compatibility_reference.md) for a comprehensive list of limitations. |
The section intro says "Some common limitations of migrating from GX Core to GX Cloud are listed below." but the items in the list are limitations of Cloud itself rather than limitations on the act of migrating. There probably are limitations on the act of migrating that should be communicated in this page. The main one that comes to mind is that historical results from validations run with Core cannot be migrated to be visible in the Cloud UI.
The framing of "limitations" here makes me nervous because it seems to present Core as more fully featured than Cloud and obfuscates the additional value that's available in Cloud but not Core. Rhetorical question: why would a user want to move from Core to Cloud if they lose the three items below and gain nothing?
The three items below where Core does support some options that are not available in Cloud should still be communicated, but I think they should be framed in a different way. Perhaps just changing from "limitations" language to language more like "Some options that are available in GX Core are not available in GX Cloud" with more of an explanation of why the option isn't available / doesn't make sense with Cloud would be enough to present this info in a more positive light.
And to further highlight the value of Cloud / the reason why someone should take the time to complete the steps in this guide, I think it would be good to add a "next steps" section that highlights Cloud-only features the user should explore after migrating their Data Sources and Data Assets (for example, ExpectAI and alerts).
| This guide will enable you to migrate your GX Core configuration to a GX Cloud organization. Since GX Cloud is built on top of GX Core, the code you used to originally set up your GX Core configuration can be reused for setting up your GX Cloud organization. | ||
|
|
||
| The key difference between using GX Core and GX Cloud is the Data Context. By setting the mode of your Data Context to `cloud` and then providing the appropriate credentials, you will be able to connect to your GX Cloud organization. Once you have created a Cloud Data Context, the rest of the code you have already written to configure your GX entities, such as Data Sources, Data Assets, and Expectations, can be re-run to migrate your existing configuration into GX Cloud. Similarly, any code that you have written to run validations, including Custom Actions, can also be reused. | ||
|
|
I think this page should mention a prerequisite of having a Cloud account with workspace Editor permissions or greater.
Folks who have been using Core have likely already fulfilled the other prereqs we typically mention for Cloud API workflows (e.g., Python version 3.10 to 3.13), so I feel OK about omitting those.
| name="FULL_TABLE" | ||
| ) | ||
|
|
||
| suite = context.suites.get("my_suite") |
If someone is migrating from Core to a totally fresh Cloud organization, they won't have any Expectation Suites to get. When I first skimmed the code samples, I had the impression that I'd be able to essentially copy/paste an Expectation Suite from Core to Cloud with all of my pre-existing Expectation configurations. But when I ran the code samples, this line gave me an error.
I see that I can change the Expectation Suite name to one that exists in my Cloud org. But users who are migrating for the first time won't have any Cloud Expectation Suites to get.
I asked ChatGPT and got the following code for copying Expectations from a Core Expectation Suite and replicating them in a Cloud Expectation Suite. I tried it and it seemed to work. Can we include a workflow like this in this migration guide for migrating configured Expectations from Core?
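    import copy

    import great_expectations as gx

    # Load the existing suite from the file-backed (Core) context
    file_context = gx.get_context(mode="file")
    source_suite = file_context.suites.get("my_core_suite")

    # Create an empty suite in the Cloud context
    cloud_context = gx.get_context(mode="cloud")
    target_suite = cloud_context.suites.add(gx.ExpectationSuite(name="my_cloud_suite"))

    for exp in source_suite.expectations:
        exp_copy = copy.copy(exp)
        exp_copy.id = None  # reset the id before adding the copy to the new suite
        target_suite.add_expectation(exp_copy)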
I see that the page intro says:
"Once you have created a Cloud Data Context, the rest of the code you have already written to configure your GX entities, such as Data Sources, Data Assets, and Expectations, can be re-run to migrate your existing configuration into GX Cloud."
But in the case of Expectations, I'm not sure how easy it will be for people to re-find the code they used to configure Expectations in the first place so that they can re-run the code in a Cloud context. An Expectation Suite can be added to over time, so the code that created all the Expectations in a given Suite may be spread out over various people and over time. It seems like it would remove a lot of friction for users if we could provide a way to copy the current state of a Core Expectation Suite and recreate it in Cloud.
demo video of the workflow I'm suggesting (too big for github so posted in slack)
https://greatexpectationslabs.slack.com/archives/C03B8DZCJ07/p1774033457491619
If the user has a file context and wants to create a cloud context, they should be able to copy the entities over from one to the other.
There are 2 ways to do this:
- Loop over data sources, assets, suites, expectations, etc. and copy the entities over one by one. This is basically what @klavavej is proposing here. You could do the same for data sources, data assets, etc.
- For most entities, I think you can copy the top-level one over:
    file_context = gx.get_context(mode="file")
    cloud_context = gx.get_context(mode="cloud")
    # This will copy the suite, which includes all its expectations
    cloud_context.suites.add(file_context.suites.get("my_suite"))
Unfortunately, this doesn't seem to work with data sources, so it might be confusing to document in general.
This would be a different strategy, but it could probably be scripted so a user could run it to move everything over.
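A scripted version of the second approach might look roughly like the sketch below. It is untested against a real Cloud organization: `copy_suites` is a hypothetical helper name, and it assumes that `suites.all()` lists the suites in the source context and that `suites.add()` accepts a suite object loaded from a different context, as described above.

```python
def copy_suites(source_context, target_context):
    """Copy every Expectation Suite from source_context into target_context.

    Hypothetical helper. Assumes `suites.all()` lists the source suites and
    `suites.add()` accepts a suite object loaded from another context.
    Returns the names of the suites that were passed to `suites.add()`.
    """
    copied = []
    for suite in source_context.suites.all():
        target_context.suites.add(suite)
        copied.append(suite.name)
    return copied
```

It would be called with the two contexts from the snippet above, for example `copy_suites(file_context, cloud_context)`. Data sources are left out on purpose, since the same shortcut does not seem to work for them.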
| ## Examples | ||
|
|
||
| ### Configuration Setup |
It's a little unusual for headings to butt up against each other like this with no content in between.
I suggest either
- just removing the "Examples" header
- or, adding a little content under the "Examples" header that introduces its subsections.
| ## Examples | ||
|
|
||
| ### Configuration Setup |
For sections that provide instructions, I try to have the headers use imperative verbs. This helps readers know that there is a task to complete in the section.
This section of content is a bit unusual in that it both provides conceptual background info (about how you might have set things up in Core) and instructions (about how to replicate things in Cloud)
This might need some supporting edits to the phrasing of the content in the section, but what would you think about having this header be something like
Migrate entities
or
Re-create a Data Asset
or
Add a Data Asset
?
|
|
||
| Running this script will now create the same Data Source, Data Asset, Batch Definition, and Validation Definition in your GX Cloud organization. | ||
|
|
||
| ### Running Validations |
Similar to another note, I try to make headers for instructional sections use imperative verbs.
Again, this might require some rephrasing of the section contents, but what would you think about changing this header to something like
Run a Validation
or
Validate a migrated Data Asset
?
billdirks
left a comment
Thanks for putting this together.
I've left a few comments. One I would call out is my response to Kristen's comment about using suites.get on the cloud context. That response lays out a different strategy from the one these docs put forward. These docs focus on manually re-running, against Cloud, the same script that was used to create the entities for a file system context. That is great if that script exists. If there isn't a script, one can migrate by examining the file context and copying the entities over from there, e.g.:
    for expectation in file_suite.expectations:
        cloud_suite.add(expectation)
You could do something similar for data sources and assets by looking at the name and configuration information in the file context and using that to create ones in the cloud context.
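As a starting point for data sources and assets, the names could be read out of the file context before anything is re-created in Cloud. A minimal sketch, assuming `context.data_sources.all()` returns a mapping of name to Data Source and that each Data Source exposes `type` and `assets` attributes; `inventory_data_sources` is a hypothetical helper name.

```python
def inventory_data_sources(context):
    """Return (data source name, type, asset names) for each Data Source.

    Hypothetical helper. Assumes `context.data_sources.all()` returns a
    mapping of name to Data Source, and that each Data Source exposes
    `type` and `assets`, with each asset exposing `name`.
    """
    rows = []
    for name, data_source in context.data_sources.all().items():
        asset_names = [asset.name for asset in data_source.assets]
        rows.append((name, data_source.type, asset_names))
    return rows
```

The resulting list only tells you what needs to be re-created. The connection details still have to be supplied when adding each Data Source to the cloud context.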
Happy to discuss synchronously if that would be useful.
| import great_expectations as gx | ||
|
|
||
| context = gx.get_context(mode="file") | ||
| ds = context.data_sources.add_sql(name="Postgres DB", connection_string="postgresql+psycopg2://username:passowrd@myhost.domain>:443>/sample_db") |
I've suggested a typo fix and changing the port since 5432 is the standard postgresql port and ports 0-1023 are operationally special.
| ds = context.data_sources.add_sql(name="Postgres DB", connection_string="postgresql+psycopg2://username:passowrd@myhost.domain>:443>/sample_db") | |
| ds = context.data_sources.add_sql(name="Postgres DB", connection_string="postgresql+psycopg2://username:password@myhost.domain>:5432>/sample_db") |
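To make the port and credentials easier to review, the connection string could also be assembled from its parts. This is a minimal sketch: `postgres_url` is a hypothetical helper, not part of GX, and the values in the usage note are the placeholders from the suggestion above.

```python
from urllib.parse import quote_plus


def postgres_url(user, password, host, database, port=5432):
    """Build a SQLAlchemy-style PostgreSQL URL for the psycopg2 driver.

    Hypothetical helper. The user and password are percent-encoded so that
    characters such as "@" or "/" do not break the URL; 5432 is the
    standard PostgreSQL port.
    """
    return (
        f"postgresql+psycopg2://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )
```

For example, `postgres_url("username", "password", "myhost.domain", "sample_db")` yields a string that could be passed as `connection_string`.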
| import os | ||
|
|
||
| os.environ["GX_CLOUD_ORGANIZATION_ID"] = "<YOUR_GX_CLOUD_ORGANIZATION_ID>" | ||
| os.environ["GX_CLOUD_WORKSPACE_ID"] = "<YOUR_GX_CLOUD_WORKSPACE_ID>" | ||
| os.environ["GX_CLOUD_ACCESS_TOKEN"] = "<YOUR_GX_CLOUD_ACCESS_TOKEN>" |
Yes, if you want snippets to run you will need to put them in a .py file and register them in examples_under_test.py. The backend_dependencies=[BackendDependencies.CLOUD] is required for cloud tests. In case people are curious, that will make the tests start a local cloud backend in a docker container. The tests will run using this docker container.
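For the environment variable lines specifically, one option is to have the snippet read the three credentials from the environment instead of assigning placeholders, so the same file can run under test. A minimal sketch: `missing_cloud_vars` is a hypothetical helper, and only the three variable names come from this PR.

```python
import os

# Variable names taken from the snippet under review
REQUIRED_CLOUD_VARS = (
    "GX_CLOUD_ORGANIZATION_ID",
    "GX_CLOUD_WORKSPACE_ID",
    "GX_CLOUD_ACCESS_TOKEN",
)


def missing_cloud_vars(env=None):
    """Return the required GX Cloud variables that are unset or empty.

    Hypothetical helper. `env` defaults to os.environ; passing a plain
    dict makes the check easy to exercise in a test.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_CLOUD_VARS if not env.get(name)]
```

A script could call `missing_cloud_vars()` before `gx.get_context(mode="cloud")` and stop with a readable message if the list is not empty.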
| os.environ["GX_CLOUD_ACCESS_TOKEN"] = "<YOUR_GX_CLOUD_ACCESS_TOKEN>" | ||
|
|
||
| context = gx.get_context(mode="cloud") | ||
| data_source = context.data_sources.add_sql(name="Postgres DB", connection_string="postgresql+psycopg2://username:passowrd@myhost.domain>:443>/sample_db") |
| data_source = context.data_sources.add_sql(name="Postgres DB", connection_string="postgresql+psycopg2://username:passowrd@myhost.domain>:443>/sample_db") | |
| data_source = context.data_sources.add_sql(name="Postgres DB", connection_string="postgresql+psycopg2://username:password@myhost.domain>:5432>/sample_db") |
| name="FULL_TABLE" | ||
| ) | ||
|
|
||
| suite = context.suites.get("my_suite") |
We assume the suite exists? Everything else we are creating from scratch. We could create a suite and add expectations to it. That would resolve @klavavej's comment below since then we'd be creating it in both places.
| validation_definition = gx.ValidationDefinition( | ||
| data=bd, suite=suite, name="Validation Definition" | ||
| ) | ||
| context.validation_definitions.add(validation_definition) |
I think we also want to create a checkpoint? That is the final domain object that a user needs. In the Running Validations section below you do a get, which means it must exist.
This is to provide high level guidance for migrating from Core to Cloud. In this initial pass, the goal is to show how you can do the basics of creating the same objects that you have in Core in Cloud using the same code, as well as run validations. This piece of documentation can be updated later as needed to show other use cases.