Elastic Crawler to Open Crawler Migration Notebook #404

Merged 5 commits on Feb 25, 2025

@mattnowzari mattnowzari commented Feb 24, 2025

Closes https://github.com/elastic/search-team/issues/8801 and #408

This is a Jupyter Notebook that will help users of Elastic Crawler migrate their crawler configuration to Open Crawler-compatible YAML files.

The design document for this Migration notebook can be found here!

gitnotebooks bot commented Feb 24, 2025

Found 1 changed notebook. Review the changes at https://app.gitnotebooks.com/elastic/elasticsearch-labs/pull/404

@leemthompo leemthompo left a comment

Nice notebook! Couple nitty copyedits from me, and a few open questions :)

"id": "f4131f88-9895-4c0e-8b0a-6ec7b3b45653",
"metadata": {},
"source": [
"We are going to need a few things from your Elasticsearch deployment before we can migrate your configurations:\n",

@leemthompo leemthompo Feb 25, 2025

What if the user is running the Elastic Crawler with Enterprise Search on a self-managed deployment?

This info makes it seem like this is only relevant to Elastic Cloud Hosted users.

Contributor Author

Fair point! I hadn't considered/remembered that local deployments don't have Cloud IDs (it's in the name! 🤣 )

I will fix this so that it accepts an endpoint + port instead of a Cloud ID, making it compatible with local deployments too.
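A minimal sketch of what accepting either form could look like, not the notebook's actual code. The helper name `build_es_client_kwargs`, the default to `https` when no scheme is given, and the fallback to port 9200 are all assumptions made for illustration. The returned dictionary uses the `hosts`, `cloud_id`, and `api_key` keyword arguments of the Elasticsearch Python client.

```python
from urllib.parse import urlsplit


def build_es_client_kwargs(api_key, endpoint=None, port=None, cloud_id=None):
    """Hypothetical helper: build keyword arguments for Elasticsearch(...).

    Prefers an explicit endpoint (usable for self-managed and cloud
    deployments alike) and falls back to a Cloud ID when no endpoint is given.
    """
    if endpoint:
        # Assumption: treat a scheme-less endpoint such as "localhost" as HTTPS.
        url = endpoint if "://" in endpoint else f"https://{endpoint}"
        parts = urlsplit(url)
        if not parts.hostname:
            raise ValueError(f"Could not parse a host from {endpoint!r}")
        # urlsplit strips the brackets from IPv6 literals, so restore them.
        host = f"[{parts.hostname}]" if ":" in parts.hostname else parts.hostname
        # Assumption: an explicit port wins, then a port embedded in the URL,
        # then 9200 (the default Elasticsearch HTTP port).
        resolved_port = port or parts.port or 9200
        return {
            "hosts": f"{parts.scheme}://{host}:{resolved_port}",
            "api_key": api_key,
        }
    if cloud_id:
        return {"cloud_id": cloud_id, "api_key": api_key}
    raise ValueError("Provide either an endpoint or a Cloud ID")


# Usage (requires the `elasticsearch` package; placeholder credentials):
# from elasticsearch import Elasticsearch
# client = Elasticsearch(
#     **build_es_client_kwargs(api_key="<api key>", endpoint="https://localhost", port=9200)
# )
```

Keeping the argument handling in a plain function means it can be checked without a running cluster; only the commented-out lines need a live deployment.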

Contributor

@pquentin as Python client expert, I'd ask for confirmation that this approach works equally well for both :)

"id": "67dfc7c6-429e-42f0-ab08-2c84d72945cb",
"metadata": {},
"source": [
"#### **This is the final step! You have two options here:**\n",
Contributor

Do we have any links we want to provide at the end of the notebook in case people get stuck or want to learn more?

Contributor Author

We don't have anything directly related to Crawler Migration, but I can link to the Open Crawler repo's Getting Started section, which is a natural "next step" once the configs have been generated!

Contributor Author

I've added final remarks and some URLs to the very final cell of the Notebook

@mattnowzari
Contributor Author

Going to merge this in for now as:

  • I am confident that initializing an ES Python client with an endpoint URL + Port is a good approach
  • Liam has signed off on the copy (thank you!)
  • QA actions are being performed against this Notebook this week (see here for the QA steps). As such, technical fixes for this Notebook may yet come about very soon, and those can be separate PRs to address specific issues.

@mattnowzari mattnowzari merged commit 9a5ebf1 into main Feb 25, 2025
5 checks passed
@mattnowzari mattnowzari deleted the opencrawler_migration_nb branch February 25, 2025 16:54