Elastic Crawler to Open Crawler Migration Notebook #404

mattnowzari · 2025-02-24T16:53:03Z

Closes https://github.com/elastic/search-team/issues/8801 and #408

This is a Jupyter Notebook that will help users of Elastic Crawler migrate their crawler configuration to Open Crawler-compatible YAML files.

The design document for this Migration notebook can be found here!

gitnotebooks · 2025-02-24T16:53:08Z

Found 1 changed notebook. Review the changes at https://app.gitnotebooks.com/elastic/elasticsearch-labs/pull/404


leemthompo

Nice notebook! Couple nitty copyedits from me, and a few open questions :)

notebooks/enterprise-search/elastic-crawler-to-open-crawler-migration.ipynb

leemthompo · 2025-02-25T14:35:47Z

+   "id": "f4131f88-9895-4c0e-8b0a-6ec7b3b45653",
+   "metadata": {},
+   "source": [
+    "We are going to need a few things from your Elasticsearch deployment before we can migrate your configurations:\n",


What if the user is running the Elastic Crawler with Enterprise Search on a self-managed deployment?

This info makes it seem like this is only relevant to Elastic Cloud Hosted users.

Fair point! I hadn't considered/remembered that local deployments don't have Cloud IDs (it's in the name! 🤣 )

I will fix this so that it accepts an endpoint + port instead of a Cloud ID, making it compatible with local deployments too.

@pquentin as Python client expert, I'd ask for confirmation that this approach works equally well for both :)
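
For illustration, here is a minimal sketch of the endpoint + port approach discussed in this thread. It is not the notebook's actual code. `build_endpoint_url` and `make_client` are hypothetical helper names, and authenticating with an API key is an assumption. The point is that a plain URL works for both Elastic Cloud Hosted and self-managed deployments, whereas a Cloud ID only exists for the former.

```python
from urllib.parse import urlsplit


def build_endpoint_url(endpoint: str, port: int, default_scheme: str = "https") -> str:
    """Combine an endpoint and a port into one URL (hypothetical helper)."""
    endpoint = endpoint.strip().rstrip("/")
    # Accept bare hostnames such as "localhost" by assuming a default scheme.
    if "://" not in endpoint:
        endpoint = f"{default_scheme}://{endpoint}"
    parts = urlsplit(endpoint)
    if parts.hostname is None:
        raise ValueError(f"could not parse a hostname from {endpoint!r}")
    host = parts.hostname
    # IPv6 literals must be wrapped in brackets inside a URL.
    if ":" in host:
        host = f"[{host}]"
    # Any port or path embedded in the endpoint is dropped; the explicit port wins.
    return f"{parts.scheme}://{host}:{port}"


def make_client(endpoint: str, port: int, api_key: str):
    """Create a client from a URL, for Cloud Hosted or self-managed deployments."""
    # Imported lazily so the URL helper works without the client installed.
    from elasticsearch import Elasticsearch

    # Assumption: API key authentication; other auth options are not shown here.
    return Elasticsearch(build_endpoint_url(endpoint, port), api_key=api_key)
```

Under these assumptions, a local deployment would be reached with something like `make_client("http://localhost", 9200, api_key)`, and a Cloud Hosted deployment by passing its endpoint URL and port instead.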

notebooks/enterprise-search/elastic-crawler-to-open-crawler-migration.ipynb

leemthompo · 2025-02-25T14:41:55Z

+   "id": "67dfc7c6-429e-42f0-ab08-2c84d72945cb",
+   "metadata": {},
+   "source": [
+    "#### **This is the final step! You have two options here:**\n",


Do we have any links we want to provide at the end of the notebook in case people get stuck or want to learn more?

We don't have anything directly related to Crawler Migration, but I can link to the Open Crawler repo's Getting Started section, which is a natural "next step" once the configs have been generated!

I've added final remarks and some URLs to the very final cell of the Notebook


mattnowzari · 2025-02-25T16:54:32Z

Going to merge this in for now as:

- I am confident that initializing an ES Python client with an endpoint URL + Port is a good approach
- Liam has signed off on the copy (thank you!)
- QA actions are being performed against this Notebook this week (see here for the QA steps). As such, technical fixes for this Notebook may yet come about very soon, and those can be separate PRs to address specific issues.

Migration helper notebook for elastic->open Crawler

356f845

mattnowzari mentioned this pull request Feb 24, 2025

Crawler Migration Jupyter Notebook elastic/crawler#218

Closed

7 tasks

Cleanup and more output

c7debe3

mattnowzari mentioned this pull request Feb 25, 2025

Elastic Crawler to Open Crawler Migration Notebook #408

Closed

mattnowzari added 2 commits February 25, 2025 08:38

Added Notebook to testing exemption as no unit tests needed + kebab-case rename

9491277

Fixed Colab link to reflect kebab-case name

90f5eed

joemcelroy approved these changes Feb 25, 2025


leemthompo reviewed Feb 25, 2025


Copyedit fixes + initial connection to ES instance now requires endpoint URL and port, not Cloud ID

266c297

mattnowzari merged commit 9a5ebf1 into main Feb 25, 2025
5 checks passed

mattnowzari deleted the opencrawler_migration_nb branch February 25, 2025 16:54
