README.md: 22 additions & 10 deletions
@@ -59,16 +59,26 @@ Create a **Dedicated** cluster with the Neo4j Spark Connector:
 ### 2. Import the Workshop

-1. Clone or download this repository
-2. In Databricks, go to **Workspace**
-3. Click **Import** and upload the `labs/` folder
+In Databricks, go to **Workspace** > right-click your user folder > **Import** > **URL** and paste:
+
+```
+<DBC_URL>
+```
+
+This imports all lab notebooks into your workspace. Data files (CSV, HTML, embeddings) are downloaded automatically from GitHub when you run the setup notebook.
+
+> **Alternative:** If you prefer to import manually, clone the repo and use the Databricks CLI:
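For the CLI route, the import might look like the following sketch. It uses the Databricks CLI's `workspace import-dir` command; the local path and the target user folder are placeholders, not values from this repository:

```
# Hypothetical invocation: push the local labs/ folder into your workspace
databricks workspace import-dir ./labs /Users/<your-user>/labs
```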
-The **ML Runtime** is recommended because it includes `pyyaml`, `neo4j`, and `beautifulsoup4`. If using a standard (non-ML) runtime, install these Python packages as cluster libraries:
+The **ML Runtime** is recommended because it includes `neo4j` and `beautifulsoup4`. If using a standard (non-ML) runtime, install these Python packages as cluster libraries:
 - `python -m cli upload` pushes Python files to the Databricks workspace via the Databricks SDK
-- `python -m cli submit` checks that the cluster is RUNNING (errors if not), injects Neo4j credentials from `.env` as command-line arguments, and submits a one-shot job via the SDK Jobs API
-- Each script uses `argparse` to receive credentials and prints PASS/FAIL for each verification check
+- `python -m cli submit` checks that the cluster is RUNNING (auto-starts if terminated), passes all non-core `.env` keys as `KEY=VALUE` parameters, and submits a one-shot job via the SDK Jobs API
+- Each script parses `KEY=VALUE` parameters from `sys.argv` into `os.environ` at startup, then reads configuration via `os.environ` / `os.getenv()`
 - Scripts exit with code 0 on success, code 1 on any failure
 - `python -m cli clean` removes the remote workspace directory and deletes job runs matching the `graph_validation:` prefix
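The `KEY=VALUE` startup pattern described above can be sketched as follows. This is an illustrative reimplementation, not the repository's actual code; the function name and the `NEO4J_URI` key are assumptions:

```python
import os
import sys


def load_env_from_argv(argv):
    """Copy KEY=VALUE command-line parameters into os.environ.

    Illustrative sketch of the described pattern: each argument containing
    '=' is split on the first '=' and stored as an environment variable,
    so later code can read configuration via os.environ / os.getenv().
    """
    for arg in argv[1:]:
        if "=" in arg:
            key, _, value = arg.partition("=")
            os.environ[key] = value


# Simulated invocation (normally argv comes from the job submission):
load_env_from_argv(["verify.py", "NEO4J_URI=bolt://localhost:7687"])
print(os.getenv("NEO4J_URI"))  # → bolt://localhost:7687
```

Because the parameters land in `os.environ`, the same script runs unchanged whether configuration arrives from a `.env` file locally or from job parameters on the cluster.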
-parser = argparse.ArgumentParser(description="Generate pre-computed embeddings for workshop HTML files")
-parser.add_argument("--volume-path", required=True, help="Unity Catalog Volume path containing HTML files")
-parser.add_argument("--output-path", default=None, help="Output path for JSON file (defaults to volume-path/embeddings/document_chunks_embedded.json)")
-parser.add_argument("--endpoint", default="databricks-gte-large-en", help="Databricks embedding model endpoint")