uploading notebooks for demo session as draft PR #150

Draft
kadolor wants to merge 10 commits into main from knowledgebase-dogfooding

Contributor

kadolor commented Apr 17, 2026

Description

Checklist

  • I have tested these changes in Wherobots Cloud
  • Notebooks follow the style guide

For Release PRs (Wherobots Team Only)

If this PR is part of a wbc-images release, tags must be created after merging:

  • I will create tags from main after merge (or tags are not needed for this PR)

Tagging instructions: Both v1 and v2 tags should typically point to the same commit unless you know otherwise.

```bash
git checkout main && git pull
git tag v1.X.Y && git tag v2.X.Y-preview
git push origin v1.X.Y v2.X.Y-preview
```

See CONTRIBUTING.md for details.

kadolor added 5 commits April 14, 2026 18:54
Add check_notebook_self_contained.py that scans .ipynb files for local
file references (open() with relative paths, local image paths in
markdown cells) and blocks commits that would break VS Code extension
compatibility. Update CONTRIBUTING.md with self-contained notebook
requirements and style guide changes for image embedding.

gitnotebooks Bot commented Apr 17, 2026

Found 16 changed notebooks. Review the changes at https://app.gitnotebooks.com/wherobots/wherobots-examples/pull/150

" \"`Latitude[deg]` AS lat\",\n",
" \"`Longitude[deg]` AS lon\",\n",
" )\n",
"\n",
Contributor

i generally prefer this syntax for multi-line commands:

```python
(
    sedona  # session name assumed, as in sedona.sql(...) elsewhere in this thread
    .read
    .csv(GPS_S3_PATH, header=True, inferSchema=True)
    .selectExpr(
        "VehId            AS vehicle_id",
        "Trip             AS trip_id",
        "`Timestamp(ms)`  AS ts_ms",
        "`Latitude[deg]`  AS lat",
        "`Longitude[deg]` AS lon",
    )
)
```

" .groupBy(\"vehicle_id\", \"trip_id\")\n",
" .agg(collect_list(struct(\"ts_ms\", \"lat\", \"lon\")).alias(\"coords\"))\n",
" .withColumn(\"geometry\", linestring_udf(\"coords\"))\n",
")\n",
Contributor

not sure if it helps, but this is typically how i generate trips from gps:

```python
trip_trajectories = sedona.sql(f"""
    SELECT
        trip_id,
        ST_MakeLine(
            array_sort(
                array_agg(geometry),
                (a, b) -> CAST(ST_M(b) - ST_M(a) AS INT)
            )
        ) AS geometry
    FROM
        {CATALOG}.{SCHEMA}.{GPS_TABLE}
    GROUP BY
        trip_id
""")
```

Contributor

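but this requires a PointZM to be created first, like this:

```python
from datetime import date
from pyspark.sql.functions import expr, lit

trip_data = (
    sedona  # session name assumed, as in sedona.sql(...) above
    .read
    .format("parquet")
    .load(GPS_BANK)
    .withColumn("delivery_date", lit(date.today()))
    # SPECIAL: creates a 4D point; timestamp fills both Z and M
    .withColumn("geometry", expr("ST_PointZM(x_coord, y_coord, timestamp, timestamp)"))
)
trip_data.count()
```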
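
A side note on the comparator in the `ST_MakeLine` query in this thread: Spark's `array_sort` treats a negative return value as "a sorts before b", which is the same contract as Python's `functools.cmp_to_key`. The sketch below mimics the lambda in plain Python on made-up M values (no Spark or Sedona involved) to show which direction `ST_M(b) - ST_M(a)` sorts in. The `pings` list and the helper names are hypothetical.

```python
from functools import cmp_to_key

# Hypothetical (x, y, m) pings for one trip; m stands in for ST_M (a timestamp).
pings = [(-122.40, 37.78, 2000), (-122.42, 37.77, 1000), (-122.38, 37.79, 3000)]

def m_desc(a, b):
    # Mirrors the SQL lambda (a, b) -> CAST(ST_M(b) - ST_M(a) AS INT)
    return int(b[2] - a[2])

def m_asc(a, b):
    # Operands swapped: earlier timestamps sort first
    return int(a[2] - b[2])

latest_first = sorted(pings, key=cmp_to_key(m_desc))
earliest_first = sorted(pings, key=cmp_to_key(m_asc))

print([p[2] for p in latest_first])
print([p[2] for p in earliest_first])
```

If the line should run in travel order, swapping the operands is one option; whether it matters depends on what downstream steps do with the line direction.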

Comment thread Business_Cases/backend_warehouse_network_optimization.ipynb Outdated
Comment thread Business_Cases/backend_warehouse_network_optimization.ipynb Outdated
Comment thread Business_Cases/backend_warehouse_network_optimization.ipynb
" COALESCE(z.zone_name, 'IN_TRANSIT') AS zone_name,\n",
" z.zone_type\n",
" FROM pings p\n",
" LEFT JOIN delivery_zones z\n",
Contributor

RoboDonut commented Apr 17, 2026


for a small number of zones like this, i typically broadcast them.
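
One way to express the broadcast suggestion is Spark SQL's `BROADCAST` join hint. The sketch below only builds the query string, so it runs without a cluster. The geometry column names (`p.geometry`, `z.geometry`) and the `ST_Contains` predicate are assumptions, since the notebook's full query is not shown in this thread.

```python
def zone_join_sql(pings_view: str = "pings", zones_view: str = "delivery_zones") -> str:
    """Build a point-in-zone join that asks Spark to broadcast the small zones table."""
    return f"""
    SELECT /*+ BROADCAST(z) */
        p.*,
        COALESCE(z.zone_name, 'IN_TRANSIT') AS zone_name,
        z.zone_type
    FROM {pings_view} p
    LEFT JOIN {zones_view} z
      ON ST_Contains(z.geometry, p.geometry)  -- assumed join predicate
    """

query = zone_join_sql()
print(query)
```

With the DataFrame API, wrapping the zones DataFrame in `pyspark.sql.functions.broadcast(...)` plays the same role.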

"visits_df.orderBy(\"vehicle_id\", \"trip_id\", \"enter_ts\") \\\n",
" .toPandas() \\\n",
" .to_csv(visits_path, index=False)\n",
"print(f\"Wrote visits to {visits_path}\")\n",
Contributor

probably want to include the geometry for the BI dashboard.
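
CSV has no geometry type, so one common approach is to carry the geometry as WKT text (in Sedona, `ST_AsText` produces this). Below is a minimal standard-library sketch of the resulting file shape, using made-up rows. The `geometry_wkt` column name and the sample values are hypothetical.

```python
import csv
import io

# Hypothetical visit rows; in the notebook these would come from visits_df.
visits = [
    {"vehicle_id": 1, "trip_id": 10, "enter_ts": 1000, "geometry_wkt": "POINT (-122.4 37.78)"},
    {"vehicle_id": 1, "trip_id": 10, "enter_ts": 2000, "geometry_wkt": "POINT (-122.41 37.79)"},
]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=list(visits[0].keys()))
writer.writeheader()
writer.writerows(visits)

csv_text = buf.getvalue()
print(csv_text)
```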

"geojson_path = \"/tmp/fleet_delivery_zones.geojson\"\n",
"with open(geojson_path, \"w\") as fh:\n",
" json.dump(fc, fh, indent=2)\n",
"print(f\"Wrote {len(features)} zones to {geojson_path}\")"
Contributor

pretty sure we have a geojson writer that removes the need to write like this.

Comment thread Business_Cases/backend_warehouse_network_optimization.ipynb Outdated
" SELECT\n",
" site_id, label,\n",
" ROUND(\n",
" ST_Area(ST_Transform(isochrone, 'EPSG:4326', 'EPSG:3857')) / 1e6, 2\n",
Contributor

wrong CRS. EPSG:3857 (Web Mercator) distorts area; it needs to use a good equal-area projection.
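
To put a number on the distortion: Web Mercator (EPSG:3857) stretches lengths by 1/cos(latitude), so areas are inflated by roughly 1/cos²(latitude). The sketch below evaluates that factor at an assumed Bay Area latitude of 37.7°N. The function name and the latitude are illustrative, not taken from the notebook.

```python
import math

def web_mercator_area_inflation(lat_deg: float) -> float:
    """Approximate factor by which EPSG:3857 overstates area at a given latitude."""
    return 1.0 / math.cos(math.radians(lat_deg)) ** 2

factor = web_mercator_area_inflation(37.7)
print(f"Area inflation at 37.7N: {factor:.2f}x")
```

At this latitude the factor is close to 1.6, so isochrone areas computed in EPSG:3857 would come out far too large. An equal-area CRS such as EPSG:5070 (NAD83 / Conus Albers) is one candidate for U.S. data.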

"## 4. Prepare Demographics — ZCTA Polygons + Synthesized Values\n",
"\n",
"Pull U.S. Census ZCTA polygons intersecting a Bay-Area bbox and attach\n",
"deterministic population / median-income values keyed off the ZCTA ID.\n",
Contributor

bad bot, the ZCTA ID has no relation to population or income. if it's stubbing in values, make them very, very fake
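
One way to read this suggestion: draw the stub values from a seeded random generator so they are reproducible but carry no implied link to the ZCTA ID, and name the columns so nobody mistakes them for Census figures. A sketch under those assumptions; the column names, value ranges, and sample ZCTA IDs are all made up.

```python
import random

def fake_demographics(zcta_ids, seed=42):
    """Attach obviously synthetic values to each ZCTA; the ID is never used as input."""
    rng = random.Random(seed)  # fixed seed keeps the demo reproducible
    return [
        {
            "zcta_id": zcta,
            "fake_population": rng.randint(1_000, 50_000),
            "fake_median_income": rng.randint(30_000, 200_000),
        }
        for zcta in zcta_ids
    ]

rows = fake_demographics(["94103", "94110", "94607"])
for row in rows:
    print(row)
```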

" WHERE longitude BETWEEN -123.0 AND -121.5\n",
" AND latitude BETWEEN 37.1 AND 38.2\n",
" AND exists(fsq_category_labels, x -> x LIKE '%Coffee%')\n",
" AND date_closed IS NULL\n",
Contributor

needs to use spatial filter pushdown, not lon/lat BETWEEN
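
A sketch of what a spatial predicate could look like in place of the lon/lat range: turn the bbox into a WKT polygon and filter with `ST_Intersects`, which gives the engine a spatial predicate it can push down where the storage format supports it. The geometry column name `geom` is an assumption (the table schema is not shown here), as is the helper name.

```python
def bbox_filter_sql(min_lon, min_lat, max_lon, max_lat, geom_col="geom"):
    """Return a spatial WHERE-clause predicate for a lon/lat bounding box."""
    wkt = (
        f"POLYGON(({min_lon} {min_lat}, {max_lon} {min_lat}, "
        f"{max_lon} {max_lat}, {min_lon} {max_lat}, {min_lon} {min_lat}))"
    )
    return f"ST_Intersects({geom_col}, ST_GeomFromWKT('{wkt}'))"

# Bay Area bbox taken from the BETWEEN clause under review
predicate = bbox_filter_sql(-123.0, 37.1, -121.5, 38.2)
print(predicate)
```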

Comment on lines +196 to +199
" ST_Buffer(\n",
" geometry,\n",
" SQRT(CAST(FIRE_SIZE AS DOUBLE) * 4046.86 / 3.14159) / 111000.0\n",
" ) AS burn_perim\n",
Member

This is trying to convert acres to meters to degrees before buffering.
The meters-to-degrees conversion is not geodetically correct. The correct approach would be to pass the buffer distance in meters and set the useSpheroid parameter to true.
See the 3rd parameter in the ST_Buffer docs.
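
A sketch of the arithmetic behind this comment: the acres-to-radius step is fine in meters, but a fixed 111,000 m per degree only holds along a meridian, because a degree of longitude shrinks with cos(latitude). The numbers below use a made-up 1,000-acre fire at 40°N. The SQL string at the end assumes the three-argument `ST_Buffer(geometry, distance, useSpheroid)` form that the comment refers to.

```python
import math

SQ_M_PER_ACRE = 4046.86

def burn_radius_m(acres: float) -> float:
    """Radius in meters of a circle with the given area in acres."""
    return math.sqrt(acres * SQ_M_PER_ACRE / math.pi)

def meters_per_degree_lon(lat_deg: float) -> float:
    """Approximate length of one degree of longitude at a latitude (spherical Earth)."""
    return 111_320.0 * math.cos(math.radians(lat_deg))

radius = burn_radius_m(1000.0)            # hypothetical 1,000-acre fire
lon_deg_m = meters_per_degree_lon(40.0)   # well under 111,000 m at this latitude

# Keep the distance in meters and let ST_Buffer handle the geodesy
buffer_expr = (
    "ST_Buffer(geometry, "
    "SQRT(CAST(FIRE_SIZE AS DOUBLE) * 4046.86 / PI()), true) AS burn_perim"
)
print(round(radius), round(lon_deg_m))
print(buffer_expr)
```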

"iso_path = \"/tmp/site_selection_trade_areas.geojson\"\n",
"with open(iso_path, \"w\") as fh:\n",
" json.dump({\"type\": \"FeatureCollection\", \"features\": iso_features}, fh, indent=2)\n",
"print(f\"Wrote {iso_path} ({len(iso_features)} trade areas)\")"
Contributor

same comment about the geojson writer.

Contributor Author

kadolor commented Apr 21, 2026

  • Mark notebooks that still need review
  • Assign reviewers to remaining notebooks
  • Add a visualization to each notebook to increase customer impact and initial interest
  • Add header images

rbavery force-pushed the self-contained-notebook branch from 50e33c2 to 9e77a32 April 24, 2026 19:43
Base automatically changed from self-contained-notebook to main April 24, 2026 21:57

4 participants