Geospatial data is rich, complex, and nuanced, and it is integral to many businesses, especially in the automotive industry. Because it arrives at high velocity, uses complex representations, and is time-bound, unlocking its full value requires a high-performance platform.
This Solution Accelerator helps organizations harness real-time geospatial, telematics, and sensor data. The current release focuses on road safety and risk prevention; future releases will add smart mobility, EV infrastructure optimization, and driving-based insurance.
These geospatial analytics and AI capabilities on Databricks allow companies to improve fleet efficiency by up to 30%, reduce infrastructure costs by up to 25%, and improve road safety by decreasing accident rates by up to 20%.
The content in this repository builds an end-to-end ingestion pipeline that combines multiple datasets, and a pipeline that trains a collision prediction ML model on traffic-volume data using AutoML and hyperparameter tuning. We also provide a sample dashboard and a Databricks App. Everything can be deployed quickly using the provided Databricks Asset Bundle (DAB) files.
Lakehouse App with Embedded Dashboard
Sample Dashboard
| Dataset | Source Used | Description | Note |
|---|---|---|---|
| Collisions | NYC Open Data | A set of road incidents, including contributing factors. | Refer to Terms of Use for more information. |
| Traffic Volume | NYC Open Data | Historical traffic volume for a set of lat/long. | Refer to Terms of Use for more information. |
| Road Condition | 511ny | Road and traffic conditions. | Refer to Developer Access Agreement for more information. |
| Weather | Open-Meteo Weather API | Historical weather data. | Licensed under Attribution 4.0 International (CC BY 4.0). |
| Telematics | Synthetic | A set of rides/drives with driving metrics like acceleration, speed. | Uses dbldatagen for synthetic generation. |
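
As a rough illustration of how datasets like these can be combined, the sketch below attaches traffic volume to collision records by snapping both to a coarse latitude/longitude grid cell and an hourly bucket. It is plain Python for readability and is not the repository's pipeline, which runs on Databricks. The field names (`lat`, `lon`, `ts`, `volume`, `id`), the sample rows, and the 0.01-degree cell size are all assumptions made for this example.

```python
from collections import defaultdict
from datetime import datetime

# Assumed cell size in degrees (roughly 1 km); not taken from the repository.
GRID_DEGREES = 0.01


def grid_cell(lat, lon, size=GRID_DEGREES):
    """Return the integer (row, col) index of the grid cell containing a point."""
    return (int(lat // size), int(lon // size))


def hour_bucket(ts):
    """Truncate a timestamp to the start of its hour."""
    return ts.replace(minute=0, second=0, microsecond=0)


def index_traffic(traffic_rows):
    """Sum traffic volume per (grid cell, hour) key."""
    totals = defaultdict(int)
    for row in traffic_rows:
        key = (grid_cell(row["lat"], row["lon"]), hour_bucket(row["ts"]))
        totals[key] += row["volume"]
    return dict(totals)


def enrich_collisions(collisions, traffic_rows):
    """Attach the matching traffic volume (or None) to each collision."""
    totals = index_traffic(traffic_rows)
    enriched = []
    for collision in collisions:
        key = (
            grid_cell(collision["lat"], collision["lon"]),
            hour_bucket(collision["ts"]),
        )
        enriched.append({**collision, "traffic_volume": totals.get(key)})
    return enriched


# Hypothetical sample rows, invented for this sketch.
traffic = [
    {"lat": 40.7128, "lon": -74.0060, "ts": datetime(2024, 1, 5, 8, 15), "volume": 120},
    {"lat": 40.7131, "lon": -74.0055, "ts": datetime(2024, 1, 5, 8, 45), "volume": 80},
    {"lat": 40.7306, "lon": -73.9866, "ts": datetime(2024, 1, 5, 8, 0), "volume": 50},
]
collisions = [
    {"id": "c1", "lat": 40.7129, "lon": -74.0058, "ts": datetime(2024, 1, 5, 8, 30)},
    {"id": "c2", "lat": 40.7580, "lon": -73.9855, "ts": datetime(2024, 1, 5, 9, 10)},
]

for row in enrich_collisions(collisions, traffic):
    print(row["id"], row["traffic_volume"])
```

A fixed grid keeps the join key simple, at the cost of missing matches that fall just across a cell boundary; a production pipeline would more likely use a spatial index or a distance-based join.
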
- Eumar Assis
- Himanshu Gupta
- Andres Urrutia
- Michael Johns
- Varun Mahajan
- Eric Lind
- Fareed Aref
- Zachary Ryan
Please note that the code in this project is provided for your exploration only and is not formally supported by Databricks with Service Level Agreements (SLAs). It is provided AS-IS, and we do not make any guarantees of any kind. Please do not submit a support ticket relating to any issues arising from the use of this project. The source in this project is provided subject to the Databricks License. All included or referenced third-party libraries are subject to the licenses set forth below.
Any issues discovered through the use of this project should be filed as GitHub Issues on the Repo. They will be reviewed as time permits, but there are no formal SLAs for support.
© 2024 Databricks, Inc. All rights reserved. The source in this notebook is provided subject to the Databricks License [https://databricks.com/db-license-source]. All included or referenced third party libraries are subject to the licenses set forth below.
| library | description | license | source |
|---|---|---|---|
| branca | HTML colormap library for leaflet.js | BSD 3‑Clause License | PyPI |
| databricks-sdk | Databricks SDK for interacting with the Databricks REST APIs | Apache 2.0 | PyPI |
| folium | Python wrapper for leaflet.js maps | MIT License | PyPI |
| geopandas | Pandas support for geospatial data | BSD License | PyPI |
| jmespath | JSON matching & extraction library | MIT License | PyPI |
| keplergl | Python wrapper for kepler.gl interactive maps | MIT License | PyPI |
| matplotlib | Comprehensive library for static & interactive visualizations | PSF License | PyPI |
| matplotlib-inline | Matplotlib inline backend for Jupyter notebooks | BSD 3‑Clause License | PyPI |
| mermaid-python | Generate Mermaid diagrams from Python | MIT License | PyPI |
| networkx | Graph creation, manipulation, and study of networks | BSD License | PyPI |
| nbformat | Jupyter notebook format APIs | BSD 3‑Clause License | PyPI |
| numpy | Fundamental package for scientific computing | BSD License | PyPI |
| openmeteo-requests | Python client for the Open‑Meteo weather API | MIT License | PyPI |
| osmnx | Retrieve, model, analyze & visualize OSM networks | MIT License | PyPI |
| pandas | Data structures & data analysis tools | BSD 3‑Clause License | PyPI |
| pgeocode | Postal code geocoding library | BSD 3‑Clause License | PyPI |
| plotly | Interactive plotting library for Python | MIT License | PyPI |
| pyparsing | Text parsing toolkit | MIT License | PyPI |
| requests-cache | Persistent caching for requests HTTP library | BSD 2‑Clause License | PyPI |
| retry-requests | Automatic retry logic for requests HTTP calls | MIT License | PyPI |
| scikit-learn | Machine learning in Python | BSD 3‑Clause License | PyPI |
| seaborn | Statistical data visualization | BSD License | PyPI |
| streamlit | Framework for building interactive data apps in Python | Apache 2.0 | PyPI |



