docs(examples): create demos on large data volumes #126

# Objective

Create two real-world reference projects that showcase Ibis and IbisML at scale.

# Outcomes

* Documented end-to-end ML projects, including:
  * data ingestion
  * data exploration (using Ibis; **stretch**: produce visualizations using existing Ibis integrations)
  * data processing (including feature engineering using Ibis)
  * train-test split (manually using Ibis)
  * last-mile feature preprocessing (using IbisML)
  * handoff to model (approach TBD)
  * modeling (one project using Dask-XGBoost on GPU, the other using PyTorch)
  * **stretch**: real-time inference

  Ideally, these projects can later be written up as blog posts (or a series of posts) and submitted to conferences.
  It would also be useful to track the approximate time spent on each stage of the project (e.g., to confirm whether feature engineering really takes the most time).

* Lessons learned on model handoff that can inform future IbisML work in that area (if any is needed)
* Feedback across the rest of the pipeline is also expected, but model handoff is where we have the most uncertainty
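
The manual train-test split listed above is usually done deterministically, so that the same row always lands in the same split regardless of execution order or backend. The following is a minimal sketch of that idea in plain Python, not Ibis code: it hashes a unique key into buckets and assigns the lowest buckets to the test set. The `assign_split` helper, the `game-…` identifiers, and the 80/20 ratio are all assumptions made for illustration; in the actual projects the same bucketing would be expressed on a key column of the table.

```python
import hashlib


def assign_split(key: str, test_pct: int = 20, num_buckets: int = 100) -> str:
    """Deterministically assign a row to 'train' or 'test' from its unique key.

    Hypothetical helper for illustration only.
    """
    # Use a stable hash; Python's built-in hash() is salted per process.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % num_buckets
    # Buckets [0, test_pct) go to the test set, the rest to training.
    return "test" if bucket < test_pct else "train"


# Hypothetical identifiers standing in for a unique key column.
game_ids = [f"game-{i:05d}" for i in range(10_000)]
splits = {gid: assign_split(gid) for gid in game_ids}

test_share = sum(s == "test" for s in splits.values()) / len(splits)
print(f"test share: {test_share:.3f}")
```

Because the assignment depends only on the key, rerunning the pipeline or adding new rows never moves an existing row between splits, which matters when the full dataset cannot be shuffled in memory.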

# Projects

* Lichess live win probability using distributed XGBoost
  * Full dataset size: >12 TB
* Second project TBD, using PyTorch
* (Backup option) NYC taxi dataset
* (Backup option) Bureau of Transportation Statistics full airline dataset
