This repository contains code and documentation for use with Google Cloud Dataproc.
- codelabs/opencv-haarcascade provides the source code for the OpenCV Dataproc Codelab, which demonstrates a Spark job that adds facial detection to a set of images.
- codelabs/spark-bigquery provides the source code for the PySpark for Preprocessing BigQuery Data Codelab, which demonstrates using PySpark on Cloud Dataproc to process data from BigQuery.
- codelabs/spark-nlp provides the source code for the PySpark for Natural Language Processing Codelab, which demonstrates using the spark-nlp library for Natural Language Processing.
- notebooks/ai-ml/ provides source code for Spark for AI/ML use cases, including a PyTorch sample for image classification.
- notebooks/python provides example Jupyter notebooks that demonstrate using PySpark with the BigQuery Storage Connector and the Spark GCS Connector.
- spark-tensorflow provides an example of using Spark as a preprocessing toolchain for TensorFlow jobs. Optionally, it demonstrates using the spark-tensorflow-connector to convert CSV files to TFRecords.
- spark-translate provides a simple demo Spark application that translates words using Google's Translation API and runs on Cloud Dataproc.
- gcloud provides a set of scripts to provision Dataproc clusters for use in exercising arbitrary initialization-actions.
See each directory's README for more information.
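Most of the samples above are Spark or PySpark jobs meant to run on a Dataproc cluster. As a rough sketch of what submitting one looks like, the helper below assembles a `gcloud dataproc jobs submit pyspark` command. The script name, cluster name, region, and bucket path are hypothetical placeholders, not values taken from this repository, and each directory's README remains the authoritative source for how to run its sample.

```python
import shlex
import subprocess


def build_submit_command(script, cluster, region, job_args=()):
    """Build a `gcloud dataproc jobs submit pyspark` command as an argument list."""
    command = [
        "gcloud", "dataproc", "jobs", "submit", "pyspark", script,
        f"--cluster={cluster}",
        f"--region={region}",
    ]
    if job_args:
        # Everything after "--" is passed through to the PySpark script itself.
        command += ["--", *job_args]
    return command


def submit(script, cluster, region, job_args=(), dry_run=True):
    """Print the command (dry run) or execute it with the local gcloud CLI."""
    command = build_submit_command(script, cluster, region, job_args)
    if dry_run:
        print(shlex.join(command))
    else:
        # Requires an installed and authenticated gcloud CLI.
        subprocess.run(command, check=True)
    return command


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    submit("my_job.py", "my-cluster", "us-central1", ["gs://my-bucket/input"])
```

The default `dry_run=True` only prints the command, so the sketch can be tried without a cluster or credentials. Pass `dry_run=False` to actually submit the job.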
You can find more Dataproc resources in these GitHub repositories:
- Hadoop/Spark GCS Connector
- Spark Bigtable Connector
- Spark BigQuery Connector
- Flink BigQuery Connector
- Spark Spanner Connector
- Hive BigQuery Connector
- Hive BigQuery Storage Handler [No Longer Maintained]
- Dataproc JDBC Connector
- Dataproc Python examples
- Dataproc Pubsub Spark Streaming example
- Dataproc Java Bigtable sample
- Dataproc Spark-Bigtable samples
For more information, review the Dataproc documentation. You can also pose questions to the Stack Overflow community with the tag google-cloud-dataproc.
See our other Google Cloud Platform GitHub repos for sample applications and scaffolding for other frameworks and use cases.
- See CONTRIBUTING.md
- See LICENSE