Skip to content

GoogleCloudDataproc/cloud-dataproc

Repository files navigation

Google Cloud Dataproc

This repository contains code and documentation for use with Google Cloud Dataproc.

Samples in this Repository

  • codelabs/opencv-haarcascade provides the source code for the OpenCV Dataproc Codelab, which demonstrates a Spark job that adds facial detection to a set of images.
  • codelabs/spark-bigquery provides the source code for the PySpark for Preprocessing BigQuery Data Codelab, which demonstrates using PySpark on Cloud Dataproc to process data from BigQuery.
  • codelabs/spark-nlp provides the source code for the PySpark for Natural Language Processing Codelab, which demonstrates using spark-nlp library for Natural Language Processing.
  • notebooks/ai-ml/ provides source code for Spark for AI/ML use cases, including a PyTorch sample for image classification.
  • notebooks/python provides example Jupyter notebooks to demonstrate using PySpark with the BigQuery Storage Connector and the Spark GCS Connector
  • spark-tensorflow provides an example of using Spark as a preprocessing toolchain for Tensorflow jobs. Optionally, it demonstrates the spark-tensorflow-connector to convert CSV files to TFRecords.
  • spark-translate provides a simple demo Spark application that translates words using Google's Translation API and running on Cloud Dataproc.
  • gcloud provides a set of scripts to provision dataproc clusters for use in exercising arbitrary initialization-actions.

See each directories README for more information.

Additional Dataproc Repositories

You can find more Dataproc resources in these github repositories:

Dataproc projects

Connectors

Examples

For more information

For more information, review the Dataproc documentation. You can also pose questions to the Stack Overflow community with the tag google-cloud-dataproc. See our other Google Cloud Platform github repos for sample applications and scaffolding for other frameworks and use cases.

Contributing changes

Licensing

About

Cloud Dataproc: Samples and Utils

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 33