
feat: Samples showing Gemini generating code and minor changes to get the code working #1095

Open

sundar-mudupalli-work wants to merge 16 commits into main from gemini-codegen

Conversation

@sundar-mudupalli-work (Collaborator)

Hi,

This pull request shows how to provide prompts to Gemini to generate code in Python and Java:

  1. Hive to BigQuery (Python)
  2. GCS to GCS (Python)
  3. JDBC to JDBC (Java)
  4. Delta Lake to Iceberg (Java)

The code generated by Gemini was committed as-is. Some minor changes were needed to get the code working; those changes were made and committed as well.

Create a PySpark script to transform data in GCS from parquet to avro and use the
  add_insertion_time_column function in @data_transformer.py to add an additional column
Code generated by Gemini for GCS to GCS.
Successfully ran the serverless Spark job using the following command:
gcloud dataproc batches submit pyspark transform_parquet_to_avro.py --batch="parquet-to-avro-$(date +%s)" \
--jars=file:///usr/lib/spark/connector/spark-avro.jar --py-files=./data_transformer.py \
--deps-bucket=gs://dataproc-templates-python-deps \
-- --input=gs://dataproc-templates_cloudbuild/gemini-codegen/transform_parquet_to_avro/input/parquet-table  \
--output=gs://dataproc-templates_cloudbuild/gemini-codegen/transform_parquet_to_avro/output/avro_table
Confirmed by using BigQuery, creating parquet and avro files as external tables.
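The generated transform_parquet_to_avro.py itself is not reproduced inline above, but the command line pins down its shape. Below is a minimal sketch of such a script, assuming data_transformer.py exposes add_insertion_time_column(df) as named in the prompt; the argument names --input and --output come from the command above, and everything else is illustrative, not the committed code:

```python
# Sketch of a GCS-to-GCS Parquet-to-Avro transform (illustrative, not the PR's code).
import argparse

from pyspark.sql import SparkSession
from data_transformer import add_insertion_time_column  # helper shipped via --py-files


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--input", required=True, help="GCS path of the Parquet table")
    parser.add_argument("--output", required=True, help="GCS path for the Avro output")
    args = parser.parse_args()

    spark = SparkSession.builder.appName("parquet-to-avro").getOrCreate()

    df = spark.read.parquet(args.input)
    df = add_insertion_time_column(df)  # adds the extra insertion-time column
    # The "avro" format needs the spark-avro connector, supplied above via
    # --jars=file:///usr/lib/spark/connector/spark-avro.jar on Serverless Spark.
    df.write.format("avro").mode("overwrite").save(args.output)

    spark.stop()


if __name__ == "__main__":
    main()
```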
… an insertion_time column
using the add_insertion_time_column function in @data_transformer.py. Save this table to BigQuery,
providing detailed instructions to run this script against a Dataproc cluster.
Save a summary of this session to hive_to_BQReadme.md
Gemini-generated Hive to BQ transformation script.
Comments: the spark-bigquery connector comes preinstalled on Dataproc clusters at image version 2.1 and higher, so no jars need to be provided. The following command worked:
gcloud dataproc jobs submit pyspark gs://dataproc-templates_cloudbuild/gemini-codegen/transform_hive_to_bq/src/transform_hive_to_bigquery.py \
--cluster=mixer-test2 --py-files=gs://dataproc-templates_cloudbuild/gemini-codegen/transform_hive_to_bq/src/data_transformer.py \
--properties=spark.hadoop.hive.metastore.uris=thrift://10.115.64.27:9083 \
-- --hive_database=test_db --hive_table=employees --bq_table=gemini_codegen.py_hive_to_bq \
--bq_temp_gcs_bucket=gs://dataproc-templates-python-deps
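The flags in that command imply the overall shape of the generated transform_hive_to_bigquery.py. A minimal sketch under the same assumptions as above (add_insertion_time_column comes from data_transformer.py; all other details are illustrative rather than the committed code):

```python
# Sketch of a Hive-to-BigQuery transform (illustrative, not the PR's code).
import argparse

from pyspark.sql import SparkSession
from data_transformer import add_insertion_time_column  # helper shipped via --py-files


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--hive_database", required=True)
    parser.add_argument("--hive_table", required=True)
    parser.add_argument("--bq_table", required=True, help="dataset.table in BigQuery")
    parser.add_argument("--bq_temp_gcs_bucket", required=True)
    args = parser.parse_args()

    # Hive support lets Spark resolve tables through the metastore configured
    # via spark.hadoop.hive.metastore.uris on the submit command.
    spark = (
        SparkSession.builder
        .appName("hive-to-bq")
        .enableHiveSupport()
        .getOrCreate()
    )

    df = spark.table(f"{args.hive_database}.{args.hive_table}")
    df = add_insertion_time_column(df)

    # The spark-bigquery connector stages data in GCS before loading;
    # temporaryGcsBucket expects a bucket name without the gs:// scheme.
    (
        df.write.format("bigquery")
        .option("table", args.bq_table)
        .option("temporaryGcsBucket", args.bq_temp_gcs_bucket.removeprefix("gs://"))
        .mode("overwrite")
        .save()
    )

    spark.stop()


if __name__ == "__main__":
    main()
```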
@sundar-mudupalli-work (Collaborator, Author)

/gcbrun
