
Commit c49046a

fix: remove deprecated templates (#1066)

* fix: remove deprecated templates. Remove deprecated templates and clean up files.
* fix: remove python test for deprecated templates
* fix: python init file remove reference
* fix: clean up python files

1 parent 541c270 · commit c49046a

32 files changed: +4 additions, −3164 deletions

java/README.md

Lines changed: 2 additions & 5 deletions

```diff
@@ -15,7 +15,7 @@ Please refer to the [Dataproc Templates (Java - Spark) README](java/README.md) f
 * [GCSToGCS](/java/src/main/java/com/google/cloud/dataproc/templates/gcs#5-gcs-to-gcs) (blogpost [link](https://medium.com/@ankuljain/migrate-gcs-to-gcs-using-dataproc-serverless-3b7b0f6ad6b9))
 * [GCSToJDBC](/java/src/main/java/com/google/cloud/dataproc/templates/gcs#4-gcs-to-jdbc) (blogpost [link](https://medium.com/google-cloud/importing-data-from-gcs-to-databases-via-jdbc-using-dataproc-serverless-7ed75eab93ba))
 * [GCSToSpanner](/java/src/main/java/com/google/cloud/dataproc/templates/gcs#3-gcs-to-spanner) (blogpost [link](https://medium.com/google-cloud/fast-export-large-database-tables-using-gcp-serverless-dataproc-spark-bb32b1260268))
-* [GCSToMongo](/java/src/main/java/com/google/cloud/dataproc/templates/gcs#6-gcs-to-mongo) (blogpost [link] (https://medium.com/google-cloud/importing-data-from-gcs-to-mongodb-using-java-dataproc-serverless-6ff5c8d6f6d5))
+* [GCSToMongo](/java/src/main/java/com/google/cloud/dataproc/templates/gcs#6-gcs-to-mongo) (blogpost [link](https://medium.com/google-cloud/importing-data-from-gcs-to-mongodb-using-java-dataproc-serverless-6ff5c8d6f6d5))
 * [GeneralTemplate](/java/src/main/java/com/google/cloud/dataproc/templates/general)
 * [HBaseToGCS](/java/src/main/java/com/google/cloud/dataproc/templates/hbase#1-hbase-to-gcs) (blogpost [link](https://medium.com/google-cloud/using-dataproc-serverless-to-migrate-your-hbase-data-to-gcs-bf1ccf4ab945))
 * [HiveToBigQuery](/java/src/main/java/com/google/cloud/dataproc/templates/hive#1-hive-to-bigquery) (blogpost [link](https://medium.com/google-cloud/using-dataproc-serverless-to-migrate-your-hive-data-to-bigquery-8e2d4fcd1c24))
@@ -33,20 +33,17 @@ Please refer to the [Dataproc Templates (Java - Spark) README](java/README.md) f
 * [MongoToGCS](/java/src/main/java/com/google/cloud/dataproc/templates/databases#executing-mongo-to-gcs-template) (blogpost [link](https://medium.com/google-cloud/migrating-data-from-mongo-to-gcs-using-java-and-dataproc-serverless-template-390500481804))
 * [PubSubToBigQuery](/java/src/main/java/com/google/cloud/dataproc/templates/pubsub#1-pubsub-to-bigquery) (blogpost [link](https://medium.com/google-cloud/from-pub-sub-to-bigquery-streaming-data-in-near-real-time-b550aeff595d))
 * [PubSubToBigTable](/java/src/main/java/com/google/cloud/dataproc/templates/pubsub#1-pubsub-to-bigtable) (blogpost [link](https://medium.com/google-cloud/stream-data-from-pub-sub-to-bigtable-using-dataproc-serverless-3142c1bcc22a))
-* [PubSubLiteToBigTable](/java/src/main/java/com/google/cloud/dataproc/templates/pubsublite#1-pubsublite-to-bigtable) (blogpost [link](https://medium.com/google-cloud/stream-data-from-pub-sub-lite-to-bigtable-using-dataproc-serverless-2c8816f40581)) **Deprecated and will be removed in Q1 2025**
 * [PubSubToGCS](/java/src/main/java/com/google/cloud/dataproc/templates/pubsub/README.md#2-pubsub-to-gcs) (blogpost [link](https://medium.com/google-cloud/stream-data-from-pub-sub-to-cloud-storage-using-dataproc-serverless-7a1e4823926e))
-* [RedshiftToGCS](/java/src/main/java/com/google/cloud/dataproc/templates/databases#executing-redshift-to-gcs-template) **Deprecated and will be removed in Q1 2025**
 * [S3ToBigQuery](/java/src/main/java/com/google/cloud/dataproc/templates/s3#1-s3-to-bigquery) (blogpost [link](https://medium.com/google-cloud/export-data-from-aws-s3-to-bigquery-using-dataproc-serverless-6dc7a9952fc4))
 * [SnowflakeToGCS](/java/src/main/java/com/google/cloud/dataproc/templates/snowflake#1-snowflake-to-gcs) (blogpost [link](https://medium.com/google-cloud/export-snowflake-query-results-to-gcs-using-dataproc-serverless-3d68f5a01ca9))
 * [SpannerToGCS](/java/src/main/java/com/google/cloud/dataproc/templates/databases#executing-spanner-to-gcs-template) (blogpost [link](https://medium.com/google-cloud/cloud-spanner-export-query-results-using-dataproc-serverless-6f2f65b583a4))
-* [TextToBigquery](/java/src/main/java/com/google/cloud/dataproc/templates/gcs#7-text-to-bigquery) **Deprecated and will be removed in Q1 2025**
 * [WordCount](/java/src/main/java/com/google/cloud/dataproc/templates/word/WordCount.java)

 ...

 ## Requirements

-* Java 8
+* Java 11
 * Maven 3

 ## Running Templates
```
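The requirements bump from Java 8 to Java 11 means builds now need a Java 11 (or newer) JDK on the machine running Maven. A quick sanity check before building, as a minimal sketch assuming a standard shell and Maven install (the JAVA_HOME path and the build invocation are illustrative, not repo-mandated):

```sh
# Verify the JDK on PATH and the JDK Maven actually resolves
# (check the "Java version" line in mvn's output).
java -version
mvn -version

# If needed, point this shell at a Java 11 install first (path is hypothetical).
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk

# A typical Maven build of the Java templates module; see the repo README
# for the project's own build instructions.
mvn -f java/pom.xml clean package -DskipTests
```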

java/pom.xml

Lines changed: 0 additions & 12 deletions

```diff
@@ -77,12 +77,10 @@
     <shade.skip>false</shade.skip>
     <spanner.jdbc.version>2.14.0</spanner.jdbc.version>
     <google.cloud.pubsub.version>1.132.0</google.cloud.pubsub.version>
-    <google.cloud.pubsublite.version>1.14.0</google.cloud.pubsublite.version>
     <google.cloud.bigtable.version>2.54.0</google.cloud.bigtable.version>
     <google.oauth.client.version>1.36.0</google.oauth.client.version>
     <google.api.client.version>2.7.0</google.api.client.version>
     <spark.bigquery.connector.version>0.36.4</spark.bigquery.connector.version>
-    <pubsublite.spark.sql.streaming.version>1.0.0</pubsublite.spark.sql.streaming.version>
     <spark.bigtable.connector.version>0.6.0</spark.bigtable.connector.version>
     <spark.streaming.pubsub.version>2.4.0</spark.streaming.pubsub.version>

@@ -333,11 +331,6 @@
       <artifactId>google-cloud-spanner-jdbc</artifactId>
       <version>${spanner.jdbc.version}</version>
     </dependency>
-    <dependency>
-      <groupId>com.google.cloud</groupId>
-      <artifactId>google-cloud-pubsublite</artifactId>
-      <version>${google.cloud.pubsublite.version}</version>
-    </dependency>
     <dependency>
       <groupId>com.google.cloud</groupId>
       <artifactId>google-cloud-storage</artifactId>
@@ -358,11 +351,6 @@
       <version>${spark.bigquery.connector.version}</version>
       <scope>provided</scope>
     </dependency>
-    <dependency>
-      <groupId>com.google.cloud</groupId>
-      <artifactId>pubsublite-spark-sql-streaming</artifactId>
-      <version>${pubsublite.spark.sql.streaming.version}</version>
-    </dependency>
     <dependency>
       <groupId>com.google.cloud</groupId>
       <artifactId>google-cloud-bigtable</artifactId>
```
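With both Pub/Sub Lite artifacts (`google-cloud-pubsublite` 1.14.0 and `pubsublite-spark-sql-streaming` 1.0.0) dropped from the properties and the dependency list, nothing in the POM should resolve them anymore. A quick way to confirm no transitive dependency still pulls them in, sketched with stock Maven tooling (run from the `java/` directory):

```sh
# Print the resolved dependency tree and search it for leftover Pub/Sub Lite artifacts.
# grep exits non-zero when nothing matches, hence the fallback echo.
mvn dependency:tree | grep -i pubsublite || echo "no pubsublite artifacts remain"
```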

java/src/main/java/com/google/cloud/dataproc/templates/databases/README.md

Lines changed: 0 additions & 35 deletions

````diff
@@ -173,41 +173,6 @@ You can replace the ```casscon``` with your catalog name if it is passed. This i

 Make sure that either ```cassandratobq.input.query``` or both ```cassandratobq.input.keyspace``` and ```cassandratobq.input.table``` is provided. Setting or not setting all three properties at the same time will throw an error.

-
-## Executing Redshift to Cloud Storage template
-
-General Execution:
-
-```
-export GCP_PROJECT=<gcp-project-id>
-export REGION=<region>
-export SUBNET=<subnet>
-export GCS_STAGING_LOCATION=<gcs-staging-bucket-folder>
-export JARS=gs://<cloud-storage-bucket-name>/spark-redshift_<version>.jar,gs://<cloud-storage-bucket-name>/redshift_jdbc_<version>.jar,gs://<cloud-storage-bucket-name>/minimal_json<version>.jar
-
-bin/start.sh \
--- --template REDSHIFTTOGCS \
---templateProperty project.id=<gcp-project-id> \
---templateProperty redshift.aws.input.url=<jdbc:redshift://host-name:port-number/> \
---templateProperty redshift.aws.input.table=<Redshift-table-name> \
---templateProperty redshift.aws.input.temp.dir=<AWS-temp-directory> \
---templateProperty redshift.aws.input.iam.role=<Redshift-S3-IAM-role> \
---templateProperty redshift.aws.input.access.key=<Access-key> \
---templateProperty redshift.aws.input.secret.key=<Secret-key> \
---templateProperty redshift.gcs.output.file.format=<Output-File-Format> \
---templateProperty redshift.gcs.output.file.location=<Output-GCS-location> \
---templateProperty redshift.gcs.output.mode=<Output-GCS-Save-mode>
-```
-
-There are two optional properties as well with "Redshift to Cloud Storage" Template. Please find below the details :-
-
-```
---templateProperty redshift.gcs.temp.table='temporary_view_name'
---templateProperty redshift.gcs.temp.query='select * from global_temp.temporary_view_name'
-```
-These properties are responsible for applying some spark sql transformations while loading data into Cloud Storage.
-The only thing needs to keep in mind is that, the name of the Spark temporary view and the name of table in the query should match exactly. Otherwise, there would be an error as:- "Table or view not found:"
-
 ## Executing Mongo to Cloud Storage template

 Template for exporting a MongoDB Collection to files in Google Cloud Storage. It supports writing JSON, CSV, Parquet and Avro formats.
````
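The retained context above spells out the Cassandra-to-BigQuery input contract: set either `cassandratobq.input.query`, or both `cassandratobq.input.keyspace` and `cassandratobq.input.table`, but never all three at once. A minimal sketch of the two valid shapes, in the same `--templateProperty` fragment style the README uses (all values are placeholders):

```sh
# Shape A: name the keyspace and table directly.
--templateProperty cassandratobq.input.keyspace=<keyspace-name> \
--templateProperty cassandratobq.input.table=<table-name>

# Shape B: supply a query instead (mutually exclusive with shape A).
--templateProperty cassandratobq.input.query=<cql-select-statement>
```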

java/src/main/java/com/google/cloud/dataproc/templates/databases/RedshiftToGCS.java

Lines changed: 0 additions & 88 deletions
This file was deleted.

java/src/main/java/com/google/cloud/dataproc/templates/databases/RedshiftToGCSConfig.java

Lines changed: 0 additions & 134 deletions
This file was deleted.

java/src/main/java/com/google/cloud/dataproc/templates/gcs/README.md

Lines changed: 1 addition & 32 deletions

````diff
@@ -237,38 +237,7 @@ bin/start.sh \
 --templateProperty gcs.mongo.output.mode="overwrite"
 ```

-## 7. Text To BigQuery
-
-General Execution:
-
-```
-GCP_PROJECT=<gcp-project-id> \
-REGION=<region> \
-SUBNET=<subnet> \
-GCS_STAGING_LOCATION=<gcs-staging-bucket-folder> \
-HISTORY_SERVER_CLUSTER=<history-server> \
-bin/start.sh \
--- --template TEXTTOBIGQUERY \
---templateProperty project.id=<gcp-project-id> \
---templateProperty text.bigquery.input.location=<gcs path for input file> \
---templateProperty text.bigquery.input.compression=<compression file format like gzip or deflate> \
---templateProperty text.bigquery.input.delimiter=<, for csv> \
---templateProperty text.bigquery.output.dataset=<Big query dataset name> \
---templateProperty text.bigquery.output.table=<Big query table name> \
---templateProperty text.bigquery.output.mode=<Append|Overwrite|ErrorIfExists|Ignore> \
---templateProperty text.bigquery.temp.bucket=<bigquery temp bucket name>
-```
-
-There are two optional properties as well with "Text to BigQuery" Template. Please find below the details :-
-
-```
---templateProperty text.bigquery.temp.table='temporary_view_name'
---templateProperty text.bigquery.temp.query='select * from global_temp.temporary_view_name'
-```
-These properties are responsible for applying some spark sql transformations while loading data into BigQuery.
-The only thing needs to keep in mind is that, the name of the Spark temporary view and the name of table in the query should match exactly. Otherwise, there would be an error as:- "Table or view not found:"
-
-## 8. Deltalake To Iceberg
+## 7. Deltalake To Iceberg

 `deltalake.version.as_of` is an optional parameter which is default set to `0` means we will pick up the latest change only. We are providing below example to show how you can pass the value if you require time travel based on version number.
````
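The surviving context documents `deltalake.version.as_of`, which defaults to `0` (read the latest state) and enables version-based time travel when overridden. A sketch of passing it, in the same `--templateProperty` fragment style as the surrounding examples (the version number is a placeholder):

```sh
# Read the Delta table as of version 5 instead of the latest snapshot (value is illustrative).
--templateProperty deltalake.version.as_of=5
```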