Commit c8bf023

Support for TFX 1.4, TF 2.6, Beam 2.33.0 (#58)
* Removed unused _float_feature
* Updated dependencies
* Updated TFDV notebook
* Updated Apache Beam example
* Updated components
* Updated interactive notebook, working with TFX 1.4
* Airflow updates
* Updated Beam pipeline example
* Kubeflow updates
* Updated beam_arg
* Added Vertex example
* Renamed to vertex
* Updated README
1 parent 35a545a commit c8bf023

20 files changed: +24,062 and -2,795 lines

README.md

Lines changed: 7 additions & 1 deletion
@@ -2,6 +2,10 @@
 
 Code repository for the O'Reilly publication ["Building Machine Learning Pipelines"](http://www.buildingmlpipelines.com) by Hannes Hapke & Catherine Nelson
 
+## Update
+
+* The example code has been updated to work with TFX 1.4.0, TensorFlow 2.6.1, and Apache Beam 2.33.0. A GCP Vertex example (training and serving) was added.
+
 ## Set up the demo project
 
 Download the initial dataset. From the root of this repository, execute
@@ -63,7 +67,9 @@ Chapter 14. Code for training a differentially private version of the demo proje
 
 The code was written and tested for version 0.22.
 
+- As of 11/23/21, the examples have been updated to support TFX 1.4.0, TensorFlow 2.6.1, and Apache Beam 2.33.0. A GCP Vertex example (training and serving) was added.
+
 - As of 9/22/20, the interactive pipeline runs on TFX version 0.24.0rc1.
 Due to tiny TFX bugs, the pipelines currently don't work on the releases 0.23 and 0.24-rc0. Github issues have been filed with the TFX team specifically for the book pipelines ([Issue 2500](https://github.com/tensorflow/tfx/issues/2500#issuecomment-695363847)). We will update the repository once the issue is resolved.
 
-- As of 9/14/20, TFX only supports Python 3.8 with version >0.24.0rc0.
+- As of 9/14/20, TFX only supports Python 3.8 with version >0.24.0rc0.
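The README now states that the examples target TFX 1.4.0, TensorFlow 2.6.1, and Apache Beam 2.33.0. As a minimal sketch (not part of this commit), a local environment can be compared against those versions with the standard library's `importlib.metadata`. The function name `find_version_mismatches` is hypothetical, and the PyPI distribution names `tfx`, `tensorflow`, and `apache-beam` are assumed to be the ones installed.

```python
from importlib import metadata
from typing import Dict, Optional, Tuple

# Versions named in the README update, keyed by (assumed) PyPI distribution name.
PINNED_VERSIONS = {
    "tfx": "1.4.0",
    "tensorflow": "2.6.1",
    "apache-beam": "2.33.0",
}


def find_version_mismatches(
    pins: Dict[str, str]
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each package whose installed version differs from its pin to
    (pinned_version, installed_version). installed_version is None when the
    package is not installed at all."""
    mismatches = {}
    for package, pinned in pins.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            installed = None
        # Exact string comparison: "2.6.2" is reported as a mismatch for "2.6.1".
        if installed != pinned:
            mismatches[package] = (pinned, installed)
    return mismatches


if __name__ == "__main__":
    for package, (pinned, installed) in find_version_mismatches(PINNED_VERSIONS).items():
        print(f"{package}: expected {pinned}, found {installed or 'not installed'}")
```

The comparison is deliberately strict; if patch releases should be accepted, parse the versions with a version library instead of comparing strings.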

chapters/data_ingestion/convert_data_to_tfrecords.py

Lines changed: 4 additions & 11 deletions
@@ -8,13 +8,7 @@
 
 
 def _bytes_feature(value):
-    return tf.train.Feature(
-        bytes_list=tf.train.BytesList(value=[value.encode()])
-    )
-
-
-def _float_feature(value):
-    return tf.train.Feature(float_list=tf.train.FloatList(value=[value]))
+    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value.encode()]))
 
 
 def _int64_feature(value):
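This hunk deletes `_float_feature`, which the commit message lists as unused. One way to find such dead helpers before deleting them is to scan the module's syntax tree. The sketch below uses only the standard library `ast` module; `find_unused_functions` and the `SAMPLE` source are hypothetical illustrations, not tooling from the repository.

```python
import ast
from typing import Set


def find_unused_functions(source: str) -> Set[str]:
    """Return module-level function names that are never read as a name.

    Only direct references inside the same module are detected; uses via
    getattr(), strings, or imports from other modules are not.
    """
    tree = ast.parse(source)
    # Functions defined at the top level of the module.
    defined = {
        node.name
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }
    # Every identifier that is loaded (read) anywhere in the module.
    used = {
        node.id
        for node in ast.walk(tree)
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load)
    }
    return defined - used


# Abridged, hypothetical version of the script before this commit.
# ast.parse only parses the text, so TensorFlow does not need to be installed.
SAMPLE = '''
import tensorflow as tf


def _bytes_feature(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value.encode()]))


def _float_feature(value):
    return tf.train.Feature(float_list=tf.train.FloatList(value=[value]))


feature = {"company": _bytes_feature("Acme")}
'''

print(find_unused_functions(SAMPLE))  # {'_float_feature'}
```

Because it works on a single file, a helper that is imported elsewhere would be reported as unused; check the rest of the repository before removing anything it flags.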
@@ -26,12 +20,13 @@ def clean_rows(row):
         row["zip_code"] = "99999"
     return row
 
+
 def convert_zipcode_to_int(zipcode):
     if isinstance(zipcode, str) and "XX" in zipcode:
         zipcode = zipcode.replace("XX", "00")
     int_zipcode = int(zipcode)
     return int_zipcode
-
+
 
 original_data_file = "../../data/consumer_complaints_with_narrative.csv"
 tfrecords_filename = "consumer-complaints.tfrecords"
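Apart from blank-line changes, this hunk leaves `convert_zipcode_to_int` as it was. The small, runnable sketch below shows how it works together with `clean_rows`. `convert_zipcode_to_int` is copied from the diff; only the last lines of `clean_rows` are visible, so the condition that triggers the `"99999"` placeholder (an empty or missing zip code) is an assumption.

```python
def clean_rows(row):
    # Assumption: the diff shows only the fallback assignment; an empty or
    # missing zip code is treated here as the trigger for the placeholder.
    if not row.get("zip_code"):
        row["zip_code"] = "99999"
    return row


def convert_zipcode_to_int(zipcode):
    # Same logic as the diff: masked digits "XX" become "00" before parsing.
    if isinstance(zipcode, str) and "XX" in zipcode:
        zipcode = zipcode.replace("XX", "00")
    int_zipcode = int(zipcode)
    return int_zipcode


rows = [{"zip_code": "941XX"}, {"zip_code": ""}, {"zip_code": "10001"}]
zip_codes = [convert_zipcode_to_int(clean_rows(row)["zip_code"]) for row in rows]
print(zip_codes)  # [94100, 99999, 10001]
```

`int()` drops leading zeros, so `"02134"` becomes `2134`. That is fine for a numeric feature, but the value no longer round-trips to the original string.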
@@ -53,9 +48,7 @@ def convert_zipcode_to_int(zipcode):
                     "company": _bytes_feature(row["company"]),
                     "company_response": _bytes_feature(row["company_response"]),
                     "timely_response": _bytes_feature(row["timely_response"]),
-                    "consumer_disputed": _bytes_feature(
-                        row["consumer_disputed"]
-                    ),
+                    "consumer_disputed": _bytes_feature(row["consumer_disputed"]),
                 }
             )
         )
