
Commit 625ec75

cantonios authored and tf-text-github-robot committed
Update tensorflow patch to allow tf-text nightlies to build.

The patch was outdated. Applied the changes manually from the latest nightly and re-exported the patch.

FUTURE_COPYBARA_INTEGRATE_REVIEW=#1266 from 8bitmp3:fix-transformer fca8889
PiperOrigin-RevId: 634504731
1 parent f0f675c commit 625ec75

5 files changed: +44 −44 lines

WORKSPACE

+3-3
@@ -58,10 +58,10 @@ http_archive(
     name = "org_tensorflow",
     patch_args = ["-p1"],
     patches = ["//third_party/tensorflow:tf.patch"],
-    strip_prefix = "tensorflow-d17c801006947b240ec4b8caf232c39b6a24718a",
-    sha256 = "1a32ed7b5ea090db114008ea382c1e1beda622ffd4c62582f2f906cb10ee6290",
+    strip_prefix = "tensorflow-f6b72954734f8304bfb83228bd8406a3ba3394f4",
+    sha256 = "15df197aace44fe2c67e6e22f930cf76f45d9e6ac1291e7c9ce8dd0dcc26e9a5",
     urls = [
-        "https://github.com/tensorflow/tensorflow/archive/d17c801006947b240ec4b8caf232c39b6a24718a.zip"
+        "https://github.com/tensorflow/tensorflow/archive/f6b72954734f8304bfb83228bd8406a3ba3394f4.zip"
     ],
 )
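Aside: the new sha256 above can be reproduced by hashing the pinned archive. A minimal Python sketch, assuming network access and only the standard library (not part of the commit or the project's tooling):

```python
import hashlib
import urllib.request

# Pinned TensorFlow commit from the http_archive rule above.
COMMIT = "f6b72954734f8304bfb83228bd8406a3ba3394f4"
URL = f"https://github.com/tensorflow/tensorflow/archive/{COMMIT}.zip"

with urllib.request.urlopen(URL) as response:
    archive_bytes = response.read()

# The hex digest should match the sha256 attribute in the rule above.
print(hashlib.sha256(archive_bytes).hexdigest())
```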

docs/tutorials/transformer.ipynb

+19-19
@@ -93,7 +93,7 @@
    "source": [
     "This tutorial demonstrates how to create and train a [sequence-to-sequence](https://developers.google.com/machine-learning/glossary#sequence-to-sequence-task) [Transformer](https://developers.google.com/machine-learning/glossary#Transformer) model to translate [Portuguese into English](https://www.tensorflow.org/datasets/catalog/ted_hrlr_translate#ted_hrlr_translatept_to_en). The Transformer was originally proposed in [\"Attention is all you need\"](https://arxiv.org/abs/1706.03762) by Vaswani et al. (2017).\n",
     "\n",
-    "Transformers are deep neural networks that replace CNNs and RNNs with [self-attention](https://developers.google.com/machine-learning/glossary#self-attention). Self attention allows Transformers to easily transmit information across the input sequences.\n",
+    "Transformers are deep neural networks that replace CNNs and RNNs with [self-attention](https://developers.google.com/machine-learning/glossary#self-attention). Self-attention allows Transformers to easily transmit information across the input sequences.\n",
     "\n",
     "As explained in the [Google AI Blog post](https://ai.googleblog.com/2017/08/transformer-novel-neural-network.html):\n",
     "\n",
@@ -138,7 +138,7 @@
     "To get the most out of this tutorial, it helps if you know about [the basics of text generation](./text_generation.ipynb) and attention mechanisms. \n",
     "\n",
     "A Transformer is a sequence-to-sequence encoder-decoder model similar to the model in the [NMT with attention tutorial](https://www.tensorflow.org/text/tutorials/nmt_with_attention).\n",
-    "A single-layer Transformer takes a little more code to write, but is almost identical to that encoder-decoder RNN model. The only difference is that the RNN layers are replaced with self attention layers.\n",
+    "A single-layer Transformer takes a little more code to write, but is almost identical to that encoder-decoder RNN model. The only difference is that the RNN layers are replaced with self-attention layers.\n",
     "This tutorial builds a 4-layer Transformer which is larger and more powerful, but not fundamentally more complex."
    ]
   },
@@ -186,8 +186,8 @@
     "## Why Transformers are significant\n",
     "\n",
     "- Transformers excel at modeling sequential data, such as natural language.\n",
-    "- Unlike the [recurrent neural networks (RNNs)](./text_generation.ipynb), Transformers are parallelizable. This makes them efficient on hardware like GPUs and TPUs. The main reasons is that Transformers replaced recurrence with attention, and computations can happen simultaneously. Layer outputs can be computed in parallel, instead of a series like an RNN.\n",
-    "- Unlike [RNNs](https://www.tensorflow.org/guide/keras/rnn) (like [seq2seq, 2014](https://arxiv.org/abs/1409.3215)) or [convolutional neural networks (CNNs)](https://www.tensorflow.org/tutorials/images/cnn) (for example, [ByteNet](https://arxiv.org/abs/1610.10099)), Transformers are able to capture distant or long-range contexts and dependencies in the data between distant positions in the input or output sequences. Thus, longer connections can be learned. Attention allows each location to have access to the entire input at each layer, while in RNNs and CNNs, the information needs to pass through many processing steps to move a long distance, which makes it harder to learn.\n",
+    "- Unlike [recurrent neural networks (RNNs)](./text_generation.ipynb), Transformers are parallelizable. This makes them efficient on hardware like GPUs and TPUs. The main reasons is that Transformers replaced recurrence with attention, and computations can happen simultaneously. Layer outputs can be computed in parallel, instead of a series like an RNN.\n",
+    "- Unlike [RNNs](https://www.tensorflow.org/guide/keras/rnn) (such as [seq2seq, 2014](https://arxiv.org/abs/1409.3215)) or [convolutional neural networks (CNNs)](https://www.tensorflow.org/tutorials/images/cnn) (for example, [ByteNet](https://arxiv.org/abs/1610.10099)), Transformers are able to capture distant or long-range contexts and dependencies in the data between distant positions in the input or output sequences. Thus, longer connections can be learned. Attention allows each location to have access to the entire input at each layer, while in RNNs and CNNs, the information needs to pass through many processing steps to move a long distance, which makes it harder to learn.\n",
     "- Transformers make no assumptions about the temporal/spatial relationships across the data. This is ideal for processing a set of objects (for example, [StarCraft units](https://www.deepmind.com/blog/alphastar-mastering-the-real-time-strategy-game-starcraft-ii)).\n",
     "\n",
     "\u003cimg src=\"https://www.tensorflow.org/images/tutorials/transformer/encoder_self_attention_distribution.png\" width=\"800\" alt=\"Encoder self-attention distribution for the word it from the 5th to the 6th layer of a Transformer trained on English-to-French translation\"\u003e\n",
@@ -1007,8 +1007,8 @@
    },
    "outputs": [],
    "source": [
-    "embed_pt = PositionalEmbedding(vocab_size=tokenizers.pt.get_vocab_size(), d_model=512)\n",
-    "embed_en = PositionalEmbedding(vocab_size=tokenizers.en.get_vocab_size(), d_model=512)\n",
+    "embed_pt = PositionalEmbedding(vocab_size=tokenizers.pt.get_vocab_size().numpy(), d_model=512)\n",
+    "embed_en = PositionalEmbedding(vocab_size=tokenizers.en.get_vocab_size().numpy(), d_model=512)\n",
     "\n",
     "pt_emb = embed_pt(pt)\n",
     "en_emb = embed_en(en)"
@@ -1340,7 +1340,7 @@
    "id": "J6qrQxSpv34R"
   },
   "source": [
-    "### The global self attention layer"
+    "### The global self-attention layer"
   ]
  },
  {
@@ -1360,7 +1360,7 @@
   "source": [
    "\u003ctable\u003e\n",
    "\u003ctr\u003e\n",
-    "  \u003cth colspan=1\u003eThe global self attention layer\u003c/th\u003e\n",
+    "  \u003cth colspan=1\u003eThe global self-attention layer\u003c/th\u003e\n",
    "\u003ctr\u003e\n",
    "\u003ctr\u003e\n",
    "  \u003ctd\u003e\n",
@@ -1378,7 +1378,7 @@
   "source": [
    "Since the context sequence is fixed while the translation is being generated, information is allowed to flow in both directions. \n",
    "\n",
-    "Before Transformers and self attention, models commonly used RNNs or CNNs to do this task:"
+    "Before Transformers and self-attention, models commonly used RNNs or CNNs to do this task:"
   ]
  },
  {
@@ -1415,7 +1415,7 @@
    "- The RNN allows information to flow all the way across the sequence, but it passes through many processing steps to get there (limiting gradient flow). These RNN steps have to be run sequentially and so the RNN is less able to take advantage of modern parallel devices.\n",
    "- In the CNN each location can be processed in parallel, but it only provides a limited receptive field. The receptive field only grows linearly with the number of CNN layers, You need to stack a number of Convolution layers to transmit information across the sequence ([Wavenet](https://arxiv.org/abs/1609.03499) reduces this problem by using dilated convolutions).\n",
    "\n",
-    "The global self attention layer on the other hand lets every sequence element directly access every other sequence element, with only a few operations, and all the outputs can be computed in parallel. \n",
+    "The global self-attention layer on the other hand lets every sequence element directly access every other sequence element, with only a few operations, and all the outputs can be computed in parallel. \n",
    "\n",
    "To implement this layer you just need to pass the target sequence, `x`, as both the `query`, and `value` arguments to the `mha` layer: "
   ]
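Aside: a minimal, self-contained sketch of passing the same sequence as `query`, `value`, and `key` to a `MultiHeadAttention` layer, as the cell above describes. The shapes and layer sizes are toy values, and this is not the notebook's exact `GlobalSelfAttention` class:

```python
import tensorflow as tf

mha = tf.keras.layers.MultiHeadAttention(num_heads=2, key_dim=64)

# (batch, sequence length, feature size): toy values for illustration.
x = tf.random.normal([1, 10, 64])

# Self-attention: the sequence attends to itself, so x plays every role.
attn_output = mha(query=x, value=x, key=x)
print(attn_output.shape)  # (1, 10, 64)
```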
@@ -1470,7 +1470,7 @@
   "source": [
    "\u003ctable\u003e\n",
    "\u003ctr\u003e\n",
-    "  \u003cth colspan=1\u003eThe global self attention layer\u003c/th\u003e\n",
+    "  \u003cth colspan=1\u003eThe global self-attention layer\u003c/th\u003e\n",
    "\u003ctr\u003e\n",
    "\u003ctr\u003e\n",
    "  \u003ctd\u003e\n",
@@ -1499,7 +1499,7 @@
   "source": [
    "\u003ctable\u003e\n",
    "\u003ctr\u003e\n",
-    "  \u003cth colspan=1\u003eThe global self attention layer\u003c/th\u003e\n",
+    "  \u003cth colspan=1\u003eThe global self-attention layer\u003c/th\u003e\n",
    "\u003ctr\u003e\n",
    "\u003ctr\u003e\n",
    "  \u003ctd\u003e\n",
@@ -1515,7 +1515,7 @@
    "id": "Yq4NtLymD99-"
   },
   "source": [
-    "### The causal self attention layer"
+    "### The causal self-attention layer"
   ]
  },
  {
@@ -1524,7 +1524,7 @@
    "id": "VufkgF7caLze"
   },
   "source": [
-    "This layer does a similar job as the global self attention layer, for the output sequence:"
+    "This layer does a similar job as the global self-attention layer, for the output sequence:"
   ]
  },
  {
@@ -1535,7 +1535,7 @@
   "source": [
    "\u003ctable\u003e\n",
    "\u003ctr\u003e\n",
-    "  \u003cth colspan=1\u003eThe causal self attention layer\u003c/th\u003e\n",
+    "  \u003cth colspan=1\u003eThe causal self-attention layer\u003c/th\u003e\n",
    "\u003ctr\u003e\n",
    "\u003ctr\u003e\n",
    "  \u003ctd\u003e\n",
@@ -1551,7 +1551,7 @@
    "id": "0AtF1HYFEOYf"
   },
   "source": [
-    "This needs to be handled differently from the encoder's global self attention layer. \n",
+    "This needs to be handled differently from the encoder's global self-attention layer. \n",
    "\n",
    "Like the [text generation tutorial](https://www.tensorflow.org/text/tutorials/text_generation), and the [NMT with attention](https://www.tensorflow.org/text/tutorials/nmt_with_attention) tutorial, Transformers are an \"autoregressive\" model: They generate the text one token at a time and feed that output back to the input. To make this _efficient_, these models ensure that the output for each sequence element only depends on the previous sequence elements; the models are \"causal\"."
   ]
@@ -1608,7 +1608,7 @@
    "id": "WLYfIa8eiYgk"
   },
   "source": [
-    "To build a causal self attention layer, you need to use an appropriate mask when computing the attention scores and summing the attention `value`s.\n",
+    "To build a causal self-attention layer, you need to use an appropriate mask when computing the attention scores and summing the attention `value`s.\n",
    "\n",
    "This is taken care of automatically if you pass `use_causal_mask = True` to the `MultiHeadAttention` layer when you call it:"
   ]
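Aside: a minimal sketch of the `use_causal_mask=True` behaviour described in the cell above, with toy shapes. The triangular check at the end is only a sanity test, not notebook code:

```python
import tensorflow as tf

mha = tf.keras.layers.MultiHeadAttention(num_heads=2, key_dim=64)
x = tf.random.normal([1, 10, 64])

out, scores = mha(query=x, value=x, key=x,
                  use_causal_mask=True,
                  return_attention_scores=True)

print(out.shape)     # (1, 10, 64)
print(scores.shape)  # (1, 2, 10, 10): batch, heads, query position, key position

# With the causal mask, no position attends to a later position, so the
# strictly upper-triangular part of each score matrix is zero.
upper = scores[0, 0] - tf.linalg.band_part(scores[0, 0], -1, 0)
print(float(tf.reduce_max(tf.abs(upper))))  # ~0.0
```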
@@ -1650,7 +1650,7 @@
   "source": [
    "\u003ctable\u003e\n",
    "\u003ctr\u003e\n",
-    "  \u003cth colspan=1\u003eThe causal self attention layer\u003c/th\u003e\n",
+    "  \u003cth colspan=1\u003eThe causal self-attention layer\u003c/th\u003e\n",
    "\u003ctr\u003e\n",
    "\u003ctr\u003e\n",
    "  \u003ctd\u003e\n",
@@ -1679,7 +1679,7 @@
   "source": [
    "\u003ctable\u003e\n",
    "\u003c/tr\u003e\n",
-    "  \u003cth colspan=1\u003eThe causal self attention layer\u003c/th\u003e\n",
+    "  \u003cth colspan=1\u003eThe causal self-attention layer\u003c/th\u003e\n",
    "\u003ctr\u003e\n",
    "\u003ctr\u003e\n",
    "  \u003ctd\u003e\n",

tensorflow_text/BUILD

+1-3
@@ -1396,9 +1396,7 @@ py_tf_text_library(
     cc_op_kernels = [
         "//tensorflow_text/core/kernels:utf8_binarize_kernel",
     ],
-    visibility = [
-        "//visibility:private", # Only private by automation, not intent. Owner may accept CLs adding visibility. See go/scheuklappen#explicit-private.
-    ],
+    visibility = ["//visibility:private"],
     deps = [
         # python/framework:ops tensorflow dep,
         # python/ops:array_ops tensorflow dep,

tensorflow_text/core/kernels/BUILD

+3-2
@@ -1,7 +1,8 @@
-# Kernels for tf.text ops.
-# [internal] load cc_proto_library.bzl
+"""Kernels for tf.text ops."""
+
 load("@flatbuffers//:build_defs.bzl", "flatbuffer_cc_library")
 load("//tensorflow_text:tftext.bzl", "tf_cc_library", "tflite_cc_library")
+# [internal] load cc_proto_library.bzl
 
 licenses(["notice"])

third_party/tensorflow/tf.patch

+18-17
@@ -1,33 +1,34 @@
 diff --git a/tensorflow/tools/toolchains/cpus/aarch64/aarch64_compiler_configure.bzl b/tensorflow/tools/toolchains/cpus/aarch64/aarch64_compiler_configure.bzl
-index a2bdd6a7eed..ec25c23d8d4 100644
+index 00cd6983ca3..d9c5ef16f9b 100644
 --- a/tensorflow/tools/toolchains/cpus/aarch64/aarch64_compiler_configure.bzl
 +++ b/tensorflow/tools/toolchains/cpus/aarch64/aarch64_compiler_configure.bzl
-@@ -2,7 +2,7 @@
+@@ -1,7 +1,7 @@
+ """Configurations of AARCH64 builds used with Docker container."""
 
  load("//tensorflow/tools/toolchains:cpus/aarch64/aarch64.bzl", "remote_aarch64_configure")
- load("//third_party/remote_config:remote_platform_configure.bzl", "remote_platform_configure")
 -load("//third_party/py:python_configure.bzl", "remote_python_configure")
 +load("//third_party/py/non_hermetic:python_configure.bzl", "remote_python_configure")
+ load("//third_party/remote_config:remote_platform_configure.bzl", "remote_platform_configure")
 
  def ml2014_tf_aarch64_configs(name_container_map, env):
-     for name, container in name_container_map.items():
 diff --git a/tensorflow/tools/toolchains/remote_config/rbe_config.bzl b/tensorflow/tools/toolchains/remote_config/rbe_config.bzl
-index 9f71a414bf7..57f70752323 100644
+index ae776c2a2fd..108e79edbd7 100644
 --- a/tensorflow/tools/toolchains/remote_config/rbe_config.bzl
 +++ b/tensorflow/tools/toolchains/remote_config/rbe_config.bzl
-@@ -1,6 +1,6 @@
- """Macro that creates external repositories for remote config."""
- 
---load("//third_party/py:python_configure.bzl", "local_python_configure", "remote_python_configure")
--+load("//third_party/py/non_hermetic:python_configure.bzl", "local_python_configure", "remote_python_configure")
+@@ -4,7 +4,7 @@ load("//tensorflow/tools/toolchains/remote_config:containers.bzl", "containers")
  load("//third_party/gpus:cuda_configure.bzl", "remote_cuda_configure")
- load("//third_party/nccl:nccl_configure.bzl", "remote_nccl_configure")
  load("//third_party/gpus:rocm_configure.bzl", "remote_rocm_configure")
+ load("//third_party/nccl:nccl_configure.bzl", "remote_nccl_configure")
+-load("//third_party/py:python_configure.bzl", "local_python_configure", "remote_python_configure")
++load("//third_party/py/non_hermetic:python_configure.bzl", "local_python_configure", "remote_python_configure")
+ load("//third_party/remote_config:remote_platform_configure.bzl", "remote_platform_configure")
+ load("//third_party/tensorrt:tensorrt_configure.bzl", "remote_tensorrt_configure")
+
 diff --git a/tensorflow/workspace2.bzl b/tensorflow/workspace2.bzl
-index 056df85ffdb..7422baf8c59 100644
+index 77eea2ac869..54a3ec2fed6 100644
 --- a/tensorflow/workspace2.bzl
 +++ b/tensorflow/workspace2.bzl
-@@ -37,7 +37,7 @@ load("//third_party/nasm:workspace.bzl", nasm = "repo")
+@@ -44,7 +44,7 @@ load("//third_party/nasm:workspace.bzl", nasm = "repo")
  load("//third_party/nccl:nccl_configure.bzl", "nccl_configure")
  load("//third_party/opencl_headers:workspace.bzl", opencl_headers = "repo")
  load("//third_party/pasta:workspace.bzl", pasta = "repo")
@@ -37,15 +38,15 @@ index 056df85ffdb..7422baf8c59 100644
 load("//third_party/pybind11_abseil:workspace.bzl", pybind11_abseil = "repo")
 load("//third_party/pybind11_bazel:workspace.bzl", pybind11_bazel = "repo")
 diff --git a/third_party/py/non_hermetic/python_configure.bzl b/third_party/py/non_hermetic/python_configure.bzl
-index 300cbfb6c71..09d98505dd9 100644
+index 89732c3e33d..4ac1c8f5c04 100644
 --- a/third_party/py/non_hermetic/python_configure.bzl
 +++ b/third_party/py/non_hermetic/python_configure.bzl
-@@ -206,7 +206,7 @@ def _create_local_python_repository(repository_ctx):
+@@ -203,7 +203,7 @@ def _create_local_python_repository(repository_ctx):
      # Resolve all labels before doing any real work. Resolving causes the
      # function to be restarted with all previous state being lost. This
      # can easily lead to a O(n^2) runtime in the number of labels.
  -    build_tpl = repository_ctx.path(Label("//third_party/py:BUILD.tpl"))
  +    build_tpl = repository_ctx.path(Label("//third_party/py/non_hermetic:BUILD.tpl"))
-
+ 
      python_bin = get_python_bin(repository_ctx)
-     _check_python_bin(repository_ctx, python_bin)
+     _check_python_bin(repository_ctx, python_bin)
