Commit 4ef466c

Add version badge
1 parent 9f73249 commit 4ef466c

3 files changed: +10 -9 lines changed

README.md

Lines changed: 8 additions & 7 deletions
@@ -1,6 +1,7 @@
 # TFtftransformers
 Converting Hugginface tokenizers to Tensorflow tokenizers. The main reason is to be able to bundle the tokenizer and model into one Reusable SavedModel.
 
+<a href="https://badge.fury.io/py/tftokenizers"><img src="https://badge.fury.io/py/tftokenizers.svg" alt="PyPI version" height="18"></a>
 ---
 
 **Source Code**: <a href="https://github.com/Huggingface-Supporters/tftftransformers" target="_blank">https://github.com/Hugging-Face-Supporter/tftokenizers</a>
@@ -55,21 +56,21 @@ output = reloaded_model([s1, s2, s3])
 print(output)
 ```
 
-### `Setup`
+## `Setup`
 ```bash
 git clone https://github.com/Hugging-Face-Supporter/tftokenizers.git
 cd tftokenizers
 poetry install
 poetry shell
 ```
 
-### `Run`
+## `Run`
 To convert a Huggingface tokenizer to Tensorflow, first choose one from the models or tokenizers from the Huggingface hub to download.
 
 **NOTE**
 > Currently only BERT models work with the converter.
 
-#### `Download`
+### `Download`
 First download tokenizers from the hub by name. Either run the bash script do download multiple tokenizers or download a single tokenizer with the python script.
 
 The idea is to eventually only to automatically download and convert
@@ -79,13 +80,13 @@ python tftokenizers/download.py -n bert-base-uncased
 bash scripts/download_tokenizers.sh
 ```
 
-#### `Convert`
+### `Convert`
 Convert downloaded tokenizer from Huggingface format to Tensorflow
 ```bash
 python tftokenizers/convert.py
 ```
 
-### `Before Commit`
+## `Before Commit`
 ```bash
 make build
 ```
@@ -97,8 +98,8 @@ make build
 - [x] Make a TF Reusabel SavedModel with Tokenizer and Model in the same class. Emulate how the TF Hub example for BERT works.
 - [x] Find methods for identifying the base tokenizer model and map those settings and special tokens to new tokenizers
 - [x] Extend the tokenizers to more tokenizer types and identify them from a huggingface model name
-- [ ] Document how others can use the library and document the different stages in the process
+- [x] Document how others can use the library and document the different stages in the process
+- [x] Improve the conversion pipeline (s.a. Download and export files if not passed in or available locally)
 - [ ] Convert other tokenizers. Identify limitations
-- [ ] Improve the conversion pipeline (s.a. Download and export files if not passed in or available locally)
 - [ ] Support encoding of two sentences at a time [Ref](https://www.tensorflow.org/text/guide/bert_preprocessing_guide)
 - [ ] Allow the tokenizers to be used for Masking (MLM) [Ref](https://www.tensorflow.org/text/guide/bert_preprocessing_guide)

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 [tool.poetry]
 name = "tftokenizers"
-version = "0.1.1"
+version = "0.1.2"
 description = "Use Huggingface Transformer and Tokenizers as Tensorflow Reusable SavedModels."
 authors = ["MarkusSagen <markus.john.sagen@gmail.com>"]
 license = "Apache License 2.0"

tftokenizers/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 """Use Huggingface Transformer and Tokenizers as Tensorflow Resuable SavedModels."""
 
-__version__ = "0.1.1"
+__version__ = "0.1.2"
 
 from .detect import detect_and_load_tokenizer as detect_and_load_tokenizer
 from .detect import find_tf_base_tokenizer as find_tf_base_tokenizer
