Commit 1b6eeaa

Author: minghui.qmh
Commit message: EasyNLP init commits
1 parent 0d35b2c, commit 1b6eeaa

73 files changed, +10490 -11 lines changed

.gitignore

+158
@@ -0,0 +1,158 @@
+*.tar.gz
+*.zip
+*.gz
+*.txt
+*.tsv
+*.csv
+.idea
+easytexminer.egg-info
+build
+tools
+xflow_deploy
+dist
+.DS_Store
+__pycache__
+### Python template
+# Byte-compiled / optimized / DLL files
+__pycache__/
+*.py[cod]
+*$py.class
+
+# C extensions
+*.so
+
+# Distribution / packaging
+.Python
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+wheels/
+share/python-wheels/
+*.egg-info/
+.installed.cfg
+*.egg
+MANIFEST
+
+# PyInstaller
+# Usually these files are written by a python script from a template
+# before PyInstaller builds the exe, so as to inject date/other infos into it.
+*.manifest
+*.spec
+
+# Installer logs
+pip-log.txt
+pip-delete-this-directory.txt
+
+# Unit test / coverage reports
+htmlcov/
+.tox/
+.nox/
+.coverage
+.coverage.*
+.cache
+nosetests.xml
+coverage.xml
+*.cover
+*.py,cover
+.hypothesis/
+.pytest_cache/
+cover/
+
+# Translations
+*.mo
+*.pot
+
+# Django stuff:
+*.log
+local_settings.py
+db.sqlite3
+db.sqlite3-journal
+
+# Flask stuff:
+instance/
+.webassets-cache
+
+# Scrapy stuff:
+.scrapy
+
+# Sphinx documentation
+docs/_build/
+
+# PyBuilder
+.pybuilder/
+target/
+
+# Jupyter Notebook
+.ipynb_checkpoints
+
+# IPython
+profile_default/
+ipython_config.py
+
+# pyenv
+# For a library or package, you might want to ignore these files since the code is
+# intended to run in multiple environments; otherwise, check them in:
+# .python-version
+
+# pipenv
+# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
+# However, in case of collaboration, if having platform-specific dependencies or dependencies
+# having no cross-platform support, pipenv may install dependencies that don't work, or not
+# install all needed dependencies.
+#Pipfile.lock
+
+# PEP 582; used by e.g. github.com/David-OConnor/pyflow
+__pypackages__/
+
+# Celery stuff
+celerybeat-schedule
+celerybeat.pid
+
+# SageMath parsed files
+*.sage.py
+
+# Environments
+.env
+.venv
+env/
+venv/
+ENV/
+env.bak/
+venv.bak/
+
+# Spyder project settings
+.spyderproject
+.spyproject
+
+# Rope project settings
+.ropeproject
+
+# mkdocs documentation
+/site
+
+# mypy
+.mypy_cache/
+.dmypy.json
+dmypy.json
+user_datasets
+# Pyre type checker
+.pyre/
+
+# pytype static type analyzer
+.pytype/
+
+# Cython debug symbols
+cython_debug/
+
+glue_data/
+results/
+logs/
+.vscode/
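
Note on the .gitignore patterns: a rough way to sanity-check a few of the globs is Python's `fnmatch`. This is only an approximation, since Git's matching also handles directories, anchoring, and negation, which this sketch ignores. The file names below are made up for illustration.

```python
from fnmatch import fnmatch

# A handful of patterns copied from the .gitignore in this commit.
PATTERNS = ["*.tar.gz", "*.py[cod]", "*$py.class", "*.egg", ".DS_Store"]


def is_ignored(filename):
    """Return True if the bare file name matches any pattern (fnmatch rules only)."""
    return any(fnmatch(filename, pattern) for pattern in PATTERNS)


# Hypothetical file names, used only to exercise the patterns.
for name in ["model.tar.gz", "trainer.pyc", "trainer.py", "pkg.egg"]:
    print(name, is_ignored(name))
```

`*.py[cod]` requires exactly one of `c`, `o`, or `d` after `.py`, so compiled files match while plain `.py` sources do not.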

.isort.cfg

+2
@@ -0,0 +1,2 @@
+[settings]
+known_third_party = numpy,requests,rouge,scipy,setuptools,sklearn,sphinx_rtd_theme,torch,tqdm
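
Note on .isort.cfg: `known_third_party` lists the modules isort should group as third-party imports. As a minimal sketch, the file is plain INI, so the list can be inspected with the standard `configparser` module. The config text is inlined here so the example is self-contained, and `third_party_modules` is a hypothetical helper name, not part of isort.

```python
import configparser

# Contents of the .isort.cfg added in this commit, inlined for a self-contained sketch.
ISORT_CFG = """\
[settings]
known_third_party = numpy,requests,rouge,scipy,setuptools,sklearn,sphinx_rtd_theme,torch,tqdm
"""


def third_party_modules(cfg_text):
    """Parse the INI text and return the known_third_party entries as a list."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    raw = parser["settings"]["known_third_party"]
    return [name.strip() for name in raw.split(",") if name.strip()]


print(third_party_modules(ISORT_CFG))
```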

README.md

+10 -11
@@ -2,7 +2,7 @@
 
 <p align="center">
 <br>
-<img src="https://cdn.nlark.com/yuque/0/2022/png/2480469/1649297935073-2fce0ec9-ec8c-490f-bc25-a8cf50d9918f.png" width="300"/>
+<img src="https://cdn.nlark.com/yuque/0/2022/png/2480469/1649297935073-2fce0ec9-ec8c-490f-bc25-a8cf50d9918f.png" width="200"/>
 <br>
 <p>
 
@@ -24,13 +24,11 @@ EasyNLP is an easy-to-use NLP development and application toolkit in PyTorch, fi
 - **Compatible with open-source libraries:** EasyNLP has APIs to support the training of models from Huggingface/Transformers with the PAI distributed framework. It also supports the pre-trained models in EasyTransfer ModelZoo.
 - **Knowledge-injected pre-training:** The PAI team has a lot of research on knowledge-injected pre-training, and builds a knowledge-injected model that wins the first place in the CCF knowledge pre-training competition. EasyNLP integrates these cutting-edge knowledge pre-trained models, including DKPLM and KGBERT.
 - **Landing large pre-trained models:** EasyNLP provides few-shot learning capabilities, allowing users to finetune large models with only a few samples to achieve good results. At the same time, it provides knowledge distillation functions to help quickly distill large models to a small and efficient model to faciliate online deployment.
-- **Seamless integration to PAI products:** It is seamlessly integrated to [Platform of AI (PAI)](https://www.aliyun.com/product/bigdata/product/learn) products, including PAI-DSW for development, PAI-DLC for cloud-native training, PAI-EAS for serving, and PAI-Designer for zero-coding model training.
-
-
+- **Seamless integration to PAI products::** It is seamlessly integrated to [Platform of AI (PAI)](https://www.aliyun.com/product/bigdata/product/learn) products, including PAI-DSW for development, PAI-DLC for cloud-native training, PAI-EAS for serving, and PAI-Designer for zero-coding model training.
 
 # Installation
 
-You can either install from pip
+You can either install from pip
 
 ```bash
 $ pip install pai-easynlp (to be released)
@@ -43,11 +41,12 @@ $ git clone https://github.com/alibaba/EasyNLP.git
 $ cd EasyNLP
 $ python setup.py install
 ```
-This repo is tested on Python3.6, PyTorch >= 1.8.
 
+This repo is tested on Python3.6, PyTorch >= 1.8.
 
 # Quick Start
-Now let's show how to use just a few lines of code to build a text classification model based on BERT.
+
+Now let's show how to use just a few lines of code to build a text classification model based on BERT.
 
 ```python
 from easynlp.core import Trainer
@@ -71,6 +70,7 @@ Trainer(model=model, train_dataset=train_dataset).train()
 ```
 
 Then you can run the code:
+
 ```bash
 python main.py \
 --mode train \
@@ -85,7 +85,7 @@ python main.py \
 --user_defined_parameters='pretrain_model_name_or_path=bert-tiny-uncased'
 ```
 
-You can also use AppZoo Command Line Tools to quickly train an App model. Take text classification on SST-2 dataset as an example. First you can download the [train.tsv](http://atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com/release/tutorials/classification/train.tsv), and [dev.tsv](http://atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com/release/tutorials/classification/dev.tsv), then start training:
+You can also use AppZoo Command Line Tools to quickly train an App model. Take text classification on SST-2 dataset as an example. First you can download the [train.tsv](http://atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com/release/tutorials/classification/train.tsv), and [dev.tsv](http://atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com/release/tutorials/classification/dev.tsv), then start training:
 
 ```bash
 $ easynlp \
@@ -117,9 +117,8 @@ $ easynlp \
 --checkpoint_path=./classification_model \
 --app_name=text_classify
 ```
-To learn more about the usage of AppZoo, please refer to our [documentation](https://www.yuque.com/easyx/easynlp/psm6fr).
-
 
+To learn more about the usage of AppZoo, please refer to our [documentation](https://www.yuque.com/easyx/easynlp/psm6fr).
 
 # Tutorials
 
@@ -133,8 +132,8 @@ To learn more about the usage of AppZoo, please refer to our [documentation](htt
 - [小样本学习实践](https://www.yuque.com/easyx/easynlp/vgbopy)
 - API docs: [http://atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com/release/easynlp/easynlp_docs/html/index.html](http://atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com/release/easynlp/easynlp_docs/html/index.html)
 
-
 # Contact Us
+
 Scan the following QR codes to join Dingtalk discussion group. The group discussions are most in Chinese, but English is also welcomed.
 
 <img src="https://cdn.nlark.com/yuque/0/2020/png/2480469/1600310258842-d7121051-32f1-494b-a7a5-a35ede74b6c4.png#align=left&display=inline&height=352&margin=%5Bobject%20Object%5D&name=image.png&originHeight=1178&originWidth=1016&size=312154&status=done&style=none&width=304" width="300"/>

docs/Makefile

+19
@@ -0,0 +1,19 @@
+# Minimal makefile for Sphinx documentation
+#
+
+# You can set these variables from the command line.
+SPHINXOPTS =
+SPHINXBUILD = sphinx-build
+SOURCEDIR = source
+BUILDDIR = build
+
+# Put it first so that "make" without argument is like "make help".
+help:
+	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
+
+.PHONY: help Makefile
+
+# Catch-all target: route all unknown targets to Sphinx using the new
+# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
+%: Makefile
+	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
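
Note on the catch-all rule: any target the Makefile does not define (for example `make html`) is forwarded to `sphinx-build -M <target>`. The sketch below only reconstructs the command line that the rule would run, using the variable defaults from the Makefile. `sphinx_command` is a hypothetical helper for illustration and does not invoke Sphinx.

```python
import shlex


def sphinx_command(target, sourcedir="source", builddir="build", sphinxopts="", extra=""):
    """Build the argv the catch-all rule would run for `make <target>`.

    `sphinxopts` and `extra` stand in for $(SPHINXOPTS) and $(O) from the Makefile.
    """
    argv = ["sphinx-build", "-M", target, sourcedir, builddir]
    argv += shlex.split(sphinxopts) + shlex.split(extra)
    return argv


# Equivalent of `make html` with the default variables.
print(shlex.join(sphinx_command("html")))
# Equivalent of `make html O="-W"`, which appends an extra option.
print(shlex.join(sphinx_command("html", extra="-W")))
```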

docs/README.md

+98
@@ -0,0 +1,98 @@
+### How to maintain docs
+
+### 1. Install EasyNLP
+
+```
+$ cd /to/dir/EasyNLP
+$ python setup.py install
+```
+
+### 2. Install sphinx
+
+```bash
+$ pip install sphinx
+$ pip install sphinx_rtd_theme
+```
+
+### 3. Add modules
+
+#### 3.1 Add class or functions in existing files
+
+Add the class or functions, each with a `docstring`, to the existing file.
+
+1. Google Python Style Guide [link](http://google.github.io/styleguide/pyguide.html#381-docstrings)
+1. Google docstring Sample [link](https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html)
+1. Sample project: torch.nn.modules.conv [link](https://pytorch.org/docs/stable/_modules/torch/nn/modules/conv.html#Conv1d)
+1. Take `easynlp.appzoo.classification.BertTextClassify` as an example:
+
+````python
+class BertTextClassify(BertPreTrainedModel):
+    """
+    Transformer model from ```Attention Is All You Need''',
+    Original paper: https://arxiv.org/abs/1706.03762
+
+    Args:
+        num_token (int): vocab size.
+        num_layer (int): num of layer.
+        num_head (int): num of attention heads.
+        embedding_dim (int): embedding dimension.
+        attention_head_dim (int): attention head dimension.
+        feed_forward_dim (int): feed forward dimension.
+        initializer: initializer type.
+        activation: activation function.
+        dropout (float): dropout rate (0.0 to 1.0).
+        attention_dropout (float): dropout rate for attention layer.
+
+    Returns: None
+    """
+````
+
+#### 3.2 Add new file
+
+For example, if you need to add a new file in `easynlp/data` and the file name is `blackmagic.py` with a `BlackMagic` class:
+
+1. Add `docstring` to the code
+1. In `docs/source/api/data.rst`, find a position for `blackmagic` and add
+
+```rst
+blackmagic
+--------------------------------------
+
+.. automodule:: easynlp.data.blackmagic
+    :members:
+    :undoc-members:
+    :show-inheritance:
+
+```
+
+#### 3.3 Add new directory
+
+For example, you want to add a `magic` directory in `ez_transfer`, and there is a file named `blackmagic.py` with a `BlackMagic` class:
+
+1. Add `docstring` to the code
+1. Add file `docs/source/api/magic.rst` and write the following lines
+
+```rst
+ez\_transfer.magic
+===========================
+```
+
+3. In `docs/source/api/magic.rst`, find a position for `blackmagic` and add
+
+```rst
+blackmagic
+--------------------------------------
+
+.. automodule:: ez_transfer.magic.blackmagic
+    :members:
+    :undoc-members:
+    :show-inheritance:
+
+```
+
+### 4. Generate doc html
+
+```bash
+$ cd docs/
+$ sh build_docs.sh
+```
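
Note on steps 3.1 to 3.3 of docs/README.md: the `automodule` stanza is the same for every new module, so it can be generated rather than typed by hand. The sketch below is a hypothetical helper (`automodule_stub` is not part of EasyNLP) that renders the stanza for a dotted module path. Its own docstring follows the Google style referenced in step 3.1, which Sphinx renders when the `sphinx.ext.napoleon` extension is enabled. Whether this repo's `conf.py` enables it is an assumption, since that file is not shown here.

```python
def automodule_stub(module_path, underline="-", width=38):
    """Render the reST stanza that exposes one module to Sphinx autodoc.

    Args:
        module_path (str): Dotted import path, e.g. ``easynlp.data.blackmagic``.
        underline (str): Character used for the section underline.
        width (int): Length of the underline.

    Returns:
        str: A section title followed by an ``automodule`` directive.
    """
    # The section title is the last component of the dotted path.
    title = module_path.rsplit(".", 1)[-1]
    lines = [
        title,
        underline * width,
        "",
        f".. automodule:: {module_path}",
        "    :members:",
        "    :undoc-members:",
        "    :show-inheritance:",
        "",
    ]
    return "\n".join(lines)


# Example: the stanza for the hypothetical blackmagic module from step 3.2.
print(automodule_stub("easynlp.data.blackmagic"))
```

The printed text can be pasted into the matching `.rst` file under `docs/source/api/`.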

docs/build_docs.sh

+11
@@ -0,0 +1,11 @@
+# Test with sphinx
+pip install sphinx==1.8.6
+pip install sphinx_rtd_theme
+
+rm -rf build
+make html
+
+# upload to oss
+ossconfig=`cat /home/admin/workspace/odps_clt_release_64/conf/atp-public-eki`
+echo 'copy files to atp-modelzoo docs'
+ossutil64 cp -f build oss://atp-modelzoo-sh/release/easynlp/easynlp_docs/ $ossconfig --recursive
