Commit 6f071c0

Merge pull request #16 from yahoojapan/release/v1.3.0
Discard low quality questions in the training set of JSQuAD
2 parents 31f58e8 + 7bbbe02 commit 6f071c0
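
The README table changed in this commit reports the JSQuAD training set shrinking from 62,859 to 62,697 questions. As a minimal sketch for checking such a count locally, the snippet below counts question entries in a SQuAD-style JSON structure. It assumes JSQuAD follows the SQuAD layout (`data` → `paragraphs` → `qas`); the function names and any file path passed to them are hypothetical and not part of this repository.

```python
import json
from pathlib import Path


def count_questions(squad_like: dict) -> int:
    """Count question entries in a SQuAD-style dict (assumed layout: data -> paragraphs -> qas)."""
    return sum(
        len(paragraph.get("qas", []))
        for article in squad_like.get("data", [])
        for paragraph in article.get("paragraphs", [])
    )


def count_questions_in_file(path: str) -> int:
    """Load a SQuAD-style JSON file (hypothetical path) and count its questions."""
    with Path(path).open(encoding="utf-8") as f:
        return count_questions(json.load(f))


# Training-set sizes taken from the README table in this diff.
OLD_TRAIN, NEW_TRAIN = 62_859, 62_697
REMOVED = OLD_TRAIN - NEW_TRAIN  # number of questions discarded from the training set
```

Running `count_questions_in_file` on the old and new training files and comparing the results against the table is one way to confirm the documented numbers.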

16 files changed: 40 additions, 40 deletions

README.md

Lines changed: 3 additions & 3 deletions
@@ -17,7 +17,7 @@ multiple datasets. Each dataset can be found under the `datasets` directory. We
 ||JCoLA&dagger;|6,919(in-domain)|865(in-domain),<br>685 (out-of-domain)|865 (in-domain),<br>686 (out-of-domain)|
 |Sentence Pair Classification|JSTS|12,451|1,457|1,589|
 ||JNLI|20,073|2,434|2,508|
-|QA|JSQuAD|62,859|4,442|4,420|
+|QA|JSQuAD|62,697|4,442|4,420|
 ||JCommonsenseQA|8,939|1,119|1,118|
 
 &dagger;The JCoLA dataset (Someya+, 2014) is available at https://github.com/osekilab/JCoLA.
@@ -48,12 +48,12 @@ $ cd preprocess/marc-ja/scripts
 $ gzip -dc /somewhere/amazon_reviews_multilingual_JP_v1_00.tsv.gz | \
 python marc-ja.py \
 --positive-negative \
---output-dir ../../../datasets/marc_ja-v1.2 \
+--output-dir ../../../datasets/marc_ja-v1.3 \
 --max-char-length 500 \
 --filter-review-id-list-valid ../data/filter_review_id_list/valid.txt \
 --label-conv-review-id-list-valid ../data/label_conv_review_id_list/valid.txt
 ```
-~~The train and valid sets will be generated under the `datasets/marc_ja-v1.2` directory.~~
+~~The train and valid sets will be generated under the `datasets/marc_ja-v1.3` directory.~~
 
 
 ~~When you use this dataset, please follow the license of [Multilingual Amazon Reviews Corpus (MARC)](https://docs.opendata.aws/amazon-reviews-ml/readme.html).~~
File renamed without changes.
File renamed without changes.
File renamed without changes.
Lines changed: 1 addition & 1 deletion
Large diffs are not rendered by default.
