nguyenthanhasia/RRE-datasets

 
 

RRE-datasets

This repository contains datasets for Recognizing Requisite-Effectuation (RRE) structures in legal texts.

Data format

japan-pension dataset

  • Tagging scheme: IOE
  • Text encoding: Shift JIS
  • File format: TSV
  • Columns: Head word [TAB] Function word [TAB] Punctuation (if any) [TAB] Label
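
A minimal sketch of reading a file in this format with Python. It assumes that sentences are separated by blank lines (the README does not say) and that every token row has exactly four tab-separated fields. The names `Token` and `read_sentences` are hypothetical and not part of this repository.

```python
import io
from typing import Iterable, Iterator, List, NamedTuple


class Token(NamedTuple):
    head: str         # head word (may be empty)
    function: str     # function word (may be empty)
    punctuation: str  # punctuation column, copied verbatim
    label: str        # IOE tag


def read_sentences(lines: Iterable[str]) -> Iterator[List[Token]]:
    """Group tab-separated token rows into sentences.

    Assumption: a blank line ends a sentence.
    """
    sentence: List[Token] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            if sentence:
                yield sentence
                sentence = []
            continue
        fields = line.split("\t")
        if len(fields) != 4:
            raise ValueError(f"expected 4 columns, got {len(fields)}: {line!r}")
        sentence.append(Token(*fields))
    if sentence:
        yield sentence


# Demo on in-memory Shift JIS bytes; for a real file, use
# open(path, encoding="shift_jis", newline="") instead.
sample = "給付\t\t\tE-S2\nその\tその\tNO\tI-R\n\nとき\t\t\tE-R\n"
raw_bytes = io.BytesIO(sample.encode("shift_jis"))
with io.TextIOWrapper(raw_bytes, encoding="shift_jis") as handle:
    sentences = list(read_sentences(handle))

print(len(sentences), sentences[0][0].label)  # prints: 2 E-S2
```

Splitting on tabs by hand, rather than on arbitrary whitespace, keeps the empty head-word and function-word columns in place.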

Example:

給付は、その支給をすべき事由が生じたときは、その事由が生じた日のする月の翌月からその事由がした日のする月の分の支給をする。

給付			E-S2
その	その	NO	I-R
支給		NO	I-R
	べき	NO	I-R
事由		NO	I-R
生じ		NO	I-R
とき			E-R
その	その	NO	I-E
事由		NO	I-E
生じ		NO	I-E
		NO	I-E
する	する	NO	I-E
		NO	I-E
翌月	から	NO	I-E
その	その	NO	I-E
事由		NO	I-E
		NO	I-E
		NO	I-E
する	する	NO	I-E
		NO	I-E
		NO	I-E
支給		NO	I-E
する	する		E-E
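
The label column above can be decoded into labeled spans. The following is a sketch under stated assumptions: a part consists of zero or more `I-X` tags closed by an `E-X` tag, tokens outside any part carry a label without a hyphen (such as `O`), and an `I-X` run that is never closed is discarded. `R` and `E` presumably denote the requisite and effectuation parts; `S2` is left uninterpreted. The function name `ioe_to_spans` is hypothetical.

```python
from typing import List, Tuple


def ioe_to_spans(labels: List[str]) -> List[Tuple[str, int, int]]:
    """Return (kind, start, end) spans, with inclusive token indices."""
    spans = []
    start = None
    current = None
    for i, label in enumerate(labels):
        if "-" not in label:
            # Outside tag (assumed): drop any unfinished run.
            start, current = None, None
            continue
        prefix, kind = label.split("-", 1)
        if start is None or kind != current:
            start, current = i, kind
        if prefix == "E":
            spans.append((kind, start, i))
            start, current = None, None
    return spans


# Label column of the example sentence above, top to bottom.
labels = ["E-S2"] + ["I-R"] * 5 + ["E-R"] + ["I-E"] * 15 + ["E-E"]
print(ioe_to_spans(labels))
# prints: [('S2', 0, 0), ('R', 1, 6), ('E', 7, 22)]
```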

japan-civil-code dataset

  • Tagging scheme: BIOE (Begin, Inside, Outside, End), where Outside marks tokens not included in any part
  • Text encoding: UTF-8
  • File format: TSV
  • Columns: Word [TAB] Label_1,...,Label_N

Example:

### Doc 100
Any	B-E,B-R
manifestation	I-E,I-R
of	I-E,I-R
intention	I-E,I-R
made	I-E,I-R
by	I-R
an	I-R
agent	I-R
with	I-R
no	I-R
indication	I-R
that	I-R
it	I-R
is	I-R
made	I-R
on	I-R
behalf	I-R
of	I-R
the	I-R
principal	E-R
is	I-E
deemed	I-E
to	I-E
have	I-E
been	I-E
made	I-E
for	I-E
the	I-E
agent	I-E
's	I-E
own	I-E
behalf	E-E
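
Because a token can carry several comma-separated labels, parts may overlap, and in the example above the `E` part is interrupted by tokens that belong only to `R`. The sketch below parses this format and collects the token indices of each part. It assumes that every document starts with a `### Doc <id>` line, that labels without a hyphen mean "outside", and that parts of the same kind are never nested. `read_documents` and `bioe_to_parts` are hypothetical names, and the sample is an abbreviated version of the example.

```python
from typing import Dict, Iterable, List, Tuple

TokenRow = Tuple[str, List[str]]  # (word, labels)


def read_documents(lines: Iterable[str]) -> Dict[str, List[TokenRow]]:
    """Map each document id to its list of (word, labels) rows."""
    docs: Dict[str, List[TokenRow]] = {}
    current = None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue
        if line.startswith("### Doc"):
            current = line[len("### Doc"):].strip()
            docs[current] = []
            continue
        if current is None:
            raise ValueError(f"token row before any document header: {line!r}")
        word, _, label_field = line.partition("\t")
        labels = [label for label in label_field.split(",") if label]
        docs[current].append((word, labels))
    return docs


def bioe_to_parts(rows: List[TokenRow]) -> List[Tuple[str, List[int]]]:
    """Collect token indices per part, in the order the parts are closed."""
    open_parts: Dict[str, List[int]] = {}
    parts = []
    for i, (_, labels) in enumerate(rows):
        for label in labels:
            prefix, sep, kind = label.partition("-")
            if not sep:
                continue  # outside tag (assumed to contain no hyphen)
            if prefix == "B":
                open_parts[kind] = []
            open_parts.setdefault(kind, []).append(i)
            if prefix == "E":
                parts.append((kind, open_parts.pop(kind)))
    return parts


# Abbreviated version of the example above.
sample = """### Doc 100
Any\tB-E,B-R
manifestation\tI-E,I-R
by\tI-R
principal\tE-R
is\tI-E
behalf\tE-E
"""

rows = read_documents(sample.splitlines())["100"]
parts = bioe_to_parts(rows)
for kind, indices in parts:
    print(kind, " ".join(rows[i][0] for i in indices))
```

Returning index lists rather than (start, end) pairs keeps discontinuous parts intact.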

Experimental Results

japan-pension dataset

| Method | F1 (%) |
|---|---|
| Nguyen et al., 2018 | 93.77 |
| bert-base-multilingual-cased (max len: 128) | 91.60 |
| bert-base-multilingual-cased (max len: 256) | 91.46 |
| Kyoto bert-base (max len: 128) | 90.01 |
| Kyoto bert-base (max len: 256) | 89.89 |
| Tohoku bert (max len: 256) | 90.93 |
| Tohoku bert char (max len: 256) | 91.36 |
| Tohoku bert wwm (max len: 256) | 90.66 |
| Tohoku bert cwwm (max len: 256) | 89.40 |
| Cinamon electra small discriminator (max len: 256) | 85.80 |
| Cinamon electra small generator (max len: 256) | 88.57 |
| Albert JA v2 (max len: 256) | 90.09 |
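
The README does not state how the F1 scores were computed. A common convention for sequence labeling is exact-match F1 over labeled spans, and the sketch below illustrates that convention only; it is not necessarily the procedure behind the numbers above. The function name `span_f1` and the example spans are hypothetical.

```python
from typing import Iterable, Tuple

Span = Tuple[str, int, int]  # (kind, start, end), inclusive token indices


def span_f1(gold: Iterable[Span], predicted: Iterable[Span]) -> float:
    """Exact-match F1: a predicted span counts only if kind and both ends match."""
    gold_set, pred_set = set(gold), set(predicted)
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


gold = [("S2", 0, 0), ("R", 1, 6), ("E", 7, 22)]
predicted = [("S2", 0, 0), ("R", 1, 6), ("E", 8, 22)]  # last span starts one token late
score = span_f1(gold, predicted)
print(round(100 * score, 2))  # prints: 66.67
```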

Citation

Please cite the following paper when using the datasets.

@article{10.1007/s10506-018-9225-1,
author = {Nguyen, Truong-Son and Nguyen, Le-Minh and Tojo, Satoshi and Satoh, Ken and Shimazu, Akira},
title = {Recurrent Neural Network-Based Models for Recognizing Requisite and Effectuation Parts in Legal Texts},
year = {2018},
issue_date = {June 2018},
publisher = {Kluwer Academic Publishers},
address = {USA},
volume = {26},
number = {2},
issn = {0924-8463},
url = {https://doi.org/10.1007/s10506-018-9225-1},
doi = {10.1007/s10506-018-9225-1},
journal = {Artif. Intell. Law},
month = jun,
pages = {169--199},
numpages = {31},
keywords = {Sequence labeling, Legal text analysis, Recognizing requisite and effectuation parts, Deep learning, Long short-term memory, Recurrent neural networks, Conditional random fields}
}
