-
Notifications
You must be signed in to change notification settings - Fork 7
/
Copy pathxml-format.html
169 lines (142 loc) · 9.56 KB
/
xml-format.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
---
title: XML Format
layout: default
---
<p>The State Decoded parser can either be modified to use bespoke PHP to parse a legal code in its native format, or it can parse a prescribed XML format. This is a totally invented XML format. There is no legal code that is provided in this format. It is provided as an option because, via <a href="http://en.wikipedia.org/wiki/XSLT">XSLT</a>, it can be substantially easier to convert an existing XML to this format than to write a parser for it in PHP.</p>
<p>The State Decoded includes <a href="https://github.com/statedecoded/statedecoded/blob/master/sample.xsl">a sample XSLT file</a> on which you can base your own XSLT. It is well commented and verbose, so even if you’re not familiar with XSLT, you’ve got a fighting chance of transforming your XML to The State Decoded’s format. If the legal code that you are ingesting is in Lexis-Nexis' XML format (which sometimes travels under the name of “legislativedoc v1”), then use <code><a href="https://github.com/statedecoded/statedecoded/blob/master/lexis-nexis.xsl">lexis-nexis.xsl</a></code>, which is included with The State Decoded.</p>
<p>Note that this is <em>not</em> an effort to create a standard interchange or storage format for legal codes. This exists solely as a convenience for importing data into The State Decoded.</p>
<h1>Terminology</h1>
Legal codes throughout the U.S. all use different terminology (e.g., "statutes" vs "code" vs "laws"). We need to adopt some terminology here, and while it can't be the right terminology everywhere, we hope to avoid any disputes about what exactly is the right terminology. The terminology we've adopted was the simplest that also reflected the reality of the data. A lot of people implementing The State Decoded are programmers, not attorneys, and the terminology that the project uses needs to reflect the reality of that user base.
Below, the word "law" refers to what some jurisdictions would call a section, and it's usually prefixed with the § symbol. It corresponds to a single page on a State Decoded website.
Within a "law", we use "section" elements to represent the hierarchical structure of the paragraphs within it. In some jurisdictions this would be called a subsection. A "section" corresponds to a paragraph plus any logically nested (typically indented) content within it.
<h1>Descriptive Specifications</h1>
<ul>
<li>There must be one file per law.</li>
<li>All files must be in the same directory.</li>
<li>The names of the files are irrelevant.</li>
<li>Every file contains both structural data about its container—which chapter/title/part/article/etc. that it's within—and data about the law itself.</li>
<li>Text must be broken up by subsection, rather than provided as a single block of text, unless the text of the law does not have subsections.</li>
<li>Subsections may be nested.</li>
</ul>
<p>This being open source software, some relatively simple tweaks to <code>Parser::iterate</code> and <code>Parser::parse</code> make all of these specifications malleable.</p>
<h1>Fields</h1>
<ul>
<li>law: the container for all other fields—required
<ul><li>structure: the container for structural units (chapters, titles, articles, parts, etc.)—required</li>
<li>unit: a single structural unit that contains this law—required
<ul><li>label: the type of structural unit that this is (chapter, title, article, part, etc.)—required</li>
<li>identifier: the unique identifier for this structural unit (but not necessarily globally unique!), almost always a number (e.g., 18.2, 6)—required</li>
<li>orderby: when listing all structural units of this label, in what ordinal position this structural unit should appear—optional</li>
<li>level: a whole number, starting at one, indicating the depth of this structural unit—required</li></ul></li>
<li>sectionnumber: the unique identifier for this law—required</li>
<li>catchline: the title of this law—required</li>
<li>orderby: when listing all laws within this structural unit, in what ordinal position this law should appear—optional</li>
<li>text: the container for the text of the law itself—required</li>
<li>section: a single labelled section of a law—optional
<ul><li>prefix: the identifying label for this section (e.g., A, 6, vi)—required</li>
<li>type: the type of material within this section; default is text, other options are "table" and "image"—optional</li></ul></li>
<li>history: the textual description of the history of amendments to this law—optional</li>
<li>metadata: a container for any additional fields, which will be stored as key:value in the <code>laws_meta</code> table—optional; values of <code>true</code> and <code>false</code> will be stored in the database as <code>y</code> and <code>n</code>, respectively, but converted back to <code>true</code> and <code>false</code> within the internal and external APIs</li></ul></li>
</ul>
<h1>Sample XML</h1>
<h2>Structure</h2>
<h3>With All Fields</h3>
<pre>
<?xml version="1.0" encoding="utf-8"?>
<law>
<structure>
<unit label="" identifier="" order_by="" level=""></unit>
</structure>
<section_number></section_number>
<catch_line></catch_line>
<order_by></order_by>
<text>
<section prefix=""></section>
</text>
<history></history>
<metadata></metadata>
<tags>
<tag></tag>
</tags>
</law>
</pre>
<h3>Bare Minimum</h3>
<pre>
<?xml version="1.0" encoding="utf-8"?>
<law>
<structure>
<unit label="" identifier="" level=""></unit>
</structure>
<section_number></section_number>
<catch_line></catch_line>
<text></text>
</law>
</pre>
<h2>Populated</h2>
<p>Here is a complete, populated sample XML file that includes every available parameter.</p>
<pre>
<?xml version="1.0" encoding="utf-8"?>
<law>
<structure>
<unit label="title" identifier="18.2" order_by="18.2" level="1">Crimes and Offenses Generally</unit>
<unit label="chapter" identifier="6" order_by="6" level="2">Crimes Involving Fraud</unit>
</structure>
<section_number>18.2-186</section_number>
<catch_line>False statements to obtain property or credit.</catch_line>
<order_by>00000000009104318.2</order_by>
<text>
<section prefix="1">
Notwithstanding § 18.2-186:
<section prefix="A">A person shall be guilty of a Class 1 misdemeanor if he makes,
causes to be made or conspires to make directly, indirectly or through an agency, any
materially false statement in writing, knowing it to be false and intending that it be
relied upon, concerning the financial condition or means or ability to pay of himself,
or of any other person for whom he is acting, or any firm or corporation in which he is
interested or for which he is acting, for the purpose of procuring, for his own benefit
or for the benefit of such person, firm or corporation, the delivery of personal
property, the payment of cash, the making of a loan or credit, the extension of a
credit, the discount of an account receivable, or the making, acceptance, discount, sale
or endorsement of a bill of exchange or promissory note.</section>
<section prefix="B">Any person who knows that a false statement has been made in writing
concerning the financial condition or ability to pay of himself or of any person for
whom he is acting, or any firm or corporation in which he is interested or for which he
is acting and who, with intent to defraud, procures, upon the faith thereof.
<section prefix="i" type="table">
+------------------+--------------------+
| PERSON | STATEMENT |
+------------------+--------------------+
| any | false |
| no | true |
+------------------+--------------------+
</section>
For his own benefit, or for the benefit of the person, firm or corporation in which he
is interested or for which he is acting, any such delivery, payment, loan, credit,
extension, discount making, acceptance, sale or endorsement, shall, if the value of the
thing or the amount of the loan, credit or benefit obtained is $200 or more, be guilty
of grand larceny or, if the value is less than $200, be guilty of petit
larceny.</section>
<section prefix="C">Venue for the trial of any person charged with an offense under this
section may be in the county or city in which (i) any act was performed in furtherance
of the offense, or (ii) the person charged with the offense resided at the time of the
offense.</section>
<section prefix="D">As used in this section, "in writing" shall include information
transmitted by computer, facsimile, e-mail, Internet, or any other electronic medium,
and shall not include information transmitted by any such medium by voice
transmission.</section>
</section>
<section prefix="2">
This law shall expire on July 1, 2014.
</section>
</text>
<history>Code 1950, § 18.1-125; 1960, c. 358; 1975, cc. 14, 15.</history>
<metadata>
<repealed>y</repealed>
<expiration>2014-07-01</expiration>
</metadata>
<tags>
<tag>lying</tag>
<tag>fraud</tag>
<tag>theft</tag>
</tags>
</law>
</pre>