Commit ee0a2f7

DocketReport._get_document_number(): Partially support "doc" bk entries
Some unnumbered bankruptcy docket entries also link to attached PDFs. Previously we silently discarded the entire docket entry. Now we include the entry but discard the attachment link (see the XXX comment in the code). A better fix requires a schema change. The expected test output (lawb_18072.json) is adjusted to match.
1 parent fa75461 commit ee0a2f7

2 files changed

Lines changed: 22 additions & 2 deletions

juriscraper/pacer/docket_report.py

Lines changed: 5 additions & 1 deletion
@@ -1659,7 +1659,11 @@ def _get_document_number(self, cell):
             word for phrase in self._br_split(cell) for word in phrase.split()
         ]

-        for _ in ["view"]:
+        # XXX: an unfortunate consequence of removing the word
+        # "doc" from the set of possible docket entry numbers is that
+        # we fail to capture the PDF attachment to this docket entry.
+        # But better that than not capturing the docket text at all.
+        for _ in ["view", "doc"]:
             try:
                 words.remove(_)
             except ValueError:
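
The effect of the hunk above can be illustrated with a minimal standalone sketch. It assumes the cell text has already been split into phrases; `filter_number_words` and `IGNORED_WORDS` are hypothetical names used only for illustration, and the real method works on a parsed table cell via `self._br_split()`. What the caller does with an empty word list is outside this sketch.

```python
# Hypothetical stand-in for the filtering step in _get_document_number().
# Link labels that must not be mistaken for a docket entry number.
IGNORED_WORDS = ["view", "doc"]


def filter_number_words(phrases):
    """Split phrases into words and drop link labels such as "view" or "doc"."""
    words = [word for phrase in phrases for word in phrase.split()]
    for label in IGNORED_WORDS:
        try:
            # list.remove() drops only the first occurrence of the label.
            words.remove(label)
        except ValueError:
            # The label does not appear in this cell; nothing to drop.
            pass
    return words


# A cell holding only the "doc" link label leaves no candidate number.
print(filter_number_words(["doc"]))
# A numbered cell keeps its number once the "view" label is dropped.
print(filter_number_words(["12", "view"]))
```

With "doc" in the ignore list, a cell containing only that label yields no candidate words, which is consistent with the new fixture entries carrying a null `document_number`.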

tests/examples/pacer/dockets/bankruptcy/lawb_18072.json

Lines changed: 17 additions & 1 deletion
@@ -41,6 +41,14 @@
       "pacer_doc_id": null,
       "pacer_seq_no": null
     },
+    {
+      "date_entered": "2001-03-01",
+      "date_filed": "2001-03-01",
+      "description": "Courts BNC Certificate of Service RE: [0-0] First Meeting . Notices sent: 16 (bnc) (Entered: 03/01/2001)",
+      "document_number": null,
+      "pacer_doc_id": null,
+      "pacer_seq_no": null
+    },
     {
       "date_entered": "2001-03-26",
       "date_filed": "2001-03-26",
@@ -89,6 +97,14 @@
       "pacer_doc_id": "0880432493",
       "pacer_seq_no": null
     },
+    {
+      "date_entered": "2001-07-13",
+      "date_filed": "2001-07-13",
+      "description": "Courts BNC Certificate of Service RE: [9-1] Discharge Order . Notices sent: 16 (bnc) (Entered: 07/13/2001)",
+      "document_number": null,
+      "pacer_doc_id": null,
+      "pacer_seq_no": null
+    },
     {
       "date_entered": "2001-07-16",
       "date_filed": "2001-07-16",
@@ -177,4 +193,4 @@
     }
   ],
   "referred_to_str": ""
-}
+}
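
Because the newly captured entries have a null `document_number` and a null `pacer_doc_id`, downstream code may want to single them out. The following is a rough sketch, not part of the commit: `unnumbered_entries` is a hypothetical helper, it assumes the entries have already been loaded from the parsed JSON as a list of dicts, and the second sample entry is invented purely for contrast.

```python
def unnumbered_entries(entries):
    """Return the docket entries whose document_number is null (None)."""
    return [entry for entry in entries if entry.get("document_number") is None]


sample_entries = [
    {
        # Taken from the fixture entry added in this commit.
        "date_filed": "2001-03-01",
        "description": "Courts BNC Certificate of Service RE: [0-0] First Meeting . Notices sent: 16 (bnc) (Entered: 03/01/2001)",
        "document_number": None,
        "pacer_doc_id": None,
    },
    {
        # Hypothetical numbered entry, included only for contrast.
        "date_filed": "2001-03-26",
        "description": "(hypothetical numbered entry)",
        "document_number": "1",
        "pacer_doc_id": None,
    },
]

for entry in unnumbered_entries(sample_entries):
    # These entries keep their docket text but carry no attachment link.
    print(entry["date_filed"], entry["description"])
```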
