ocr_blotter_scrape

scrapes county and city pdf blotter reports converts to images, then ocrs them, and prints extracted text for easier reading or analysis purposes.

with a single-line modification, this can be used to automatically track, measure, or analyze any other reports, like financialn records, upcoming ballot items, BOLs, or even parking ticket fees

SETUP

need to be on arch linux, debian-based distro, such as ubuntu, mx linux or a thousand others, as long as requisites can be installed.

REQUISITE SOFTWARE / LIBRARIES:

imagemagick for pdf preprocessing

pacman -S imagemagick

pdftoppm / poppler / libpoppler for pdf to png conversion

pacman -S poppler

tesseract and tesseract-data-eng for OCR'ing images to text, english text recognition

pacman -S tesseract tesseract-data-eng

INSTALLING:

Clone this repo to your home directory in folder, called ocr_blotter_scrape

git clone https://github.com/masonville17/ocr_blotter_scrape ~/ocr_blotter_scrape

Then add this to your $PATH or copy the county/city scripts into a folder already on path.

sudo cp ~/ocr_blotter_scrape/blotter_get_city.sh /usr/bin/blotter_get_city && sudo chmod +x /usr/bin/blotter_get_city

sudo cp ~/ocr_blotter_scrape/blotter_get_county.sh /usr/bin/blotter_get_county && sudo chmod +x /usr/bin/blotter_get_county

USING:

to get latest 3 county blotter records in plaintext format:

blotter_get_county

to get latest 3 city blotter records in plaintext format:

blotter_get_city

MODIFICATION and NOTES:

MODIFY THIS- at your own peril, but if you're a tinkerer, it's pretty easy to get any number of public records for analysis/measurement/statistics/information. just don't ask me.

PLEASE- dont do any egregious full-scraping of these resources. it's nice that they are available, so don't abuse the system or they might not be in the same way that we see them now. Use legally and reasonably, and keep your requests count low, and request_limits low. These default to 3-per.

NOTE- This tool is intended to be used for personal, legal purposes. use responsibly and only for purposes / within request_limitations described by the law.

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
.gitignore		.gitignore
README.md		README.md
blotter_get_city.sh		blotter_get_city.sh
blotter_get_county.sh		blotter_get_county.sh
pdf2text.sh		pdf2text.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

ocr_blotter_scrape

with a single-line modification, this can be used to automatically track, measure, or analyze any other reports, like financialn records, upcoming ballot items, BOLs, or even parking ticket fees

SETUP

REQUISITE SOFTWARE / LIBRARIES:

INSTALLING:

USING:

MODIFICATION and NOTES:

About

Uh oh!

Releases

Packages

Uh oh!

Languages

masonville17/ocr_blotter_scrape

Folders and files

Latest commit

History

Repository files navigation

ocr_blotter_scrape

with a single-line modification, this can be used to automatically track, measure, or analyze any other reports, like financialn records, upcoming ballot items, BOLs, or even parking ticket fees

SETUP

REQUISITE SOFTWARE / LIBRARIES:

INSTALLING:

USING:

MODIFICATION and NOTES:

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages