Releases: adsabs/ADSfulltext

Adds a ghostscript timeout to `scripts/extract_pdf_with_pdftotext.sh`

26 Aug 13:00
300c591

The script `extract_pdf_with_pdftotext.sh` calls Ghostscript when it encounters a PDF that normal processing can't handle. In some pathological cases Ghostscript never returns useful output; as a result, the pipeline workers keep processing the file, and the pipeline may spawn multiple copies of the script.

This release wraps the Ghostscript call in a `timeout` command, so that if the PDF fails to process within 30 seconds, the workers receive a SIGINT back from the script.

Also includes:

  • bump spacy==2.2.4
  • bump pip==24.0 and setuptools==57.0 in .github/workflows
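
The timeout behaviour described in this release can be illustrated with a minimal Python sketch. This is not the code in `extract_pdf_with_pdftotext.sh`, which is a shell script using the `timeout` command; the function name `run_with_timeout` and the stand-in child commands are hypothetical, chosen only to show the idea of abandoning an external call that exceeds a time limit. Note that `subprocess.run` kills the child process on expiry rather than sending SIGINT, so the signal handling differs from the release.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_seconds=30):
    """Run an external command; return its stdout, or None if it times out.

    Hypothetical helper: illustrates bounding a slow external call,
    analogous to wrapping a Ghostscript call in `timeout`.
    """
    try:
        completed = subprocess.run(
            cmd,
            capture_output=True,
            text=True,
            timeout=timeout_seconds,
            check=False,
        )
    except subprocess.TimeoutExpired:
        # The child has been killed by subprocess.run at this point,
        # so the caller can give up on this input instead of hanging.
        return None
    return completed.stdout


# Stand-in commands (assumptions): a fast child and a child that hangs.
fast = run_with_timeout([sys.executable, "-c", "print('ok')"], timeout_seconds=10)
slow = run_with_timeout(
    [sys.executable, "-c", "import time; time.sleep(5)"], timeout_seconds=1
)
```

Returning `None` on expiry lets a caller treat a pathological PDF as a failed extraction and move on, rather than retrying the same file indefinitely.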

Maintenance release: Update adsputils

12 Apr 21:27
f013242

What's Changed

New Contributors

Full Changelog: v1.4.4...v1.4.6

Maintenance release

18 Jan 02:57
29cbda6

Add confirm publish variables for RabbitMQ communication.

Maintenance release: log pruning

07 Dec 21:14
29cbda6

#143 Pruned log messages during extraction

Pdftotext extraction script extended

07 Sep 15:17
6fba76a

Improves the `extract_pdf_with_pdftotext.sh` script so that it no longer gets stuck while processing some PDFs with vector graphics. Also includes updates in BeeHive (Ghostscript added to the fulltext image, as required).

Maintenance release

23 Aug 14:57
6c53b7e

#142 For XML extraction, change default to translate unprintable Unicode chars
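
As a rough illustration of what translating unprintable characters can look like, here is a minimal sketch. It is not the project's translation map: the choice of characters (C0 and C1 control codes, excluding tab, newline, and carriage return), the space replacement, and the names `build_translation_map` and `translate_unprintable` are all assumptions made for this example.

```python
def build_translation_map(replacement=" "):
    """Map control characters to a replacement string.

    Assumption: "unprintable" here means the C0 range (U+0000-U+001F)
    and DEL plus the C1 range (U+007F-U+009F), keeping tab, newline,
    and carriage return. The real list may differ.
    """
    table = {}
    for code_point in list(range(0x00, 0x20)) + list(range(0x7F, 0xA0)):
        if chr(code_point) in "\t\n\r":
            continue  # keep ordinary whitespace
        table[code_point] = replacement
    return table


TRANSLATION_MAP = build_translation_map()


def translate_unprintable(text):
    """Return text with the mapped control characters replaced."""
    return text.translate(TRANSLATION_MAP)


cleaned = translate_unprintable("a\x00b\tc\x85d")
```

Using `str.translate` with a precomputed table keeps the per-document cost to a single pass over the text.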

Bug fix: Unicode translation map

09 Aug 16:29
599d08f

#140 Fixed Unicode translation bug, updated list of translated characters

Python 3

04 Aug 16:48
3e26620

No new code; this release is for deploying fulltext with Python 3 instead of Python 2.

Maintenance release

07 May 13:35
3e26620

#138 Add error handling to extract method

Fix Wiley XML parsing

09 Apr 15:34
0e94d21

#134 Fix for Wiley body extraction