@rmadupuri
No description provided.

madupurr and others added 30 commits June 3, 2025 14:14
- Introduced a new script `download_paper_extract_text.py` to download XML files from PubMed Central using PubMed IDs, extract text, and save it as TXT files.
- Added a new module `download_pmc_s3.py` to handle downloading files from the PMC S3 bucket, including caching mechanisms.
- Updated `requirements.txt` to include `boto3` for S3 download.
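
The commits above describe a download, cache, and extract flow. A minimal sketch of that flow follows, using only the standard library. It is not the code from this pull request: the names `extract_text`, `cached_fetch`, and `xml_to_txt` are hypothetical, and the `fetch` callable stands in for whatever performs the real download (for example, a `boto3` S3 client call).

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Callable, Union


def extract_text(xml_data: Union[str, bytes]) -> str:
    """Join every text node in the XML document and collapse whitespace."""
    root = ET.fromstring(xml_data)
    return " ".join(" ".join(root.itertext()).split())


def cached_fetch(key: str, cache_dir: Path, fetch: Callable[[str], bytes]) -> Path:
    """Return the local copy of `key`, calling `fetch` only on a cache miss.

    `fetch` is a hypothetical stand-in for the real downloader.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Flatten the key so nested object keys map to a single file name.
    target = cache_dir / key.replace("/", "_")
    if not target.exists():
        target.write_bytes(fetch(key))
    return target


def xml_to_txt(xml_path: Path, txt_path: Path) -> None:
    """Read an XML file, extract its text, and save it as a TXT file."""
    txt_path.write_text(extract_text(xml_path.read_bytes()), encoding="utf-8")
```

Passing the downloader in as a callable keeps the caching logic testable without network access; a second request for the same key reads the cached file instead of downloading again.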
5 participants