Support real language identification and fix test related to this #23

Merged
lpi-tn merged 1 commit into main from Fix/lang-id-openalex on May 12, 2025

Conversation

lpi-tn (Collaborator) commented on May 12, 2025

This pull request introduces updates to the welearn_datastack OpenAlex plugin and its associated test suite. The changes focus on improving the accuracy of language detection for document content and refining the test cases to align with the updated functionality.

Core functionality updates:

  • Introduced the langdetect library to detect the language of document content from the extracted PDF text, replacing the previous reliance on the language field from the JSON response. (welearn_datastack/plugins/rest_requesters/open_alex.py)
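
For reviewers unfamiliar with langdetect, the change described above could look roughly like the sketch below. It is not the code from this PR: the function names `detect_content_language` and `_langdetect_detect`, the injectable `detector` parameter, and the choice to return `None` for empty or undetectable text are all assumptions made for illustration. Only `detect`, `DetectorFactory.seed`, and `LangDetectException` come from langdetect itself.

```python
from typing import Callable, Optional


def _langdetect_detect(text: str) -> Optional[str]:
    """Default detector backed by langdetect (imported lazily)."""
    from langdetect import DetectorFactory, detect
    from langdetect.lang_detect_exception import LangDetectException

    # langdetect is non-deterministic unless a seed is set.
    DetectorFactory.seed = 0
    try:
        return detect(text)  # e.g. "en", "fr"
    except LangDetectException:
        # Raised when the text has no usable features (digits only, etc.).
        return None


def detect_content_language(
    pdf_text: str,
    detector: Callable[[str], Optional[str]] = _langdetect_detect,
) -> Optional[str]:
    """Return a language code inferred from the extracted PDF text.

    Hypothetical helper: the language is derived from the content itself
    rather than read from a `language` field in the JSON response.
    """
    text = pdf_text.strip()
    if not text:
        # Assumption: empty content yields no language instead of a guess.
        return None
    return detector(text)
```

Passing the detector as a parameter is a design choice of this sketch only; it lets tests substitute a stub without installing langdetect or patching imports.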

Test suite updates:

  • Updated test cases to use realistic document content ("The findings highlight...") instead of placeholder text ("Lorem ipsum") for better alignment with production scenarios. (tests/document_collector_hub/plugins_test/test_open_alex.py)
  • Removed assertions for deprecated fields (duration and readability) in the test cases, reflecting changes in the OpenAlex plugin's functionality. (tests/document_collector_hub/plugins_test/test_open_alex.py)
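
The shape of the updated tests might resemble the following sketch. Everything here is a stand-in: `build_document`, the `details` dictionary, and the test class name are hypothetical and are not taken from test_open_alex.py. The sketch only illustrates the two points above: the fixture uses realistic content, and `duration` and `readability` are no longer asserted as present.

```python
import unittest
from typing import Callable
from unittest.mock import Mock

# Realistic fixture text quoted in the PR description (replaces "Lorem ipsum").
REALISTIC_CONTENT = "The findings highlight..."


def build_document(content: str, detector: Callable[[str], str]) -> dict:
    """Hypothetical stand-in for the plugin's document builder."""
    return {
        "content": content,
        "lang": detector(content),  # language derived from the content
        "details": {},              # assumed container for extra fields
    }


class TestOpenAlexLanguage(unittest.TestCase):
    def test_language_is_detected_from_content(self):
        # A stub detector keeps the test independent of langdetect.
        detector = Mock(return_value="en")

        doc = build_document(REALISTIC_CONTENT, detector)

        detector.assert_called_once_with(REALISTIC_CONTENT)
        self.assertEqual(doc["lang"], "en")
        # The deprecated fields are no longer expected in the output.
        self.assertNotIn("duration", doc["details"])
        self.assertNotIn("readability", doc["details"])
```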

lpi-tn requested review from jmsevin and sandragjacinto on May 12, 2025 15:05
lpi-tn changed the title from "fix test + support real languague identification" to "fix test + support real language identification" on May 12, 2025
lpi-tn changed the title from "fix test + support real language identification" to "Support real language identification and fix test related to this" on May 12, 2025
lpi-tn merged commit 3e96739 into main on May 12, 2025
5 of 6 checks passed
lpi-tn deleted the Fix/lang-id-openalex branch on May 12, 2025 15:11