Description
Hi, I tried to collect the dataset following the instructions.
I got the following error after running python scripts/get_articles_nytimes.py:
  File "scripts/get_articles_nytimes.py", line 220, in retrieve_article
    data['language'] = detect(text)
  File "/home/maryia/lib/python3.6/site-packages/langdetect/detector_factory.py", line 130, in detect
    return detector.detect()
  File "/home/maryia/lib/python3.6/site-packages/langdetect/detector.py", line 135, in detect
    probabilities = self.get_probabilities()
  File "/home/maryia/lib/python3.6/site-packages/langdetect/detector.py", line 142, in get_probabilities
    self._detect_block()
  File "/home/maryia/lib/python3.6/site-packages/langdetect/detector.py", line 149, in _detect_block
    raise LangDetectException(ErrorCode.CantDetectError, 'No features in text.')
langdetect.lang_detect_exception.LangDetectException: No features in text.
I worked around it with a try/except block like this:
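from langdetect.lang_detect_exception import LangDetectException

try:
    data['language'] = detect(text)
except LangDetectException:
    # langdetect could not classify the text (e.g. it was empty)
    data['language'] = 'undefined'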
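The traceback shows langdetect raising "No features in text." inside detect(), which, as far as I understand langdetect, happens when the input has nothing it can build character n-grams from, most commonly empty or letter-free text. Below is a minimal sketch of a guard that checks for that case up front and falls back to 'undefined' otherwise. `detect_language` is a hypothetical helper, not part of the repository, and the detector is passed in as an argument so the guard can be tried without langdetect installed.

```python
def detect_language(text, detect_fn, errors=(Exception,), fallback="undefined"):
    """Return detect_fn(text), or `fallback` if the text cannot be classified.

    Hypothetical helper. In real use detect_fn would be langdetect.detect and
    errors would be (LangDetectException,).
    """
    # Empty, None, or letter-free text (digits, punctuation, whitespace only)
    # gives the detector nothing to work with, so skip the call entirely.
    if not text or not any(ch.isalpha() for ch in text):
        return fallback
    try:
        return detect_fn(text)
    except errors:
        # Any remaining detection failure of the expected type is also mapped
        # to the fallback label instead of crashing the scraper.
        return fallback
```

In the script this would be called as `data['language'] = detect_language(text, detect, errors=(LangDetectException,))`. Counting how often it returns 'undefined' might also show whether many articles came back with empty text, which would suggest the pages were not scraped as expected.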
After that change, python scripts/get_articles_nytimes.py finished successfully, but the folder "transform-and-tell/data/nytimes/images" is empty. As a result, the next command in the instructions did nothing and finished much faster than the roughly 6 hours the instructions say it takes.
I don't know why the images weren't collected, whether it is because of my fix for the langdetect error or something else. Could you please help?
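Before running the next step, a quick check that the scraper actually wrote files can save a long wait. This is a minimal sketch, assuming images are saved as individual files directly inside the "transform-and-tell/data/nytimes/images" folder mentioned above. `summarize_image_dir` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path


def summarize_image_dir(image_dir):
    """Return (exists, file_count) for the folder the scraper should fill."""
    path = Path(image_dir)
    if not path.is_dir():
        # Folder was never created (or the path is wrong).
        return False, 0
    # Count regular files only; subdirectories are ignored.
    count = sum(1 for p in path.iterdir() if p.is_file())
    return True, count
```

For example, `summarize_image_dir("transform-and-tell/data/nytimes/images")` returning `(True, 0)` would confirm that the folder exists but no images were downloaded.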