Open
Description
Since BeautifulSoup couldn't parse anything, I did some debugging. My guess is that either the repository is out of date or Wiktionary changed its robot policy, so the request needs headers to avoid being blocked. I got past the block this way:
def fetch(self, word, language=None):
    language = self.language if not language else language
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
        'Accept-Language': 'en-US,en;q=0.5',
        'Connection': 'keep-alive'
    }
    response = self.session.get(self.url.format(word), headers=headers)
    self.soup = BeautifulSoup(response.text.replace('>\n<', '><'), 'html.parser')
    self.current_word = word
    self.clean_html()
    return self.get_word_data(language.lower())
Adding those headers to the fetch function in the WiktionaryParser class file lets the API connect to Wiktionary, but it still doesn't work: it returns empty lists.
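To narrow down why the lists are still empty, it may help to check whether the response is actually being blocked or whether the page arrives fine and the markup just no longer matches what the parser expects. Below is a rough sketch of such a check. The `diagnose` helper and the `HeadingCounter` class are hypothetical names, not part of WiktionaryParser, and counting heading tags is only my assumed stand-in for "the page has real content".

```python
from html.parser import HTMLParser


class HeadingCounter(HTMLParser):
    """Counts heading tags (h1-h6) as a crude sign that the page has content."""

    def __init__(self):
        super().__init__()
        self.headings = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lowercase.
        if tag in {"h1", "h2", "h3", "h4", "h5", "h6"}:
            self.headings += 1


def diagnose(status_code, html):
    """Hypothetical helper: classify a response before handing it to the parser."""
    if status_code in (403, 429):
        # Forbidden / too many requests: the server is likely rejecting the client.
        return "blocked"
    if status_code != 200:
        return "http_error"
    counter = HeadingCounter()
    counter.feed(html)
    if counter.headings == 0:
        # A 200 response without any headings is probably not a real entry page.
        return "no_content"
    return "content_present"
```

Inside `fetch`, this could be called as `diagnose(response.status_code, response.text)` right after the request. If it reports `content_present` while `get_word_data` still returns empty lists, the page structure has probably changed and the headers alone won't be enough.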