Dear guys,
Thanks for sharing your code, philipperemy. It's been helpful for my data science hobby.
Here is how I got around Amazon's bot detection:
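### In `core_utils.py`

Import `UserAgent` from the `fake_useragent` package (`pip install fake-useragent`), then define `get_soup_retry` and `get_soup` as follows: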
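```python
import logging
from time import sleep

import requests
from bs4 import BeautifulSoup
from fake_useragent import UserAgent

# Create the User-Agent generator once instead of on every call.
# AMAZON_BASE_URL is the constant core_utils.py already uses.
ua = UserAgent()


def get_soup_retry(url, max_tries=10):
    if AMAZON_BASE_URL not in url:
        url = AMAZON_BASE_URL + url
    nap_time_sec = 1
    logging.debug('-> to Amazon : {}'.format(url))
    for attempt in range(1, max_tries + 1):
        logging.debug('Script is going to sleep for {} (Amazon throttling). ZZZzzzZZZzz.'.format(nap_time_sec))
        sleep(nap_time_sec)
        # Send every attempt with a freshly picked random User-Agent
        # (the header must be rebuilt, not just the variable reassigned).
        user_agent = ua.random
        out = requests.get(url, headers={'User-Agent': user_agent})
        out.raise_for_status()  # fail on HTTP errors instead of a bare assert
        soup = BeautifulSoup(out.content, 'lxml')
        if 'captcha' not in str(soup):
            print('Bot bypassed')
            return soup
        print('Bot has been detected (attempt {}), retrying with a new identity...'.format(attempt))
    # Give up instead of looping forever.
    raise RuntimeError('Still getting a CAPTCHA after {} attempts: {}'.format(max_tries, url))


def get_soup(url):
    return get_soup_retry(url)
```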
It simply retries with a new random User-Agent until Amazon serves a page without a CAPTCHA :)
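If you don't want to hit Amazon with back-to-back requests, the retry logic can be pulled out into a small helper that waits longer after each blocked attempt (exponential backoff). This is a minimal sketch of my own, not code from the repo: `retry_with_backoff`, `fetch`, `is_blocked` and the default delays are hypothetical choices, and `sleep` is injectable only so the helper can be tried without actually waiting.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fetch: Callable[[], T],
    is_blocked: Callable[[T], bool],
    max_tries: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[T]:
    """Call fetch() until is_blocked(result) is False.

    Waits base_delay, 2 * base_delay, 4 * base_delay, ... between attempts.
    Returns the first unblocked result, or None if every attempt was blocked.
    """
    for attempt in range(max_tries):
        result = fetch()
        if not is_blocked(result):
            return result
        if attempt < max_tries - 1:
            # Double the wait after each blocked attempt (no wait after the last one).
            sleep(base_delay * (2 ** attempt))
    return None


# Offline demo with fake pages instead of real HTTP requests.
pages = iter(["captcha page", "captcha page", "<html>reviews</html>"])
waits = []
page = retry_with_backoff(lambda: next(pages), lambda html: "captcha" in html, sleep=waits.append)
print(page)   # <html>reviews</html>
print(waits)  # [1.0, 2.0]
```

In `get_soup_retry`, `fetch` could be a lambda that downloads the URL with `ua.random` as the User-Agent and parses it with BeautifulSoup, and `is_blocked` could be `lambda soup: 'captcha' in str(soup)`.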
Good luck!