[GUIDE] Bypass Amazon detection every 5000 tries by changing the user agent #11

@pnthai88

Hi all,

Thanks to the author, philipperemy, for sharing this code. It has been helpful for my data science hobby.
Here is how to get past Amazon's bot detection:

### In core_utils.py:
Import fake_useragent and pick a new random user agent for every request:

from fake_useragent import UserAgent

# logging, sleep, requests, BeautifulSoup and AMAZON_BASE_URL are assumed
# to be imported already at the top of core_utils.py.


def get_soup_retry(url):
    ua = UserAgent()
    if AMAZON_BASE_URL not in url:
        url = AMAZON_BASE_URL + url
    nap_time_sec = 1
    logging.debug('Script is going to sleep for {} (Amazon throttling). ZZZzzzZZZzz.'.format(nap_time_sec))
    sleep(nap_time_sec)

    logging.debug('-> to Amazon : {}'.format(url))
    while True:
        # Build the header inside the loop; otherwise every retry would
        # resend the first user agent and the rotation would have no effect.
        user_agent = ua.random
        out = requests.get(url, headers={'User-Agent': user_agent})
        assert out.status_code == 200
        soup = BeautifulSoup(out.content, 'lxml')
        if 'captcha' not in str(soup):
            print('Bot bypassed')
            return soup
        print('Bot has been detected... retrying with a new identity after: ', user_agent)


def get_soup(url):
    soup = get_soup_retry(url)
    return soup
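
The loop above retries forever and immediately, so a persistent block would hang the script. Below is a minimal sketch of a bounded variant with a growing delay between tries. It is not part of the original repository: `fetch_with_rotation`, its parameters, and the `fetch` callable are hypothetical names chosen for illustration, and the user agents come from a plain list rather than from fake_useragent so the example stays self-contained.

```python
import itertools
import time


def fetch_with_rotation(fetch, user_agents, max_tries=5, base_delay=1.0, sleep=time.sleep):
    """Call fetch(user_agent) with a different user agent on each attempt.

    fetch is a hypothetical callable that returns the page HTML as a string.
    Returns the first HTML that does not look like a captcha page and
    raises RuntimeError once max_tries attempts have all hit a captcha.
    """
    agents = itertools.cycle(user_agents)  # reuse the list if it runs out
    for attempt in range(max_tries):
        user_agent = next(agents)
        html = fetch(user_agent)
        if 'captcha' not in html.lower():
            return html
        # Back off after each detection: base_delay, then 2x, 4x, ...
        sleep(base_delay * 2 ** attempt)
    raise RuntimeError('still getting a captcha page after {} tries'.format(max_tries))
```

With requests, `fetch` could be something like `lambda ua: requests.get(url, headers={'User-Agent': ua}).text`, and the returned HTML can then be passed to BeautifulSoup as in `get_soup_retry`. Taking `sleep` as a parameter is only there so the delay can be replaced in tests.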

It simply gets through after enough tries :)
Good luck!
