Bug report (error)
[2020-03-25 19:07:12,388: ERROR/ForkPoolWorker-1] Task tasks.user.crawl_follower_fans[23f3c1fd-fc6e-4c5b-b0cc-5d5c6a9ad068] raised unexpected: TypeError('expected string or bytes-like object',)
Traceback (most recent call last):
  File "/home/wentao/programming/weibospider/WeiboSpider/lib/python3.6/site-packages/celery/app/trace.py", line 382, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/home/wentao/programming/weibospider/WeiboSpider/lib/python3.6/site-packages/celery/app/trace.py", line 641, in __protected_call__
    return self.run(*args, **kwargs)
  File "/home/wentao/programming/weibospider/tasks/user.py", line 19, in crawl_follower_fans
    rs = get_fans_or_followers_ids(uid, 1, 1)
  File "/home/wentao/programming/weibospider/page_get/user.py", line 159, in get_fans_or_followers_ids
    urls_length = public.get_max_crawl_pages(page)
  File "/home/wentao/programming/weibospider/page_parse/user/public.py", line 223, in get_max_crawl_pages
    m = re.search(pattern, script.string)
  File "/usr/lib/python3.6/re.py", line 182, in search
    return _compile(pattern, flags).search(string)
TypeError: expected string or bytes-like object
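There is a bug involving script.string, reached from user.py (the traceback points at get_max_crawl_pages in page_parse/user/public.py, line 223).
The type of script.string is not predictable here: sometimes it is NoneType, sometimes it is <class 'bs4.element.NavigableString'>.
With either type, I could not get the re module to match it.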
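To narrow down which of the two types actually triggers the TypeError, a small check like the one below may help. It is only a sketch: the helper name `classify_script_strings` and the sample HTML are made up for illustration and are not part of weibospider. It relies on two things in Beautiful Soup: `.string` is None for a tag with no text child (for example a `<script src="...">` tag), and NavigableString is a subclass of str, which `re.search` should accept.

```python
import re

from bs4 import BeautifulSoup


def classify_script_strings(html):
    """Report, for each <script> tag, what kind of value .string holds.

    Hypothetical helper for debugging only.
    """
    kinds = []
    for script in BeautifulSoup(html, 'html.parser').find_all('script'):
        text = script.string
        if text is None:
            # No text child at all, e.g. <script src="..."></script>
            kinds.append('none')
        elif isinstance(text, str):
            # NavigableString subclasses str, so re can search it
            kinds.append('str')
        else:
            kinds.append('other')
    return kinds


# Made-up sample: one external script, one inline script
sample = '<script src="a.js"></script><script>FM.view({"ns": "x"})</script>'
print(classify_script_strings(sample))
```

If this holds for the real pages as well, only the None case would need special handling before calling `re.search`.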
for script in scripts:
    # print('i am in ' + dir_path, 'script is ' + script)
    # print('script.string:', script.string)
    # Debug output added while investigating the error
    print('type pattern', type(pattern))
    print('pattern', pattern)
    print('type:', type(script.string))
    # This is the call that raises TypeError when script.string is None
    m = re.search(pattern, script.string)
    if m and 'pl.content.followTab.index' in script.string:
        all_info = m.group(1)
        cont = json.loads(all_info).get('html', '')
        soup = BeautifulSoup(cont, 'html.parser')
        pattern = 'uid=(.*?)&'
        if 'pageList' in cont:
            urls2 = soup.find(attrs={'node-type': 'pageList'}).find_all(attrs={
                'class': 'page S_txt1', 'bpfilter': 'page'})
            length += len(urls2)
return length
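One possible workaround is to skip any `<script>` tag whose `.string` is None before searching it. The sketch below is not the project's actual fix. The issue does not show how `pattern` is defined, so the `FM_VIEW_PATTERN` regex is an assumption, and the function is a simplified stand-in for `get_max_crawl_pages` rather than a drop-in replacement. It also checks the result of `find` instead of testing `'pageList' in cont`, and drops the `pattern = 'uid=(.*?)&'` reassignment so the search pattern stays the same across iterations.

```python
import json
import re

from bs4 import BeautifulSoup

# Assumption: the real pattern is not shown in the issue.
FM_VIEW_PATTERN = re.compile(r'FM\.view\((.*)\)')


def get_max_crawl_pages(html):
    """Count crawlable pages, ignoring <script> tags without text."""
    length = 1
    for script in BeautifulSoup(html, 'html.parser').find_all('script'):
        text = script.string
        if text is None:
            # Nothing to search, e.g. <script src="..."></script>
            continue
        m = FM_VIEW_PATTERN.search(text)
        if m and 'pl.content.followTab.index' in text:
            cont = json.loads(m.group(1)).get('html', '')
            page_list = BeautifulSoup(cont, 'html.parser').find(
                attrs={'node-type': 'pageList'})
            if page_list is not None:
                length += len(page_list.find_all(
                    attrs={'class': 'page S_txt1', 'bpfilter': 'page'}))
    return length
```

With this guard, a page that mixes external and inline scripts should no longer abort the Celery task. Whether the page count itself is right still depends on the real pattern and on the markup Weibo returns.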