Releases: chao2015/go-crawler
complete concurrent crawler
The full concurrent version, covering both data crawling and front-end display. Part of the code is refactored from version 2.0.
Main updates:
v2.1: Where v2.0 used one goroutine per worker, v2.1 uses a common scheduling queue for workers.
v2.2: Crawl deeper by following every URL on each city homepage, and deduplicate users.
v2.3: Persist data in Elasticsearch instead of in files.
v2.4: Refactor the code.
v2.5: Display the results through the front end.
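The shared scheduling queue (v2.1) and deduplication (v2.2) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the repository's code: `crawl`, `visit`, and `backlog` are hypothetical names, `visit` stands in for the real fetch-and-parse step, and deduplication is keyed on the URL string, which may differ from how the project identifies users.

```go
package main

import "sync"

// crawl is a hypothetical helper. A single scheduler loop owns the backlog
// and the seen set; workers only pull URLs from one shared channel and
// report the links they found. It returns the URLs in dispatch order.
func crawl(seeds []string, workers int, visit func(url string) []string) []string {
	if workers < 1 {
		workers = 1 // assumption: always run at least one worker
	}
	queue := make(chan string)   // shared by every worker
	found := make(chan []string) // links reported back to the scheduler

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for u := range queue {
				found <- visit(u)
			}
		}()
	}

	seen := make(map[string]bool)
	var backlog, order []string
	enqueue := func(urls []string) {
		for _, u := range urls {
			if !seen[u] { // deduplicate before queuing
				seen[u] = true
				backlog = append(backlog, u)
			}
		}
	}
	enqueue(seeds)

	pending := 0 // URLs dispatched but not yet reported back
	for len(backlog) > 0 || pending > 0 {
		var out chan string // stays nil (never ready) when the backlog is empty
		var next string
		if len(backlog) > 0 {
			out, next = queue, backlog[0]
		}
		select {
		case out <- next:
			backlog = backlog[1:]
			order = append(order, next)
			pending++
		case links := <-found:
			pending--
			enqueue(links)
		}
	}
	close(queue)
	wg.Wait()
	return order
}
```

Calling `crawl([]string{"a"}, 2, visit)` with a `visit` function that returns the links found on each page dispatches every reachable URL exactly once, even when pages link back to each other. Keeping the seen set inside the scheduler loop avoids the need for a mutex, since only one goroutine ever touches it.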
concurrent crawler
The concurrent version of the crawler fetches a city list, the city homepages, and the details of the users listed on each homepage.
Main updates:
v1.0: engine/simple.go
v2.0: engine/concurrent.go
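To illustrate what moving from a single-task engine to a concurrent one involves, here is a minimal sketch of fanning a batch of pages out to several goroutines. It is not the contents of `engine/concurrent.go`: `fetchAll`, `fetch`, and the `result` type are hypothetical, and `fetch` stands in for the real HTTP request and parsing.

```go
package main

import "sync"

// fetchAll is a hypothetical helper. It hands a fixed batch of URLs to
// workerCount goroutines and collects one result per URL.
func fetchAll(urls []string, workerCount int, fetch func(url string) string) map[string]string {
	type result struct{ url, body string }
	in := make(chan string)
	out := make(chan result)

	var wg sync.WaitGroup
	for i := 0; i < workerCount; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for u := range in {
				out <- result{u, fetch(u)}
			}
		}()
	}

	// Feed the workers, then close out once every worker has finished.
	go func() {
		for _, u := range urls {
			in <- u
		}
		close(in)
		wg.Wait()
		close(out)
	}()

	results := make(map[string]string, len(urls))
	for r := range out {
		results[r.url] = r.body
	}
	return results
}
```

The caller sees the same results as a sequential loop would produce, but up to `workerCount` fetches are in flight at once; `workerCount` is assumed to be at least 1.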
single-task crawler
The single-task version of the crawler fetches a city list (limited to 10 cities), the city homepages, and the details of the users listed on each homepage.
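The three stages (city list, city homepage, user details) can be modeled as a queue of requests, each carrying the parser for its page. The sketch below is an assumption about the general shape, not the code in `engine/simple.go`: `Request`, `ParseResult`, `run`, `firstN`, and `fetch` are illustrative names.

```go
package main

// Request pairs a URL with the parser for the page it points to.
type Request struct {
	URL    string
	Parser func(body string) ParseResult
}

// ParseResult carries follow-up requests plus any extracted items.
type ParseResult struct {
	Requests []Request
	Items    []string
}

// run processes requests one at a time in FIFO order until the queue is
// empty, and returns every item the parsers extracted.
func run(fetch func(url string) string, seeds ...Request) []string {
	var items []string
	queue := append([]Request(nil), seeds...)
	for len(queue) > 0 {
		r := queue[0]
		queue = queue[1:]
		result := r.Parser(fetch(r.URL))
		queue = append(queue, result.Requests...)
		items = append(items, result.Items...)
	}
	return items
}

// firstN keeps at most n requests; a city-list parser could use it to
// apply a cap such as the 10-city limit.
func firstN(reqs []Request, n int) []Request {
	if len(reqs) > n {
		return reqs[:n]
	}
	return reqs
}
```

A city-list parser would return one `Request` per city (capped with `firstN(reqs, 10)`), each city parser would return one `Request` per user, and each user parser would return the extracted details as `Items`.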