Social Crawler

The application is composed of two parts: the Crawler and the Server.

Crawler

The Crawler iterates over the specified area. It is social-network agnostic, since it only requires a scan function to be defined for each social network.
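As a rough illustration of what iterating over an area can look like, here is a minimal sketch. It is not the project's grid.js: the bounding-box shape (north, south, east, west), the step parameter, and the names gridPoints and crawl are all assumptions made for the example.

```javascript
// Hypothetical sketch: the real grid.js and areas.json formats are not shown in this README.
// Assumes the area is a bounding box in decimal degrees and a fixed step size in degrees.
function* gridPoints(area, step) {
  // Integer row/column counts avoid accumulating floating-point error.
  const rows = Math.floor((area.north - area.south) / step + 1e-9);
  const cols = Math.floor((area.east - area.west) / step + 1e-9);
  for (let i = 0; i <= rows; i++) {
    for (let j = 0; j <= cols; j++) {
      yield { lat: area.south + i * step, lng: area.west + j * step };
    }
  }
}

// Calls scan once per coordinate pair, waiting for each result before moving on.
// scan may return a plain value or a Promise; await handles both.
async function crawl(area, step, scan) {
  const results = [];
  for (const point of gridPoints(area, step)) {
    results.push(await scan(point));
  }
  return results;
}
```

Running the scans one at a time keeps the request rate low, which tends to matter for rate-limited APIs; a real crawler might batch or parallelize instead.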

Usage

node crawler.js <social> [-p <port:80>]

This will start the crawler and also an HTTP Primus server on the specified port.

Config

Each social network is configured in its own directory inside the social folder. The configuration of a social network is composed of 3 files:

config.json: Contains the configuration of the social-network (api-key, database to use and so on...). The file must contain a dbname property with the name of the database to use and a table property containing the name of the collection.
scan.js: This file exports the function that is called for each coordinate pair (latitude, longitude). The function is passed an object containing the lat and lng values, and it can return a Promise.
schema.js: This file contains the schema of the data retrieved from the social network. Not every field has to be defined; only location is effectively required.
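A scan.js along these lines would satisfy the contract described above. This is only a sketch: fetchPosts is a hypothetical stand-in for a real social-network API call, and the shape of the returned items (including the location field layout) is an assumption, since the README does not specify what scan should resolve to.

```javascript
// scan.js (sketch): called once per coordinate pair by the crawler.

// Hypothetical stand-in for a real API request; it resolves with one placeholder post.
function fetchPosts(lat, lng) {
  return Promise.resolve([
    { id: 'example-1', text: 'placeholder post', location: { lat: lat, lng: lng } }
  ]);
}

function scan(point) {
  // point carries the lat and lng values passed in by the crawler.
  return fetchPosts(point.lat, point.lng).then(function (posts) {
    // Drop items without a location, assuming that is the field the schema relies on.
    return posts.filter(function (post) {
      return Boolean(post.location);
    });
  });
}

module.exports = scan;
```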

Crawler status (Server)

To see the status of the crawler in real time, start the server and watch the live feed of data on the map.

Usage

node server.js -s <socketUrl> [-p <port>] [-k googleApiKey]

This will start a server on the specified port and try to connect to the Primus socket identified by socketUrl.

Config

The configuration is passed to the script via command-line arguments:

port [-p]: The port of the webserver, defaults to 80
socket [-s]: The url of the crawler's primus socket, like http://localhost:801
key [-k]: The Google Api Key for the map.
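The options above could be read with a small hand-written parser like the following. This is a sketch under assumptions, not the code in server.js, which may well use an argument-parsing library; the parseArgs name and the error messages are made up for illustration. It treats -s as required because the usage line shows it without brackets.

```javascript
// Hypothetical parser: server.js may handle its options differently.
function parseArgs(argv) {
  // port defaults to 80, as documented; socket and key have no default.
  const options = { port: 80, socket: undefined, key: undefined };
  const flags = { '-p': 'port', '-s': 'socket', '-k': 'key' };
  for (let i = 0; i < argv.length; i += 2) {
    const name = flags[argv[i]];
    if (!name) throw new Error('Unknown option: ' + argv[i]);
    if (i + 1 >= argv.length) throw new Error('Missing value for ' + argv[i]);
    options[name] = name === 'port' ? Number(argv[i + 1]) : argv[i + 1];
  }
  if (!options.socket) throw new Error('The -s <socketUrl> option is required');
  return options;
}
```

In a script it would typically be called as parseArgs(process.argv.slice(2)), which skips the node executable and script path.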

Debug

Both parts use the debug module to log information. A typical configuration would be:

DEBUG=scan,crawl,grid,schema node crawler.js <social> -p <port>
DEBUG=server node server.js -s <socketUrl>
