Create toy scraper for main sources #67

@marianelamin

Description

To better define what we can and cannot retrieve during scraping, we need to explore with a toy scraper.

Sources potentially needed:

  • elpitazo.com
  • twitter.com
  • primicia.com.ve
  • efectotocuyo.com
  • laprensalara.com.ve
  • diariolosandes
  • elimpulso.com
  • el-carabobeno.com
  • cronica.uno
  • elnacional.com
  • eluniversal.com

Proposed Solution

By Luis:

Scraper Creation

In this guide, we will go through the process of creating a new scraper, which can be summed up
in the following steps:

  1. Select an output data format
  2. Implement a BaseScraper subclass
  3. Wire the new scraper into the installed scrapers

Selecting an output data format


Every page may have different scrapable information: hashtags on Twitter, say, or the news section name on a news site. In any case, we don't want to lose such valuable information. Select one of the available formats if it fits your needs.

If no existing data format in scraper/scraped_data_classes fits your scrapable data, you can write a new one by creating a file in scraper/scraped_data_classes that implements the base class BaseDataFormat, located in scraper/scraped_data_classes/base_scraped_data.py. That class should implement to_scraped_data(self) -> ScrapedData, which maps from your data format to our currently supported database schema (represented by the ScrapedData class).

This is needed since scrapers vary in their needs and scraped data. If, for instance, you require extra clean-up logic, you can write it on your custom data format and test it more easily.
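
For illustration, a minimal sketch of such a custom format might look like the following. The TwitterData name, its fields, and the ScrapedData constructor arguments are all assumptions for illustration; check the actual class definitions in scraper/scraped_data_classes before reusing this.

```python
# A minimal sketch: the field names (content, hashtags) and the
# ScrapedData constructor arguments are hypothetical -- check the real
# definitions in scraper/scraped_data_classes before copying this.
from dataclasses import dataclass, field
from typing import List

from scraper.scraped_data_classes.base_scraped_data import BaseDataFormat, ScrapedData


@dataclass
class TwitterData(BaseDataFormat):
    """Custom format for tweets, keeping hashtags we don't want to lose."""

    content: str
    hashtags: List[str] = field(default_factory=list)

    def to_scraped_data(self) -> ScrapedData:
        # Map our custom fields onto the shared database schema.
        # Extra clean-up logic (e.g. stripping the '#' prefix) can live
        # here and be unit-tested in isolation.
        return ScrapedData(
            content=self.content,
            tags=[h.lstrip("#") for h in self.hashtags],
        )
```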

Implementing BaseScraper subclass


This step depends on the kind of scraper you want to write.
You might want to write a scrapy-based scraper; if so, we provide a utility class to make it easier. Otherwise, we also provide a base class whose methods you implement to easily add a new scraper.

Scrapy-based scrapers:

  1. Create a scrapy spider as you usually would and save it in scraper/spiders. Its parse method should return the data format selected in the previous step.
  2. Create a file/module in scraper/scrapers implementing a class that inherits from BaseScrapyScraper, located in scraper/scrapers/base_scrapy_scraper.py.
  3. The only things that class should add are two class variables, as in the sketch after this list:
    • intended_domain : str = the domain intended to be scraped by this scraper
    • spider : Type[Spider] = the spider defined in step 1
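
Putting steps 2 and 3 together, a scrapy-based scraper might look like the sketch below. ElPitazoSpider and ElPitazoScraper are hypothetical names, and the spider is assumed to already exist from step 1.

```python
# A minimal sketch of steps 2 and 3. ElPitazoSpider is a hypothetical
# spider assumed to already live in scraper/spiders (step 1).
from typing import Type

from scrapy import Spider

from scraper.scrapers.base_scrapy_scraper import BaseScrapyScraper
from scraper.spiders.el_pitazo import ElPitazoSpider


class ElPitazoScraper(BaseScrapyScraper):
    """Scraper for elpitazo.com; crawling is delegated to the spider."""

    intended_domain: str = "elpitazo.com"
    spider: Type[Spider] = ElPitazoSpider
```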

From scratch

  1. Define a new file/module in scraper/scrapers with a class inheriting and implementing the BaseScraper class located in scraper/scrapers/base_scraper.py
  2. Such a class should implement, at the least, the following methods:
    • parse(self, response : Any) -> ScrapedData : extracts data from a successful page response (the response may be an arbitrary type depending on implementation details)
    • scrape(self, url : str) -> ScrapedData : retrieves the page at the given url and returns its scraped data

Note that every other method is still overridable.
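
As a sketch of a from-scratch scraper, the class below fetches pages with the requests library and parses them with BeautifulSoup. The CSS selectors, the ScrapedData arguments, and the choice of requests.Response as the parse input are assumptions; only the two method signatures come from BaseScraper.

```python
# A from-scratch sketch using requests + BeautifulSoup. The CSS
# selectors and ScrapedData arguments below are placeholders.
import requests
from bs4 import BeautifulSoup

from scraper.scraped_data_classes.base_scraped_data import ScrapedData
from scraper.scrapers.base_scraper import BaseScraper


class ExampleScraper(BaseScraper):

    def scrape(self, url: str) -> ScrapedData:
        # Fetch the page, then delegate extraction to parse().
        response = requests.get(url, timeout=30)
        response.raise_for_status()
        return self.parse(response)

    def parse(self, response: requests.Response) -> ScrapedData:
        # The response type is implementation-defined; here we chose
        # requests.Response and parse its HTML body.
        soup = BeautifulSoup(response.text, "html.parser")
        title = soup.select_one("h1")
        body = soup.select_one("article")
        return ScrapedData(
            title=title.get_text(strip=True) if title else "",
            content=body.get_text(strip=True) if body else "",
        )
```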

Wiring the new scraper


Just go to the scraper/settings.py file, import your new scraper, and add it to the INSTALLED_SCRAPERS list.
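
For example, assuming the hypothetical ExampleScraper from the previous section:

```python
# scraper/settings.py -- the existing contents of INSTALLED_SCRAPERS
# will differ in the repo; just append your new scraper class.
from scraper.scrapers.example_scraper import ExampleScraper

INSTALLED_SCRAPERS = [
    # ...existing scrapers...
    ExampleScraper,
]
```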
