A Python-based news webscraper and distributor.
This software primarily gathers news content relevant to China, Hong Kong, and Macau.
Besides having Python 3 installed, you also need:

```shell
pip install requests-html
pip install beautifulsoup4
pip install python-docx
```

Change the email settings in settings.json. Currently, only Gmail addresses have been tested.
You may have to configure the security settings of the given Gmail account before the script can access it.
```json
{
    "email_settings": {
        "email_address": "[email protected]",
        "email_password": "password123",
        "default_body": "This is an automated message."
    }
}
```

Configure your tasks in tasks.json:
A list of the valid sources and the valid `run_on` days can be found in tasks.json.
"tasks": [
{
"task_name": "Test Task",
"recepient": ["[email protected]", "[email protected]"],
"sources": ["Source_1", "Source_two"],
"keywords": ["WHO", "Wuhan", "COVID"],
"run_on": ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
}Either make a new .py file or run it directly from task_manager.py:
import task_manager
run() # will update the vault and run all tasks within tasks.json
run_task('Test Task') # will not update the vault, and only run the task_name given as parameterThe vault lets you modify the content.
`vault.update()` may take up to 2 hours in its current state.
```python
import vault

vault.update()  # updates the content of the vault via the given APIs
vault.delete_source_from_vault("Source name")  # deletes all content from a given source
vault.empty_vault()  # deletes all content in the vault
```

The following files will be created/updated:

- update_log.txt
- Runtime_files/(task_name)_runtime.pkl
- vault_data.pkl
The first time you run a new task, it will grab all available news articles, since there is no (task_name)_runtime.pkl to use as a point of reference.
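If you want to see what a runtime file or the vault actually stores, the pickles can be loaded directly. A minimal sketch: the example paths in the comments are assumptions based on the names above, and the stored object's structure is undocumented, so this just loads and reports whatever was pickled.

```python
# Sketch: peek inside one of the generated pickle files.
import pickle

def inspect_pickle(path):
    """Load a pickle file and report the stored object's type."""
    with open(path, "rb") as fh:
        obj = pickle.load(fh)
    print(f"{path}: {type(obj).__name__}")
    return obj

# Hypothetical example calls:
# inspect_pickle("Runtime_files/Test Task_runtime.pkl")
# inspect_pickle("vault_data.pkl")
```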
error_log.txt will be created and shows which sources were successfully updated.
If a source fails:

- Restart the entire update process: `task_manager.run()`
- Comment out the sources in `vault.update()` that were successful, then restart.
- If the error persists, the source website might have been updated and needs updating in the relevant `API/source_api.py`. Alternatively, you can comment out the problematic source and restart.
Scraping content from many different websites can cause many kinds of issues.
Be aware of occasional errors if any of the websites' structure changes.
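One common way to contain such breakage is to run each source scraper inside its own try/except, so a single changed website cannot abort the whole update. A sketch, assuming hypothetical per-source scraper functions; the repo's actual error handling may differ, though it appends failures to error_log.txt in the same spirit.

```python
# Sketch: isolate per-source failures during an update.
import traceback

def update_sources(source_funcs):
    """Run each scraper, collecting results and recording failures."""
    results, failed = [], []
    for name, func in source_funcs.items():
        try:
            results.extend(func())
        except Exception:
            failed.append(name)
            # Record the traceback so the broken source can be diagnosed.
            with open("error_log.txt", "a") as log:
                log.write(f"{name} failed:\n{traceback.format_exc()}\n")
    return results, failed
```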
To add your own source:

- Your webscraper needs to return Media class objects from `API/source_classes.py`.
- Import your API in `vault.py`.
- Add the API, in a similar fashion to the other APIs, to `vault.update()` and `vault.extract()`, and add the source to the relevant tasks in tasks.json.
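To sketch the shape of such a webscraper: the stand-in Media dataclass and the CSS selectors below are assumptions for illustration only. In the repo you would import the real Media class from `API/source_classes.py` (whose fields may differ) and fetch the page with requests-html rather than parsing a local string.

```python
# Sketch of a new source API; Media here is a stand-in, not the repo's class.
from dataclasses import dataclass

from bs4 import BeautifulSoup

@dataclass
class Media:  # stand-in for API.source_classes.Media (fields are assumed)
    title: str
    link: str
    source: str

def get_new_source_articles(html):
    """Parse article title/link pairs out of the source's HTML."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        Media(
            title=a.get_text(strip=True),
            link=a["href"],
            source="New_source",  # hypothetical source name
        )
        for a in soup.select("article h2 a")
    ]
```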
Kristian Hviid Nielsen - GitHub