Python implementation of Fake news detection using knowledge graph
- Clone this repository.
- Ensure packages are installed using "pip install -r requirements.txt".
- Put "ChromeDriver" into the "chromedriver" folder (you can download it from the "ChromeDriver - WebDriver for Chrome" page).
- Extract "stanford-corenlp.zip" into the "api_v2" folder (you can download it from the "Stanford CoreNLP – Natural language software" page).
We use a dataset that is crawled from CNN, Dailymail and Foxnews (you can download our triples data from our drive).
# Crawl from CNN: (path to save raw data: /Fake_news_detection_using_knowledge_graph/cnn)
local_runner/crawl_cnn.py
# Crawl from Dailymail: (path to save raw data: /Fake_news_detection_using_knowledge_graph/dailymail)
local_runner/crawl_dailymail.py
# Crawl from Foxnews: (path to save raw data: /Fake_news_detection_using_knowledge_graph/news_fox)
local_runner/crawl_foxnews.py
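The three crawlers can also be launched one after another from a small helper. The following is only a sketch: it assumes the scripts take no command-line arguments and are started from the repository root, neither of which is confirmed here. The names `CRAWLERS`, `crawl_command` and `run_all_crawlers` are hypothetical and not part of the repository.

```python
import subprocess
import sys
from pathlib import Path

# Script paths and raw-data folder names as listed in this README.
CRAWLERS = {
    "cnn": ("local_runner/crawl_cnn.py", "cnn"),
    "dailymail": ("local_runner/crawl_dailymail.py", "dailymail"),
    "foxnews": ("local_runner/crawl_foxnews.py", "news_fox"),
}


def crawl_command(source):
    """Return the command line that starts the crawler for one source."""
    script, _raw_dir = CRAWLERS[source]
    # Assumption: the crawler scripts need no command-line arguments.
    return [sys.executable, script]


def run_all_crawlers(repo_root="."):
    """Run every crawler in turn; stop at the first one that fails."""
    for source in CRAWLERS:
        # Assumption: the scripts expect the repository root as working directory.
        subprocess.run(crawl_command(source), cwd=Path(repo_root), check=True)
```

Calling `run_all_crawlers()` from the cloned repository would then start the CNN, Dailymail and Foxnews crawlers in that order.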
Notice: processed data is saved in path_to/Fake_news_detection_using_knowledge_graph/processed_data/
Please replace "test1" in the "path" variable with the name of the folder that holds the data ("processed_data/cnn" | "processed_data/dailymail" | "processed_data/news_fox").
# Running directly from the repository
local_runner/triples_extraction.py
Notice: triples are saved in path_to/Fake_news_detection_using_knowledge_graph/test1.txt
In addition: Stanford CoreNLP ships with a built-in server, which requires only the CoreNLP dependencies. To run this server, simply run:
# Run the server using all jars in the current directory (e.g., the CoreNLP home directory)
java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 15000
To contribute a graph from the crawled triples, copy the triples folder from the data downloaded above into "/kg/data". Then run contribute_graph() in kg/run.py.
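Once the CoreNLP server from the command above is listening on port 9000, it can be queried over HTTP. The sketch below shows one way to request OpenIE triples using only the Python standard library. It is not necessarily how local_runner/triples_extraction.py works: the choice of the OpenIE annotator, the host URL default and the helper names `build_url`, `triples_from_response` and `extract_triples` are all assumptions made for illustration.

```python
import json
import urllib.parse
import urllib.request

# Annotator chain for OpenIE; using OpenIE at all is an assumption of this sketch.
OPENIE_ANNOTATORS = "tokenize,ssplit,pos,lemma,depparse,natlog,openie"


def build_url(host="http://localhost:9000", annotators=OPENIE_ANNOTATORS):
    """Build the server URL with the pipeline properties as a query parameter."""
    props = {"annotators": annotators, "outputFormat": "json"}
    return host + "/?" + urllib.parse.urlencode({"properties": json.dumps(props)})


def triples_from_response(doc):
    """Collect (subject, relation, object) tuples from the server's JSON output."""
    triples = []
    for sentence in doc.get("sentences", []):
        for item in sentence.get("openie", []):
            triples.append((item["subject"], item["relation"], item["object"]))
    return triples


def extract_triples(text, host="http://localhost:9000"):
    """POST raw text to a running CoreNLP server and return its OpenIE triples."""
    request = urllib.request.Request(build_url(host), data=text.encode("utf-8"))
    # 15 s on the client side mirrors the server's -timeout 15000 (milliseconds).
    with urllib.request.urlopen(request, timeout=15) as response:
        return triples_from_response(json.load(response))
```

With the server running, `extract_triples("Barack Obama was born in Hawaii.")` returns a list of tuples; the exact triples depend on the CoreNLP version and models installed.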
To start the program, run run() in kg/run.py. Note: install neo4j and contribute the graph before starting the program.
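For readers unfamiliar with neo4j, the sketch below illustrates how subject-relation-object triples could be written into a graph with the official `neo4j` Python driver. It is not the code behind contribute_graph(): the `Entity` label, the `RELATION` relationship type, the connection defaults and the helper names are all hypothetical, and the repository may use a different driver or schema.

```python
def triple_to_cypher(subject, relation, obj):
    """Return a Cypher MERGE statement and its parameters for one triple."""
    # Relationship types cannot be query parameters in Cypher, so the relation
    # text is stored as a property on a generic RELATION edge (an assumption).
    query = (
        "MERGE (s:Entity {name: $subject}) "
        "MERGE (o:Entity {name: $object}) "
        "MERGE (s)-[:RELATION {name: $relation}]->(o)"
    )
    return query, {"subject": subject, "relation": relation, "object": obj}


def load_triples(triples, uri="bolt://localhost:7687", user="neo4j", password="password"):
    """Write triples to a running neo4j instance (placeholder credentials)."""
    # Imported here so the helper above works without the driver installed.
    from neo4j import GraphDatabase  # pip install neo4j

    with GraphDatabase.driver(uri, auth=(user, password)) as driver:
        with driver.session() as session:
            for subject, relation, obj in triples:
                query, params = triple_to_cypher(subject, relation, obj)
                session.run(query, params)
```

Using MERGE rather than CREATE keeps the graph free of duplicate nodes and edges when the same triple appears in several articles.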