This is a personal project I made to practice my Java skills before my semester started. I have always loved data and I live by data, so I wanted to do a project built around it. When I asked myself what kind of data is widely available and widely used, I thought of finance. I ended up tracking Bitcoin because it trades 24/7 and its live data is easier to get than stock data.
Here is the basic logic of the program:
- Scrape https://bitcointicker.co/coinbase/btc/usd/10m/ every minute for live data
- Organize the data and write it as a new line in a CSV file
- Upload the data with a Python script every 24 hours at 12:00 am
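The first two steps can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the class name `MinuteCsvLogger`, the two-column `timestamp,price` layout, and the `priceSource` supplier (a stand-in for whatever scraping logic produces the current price) are all assumptions.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.time.Instant;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.DoubleSupplier;

// Hypothetical helper: names and CSV layout are assumptions, not the project's real code.
final class MinuteCsvLogger {

    // Build one CSV line: ISO-8601 UTC timestamp, then the price with two decimals.
    static String formatRow(Instant timestamp, double price) {
        return String.format(Locale.ROOT, "%s,%.2f", timestamp, price);
    }

    // Append the row as a new line, creating the file if it does not exist yet.
    static void appendRow(Path csvFile, String row) throws IOException {
        Files.write(csvFile, List.of(row), StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    // Ask priceSource for a reading once per minute and log it.
    static ScheduledExecutorService start(Path csvFile, DoubleSupplier priceSource) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            try {
                appendRow(csvFile, formatRow(Instant.now(), priceSource.getAsDouble()));
            } catch (Exception e) {
                // Catching here keeps the schedule alive; an uncaught exception would cancel it.
                System.err.println("Skipping this minute: " + e.getMessage());
            }
        }, 0, 1, TimeUnit.MINUTES);
        return scheduler;
    }
}
```

Calling `MinuteCsvLogger.start(Path.of("btc.csv"), scraper)` with your own scraping function would then append one row per minute until the returned scheduler is shut down. The file name `btc.csv` is only a placeholder.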
To get a local copy up and running, you will need the following:
- Java 23 (for running the project)
- Python 3 (for the upload script)
- Maven (for building the project)
- Kaggle API Key (for uploading data to Kaggle)
Follow these steps to install and set up the project:
- Get a Kaggle API token (kaggle.json) by following the "Authentication" section at kaggle.com/docs/api
- Install Maven on Windows or Linux
- Put the JSON file in the following location for Windows or Linux, respectively:
  - C:\Users\username\.kaggle\
  - ~/.kaggle/
- Install the Python requirements:
  pip install -r requirements.txt
- Edit the JSON at dataPipelineProject/src/main/csv_files/dataset-metadata.json
- cd to the project location and build the Maven project:
  mvn clean package
- Run the .jar file in the target folder (created inside the project folder by the build) to begin data collection
- Run the Python file, or automate it, to upload the data to Kaggle
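One possible way to automate the last step from Java is sketched below. It is only an illustration under stated assumptions: the class `MidnightUploader` is hypothetical, the script path is whatever your upload script is actually called, and `python3` is assumed to be on the PATH. A system scheduler such as cron would work just as well.

```java
import java.io.IOException;
import java.time.Duration;
import java.time.LocalDateTime;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: not part of the project, shown only to illustrate the scheduling.
final class MidnightUploader {

    // Time remaining from `now` until the next 12:00 am (always strictly in the future).
    static Duration untilNextMidnight(LocalDateTime now) {
        LocalDateTime nextMidnight = now.toLocalDate().plusDays(1).atStartOfDay();
        return Duration.between(now, nextMidnight);
    }

    // Run the upload script as a child process and return its exit code.
    // Assumes a `python3` executable is available on the PATH.
    static int runUpload(String scriptPath) throws IOException, InterruptedException {
        Process process = new ProcessBuilder("python3", scriptPath)
                .inheritIO()
                .start();
        return process.waitFor();
    }

    // Wait until the next midnight, then run the script every 24 hours.
    static ScheduledExecutorService schedule(String scriptPath) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        long initialDelay = untilNextMidnight(LocalDateTime.now()).toSeconds();
        scheduler.scheduleAtFixedRate(() -> {
            try {
                int exitCode = runUpload(scriptPath);
                System.out.println("Upload finished with exit code " + exitCode);
            } catch (Exception e) {
                // Catching here keeps the daily schedule alive after a failed upload.
                System.err.println("Upload failed: " + e.getMessage());
            }
        }, initialDelay, TimeUnit.DAYS.toSeconds(1), TimeUnit.SECONDS);
        return scheduler;
    }
}
```

The fixed 24-hour period keeps the sketch short, but it drifts by an hour across daylight-saving changes; recomputing the delay after each run would avoid that.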
This project scrapes Bitcoin data and uploads it to Kaggle every day. I personally run it on a Raspberry Pi 5 running Ubuntu Linux, and it works very well. If you want, you can also edit the code to run with any website and data you like. I didn't design it with that in mind, though, so sorry for any trouble that comes up along the way. You can see the Kaggle dataset below:
kaggle.com/datasets/erikhox/bitcoin-minute-data/data
As with anything open source, this project is open to contributions if you see a way to make it better!
- Fork the Project
- Create your Feature Branch (git checkout -b feature/AmazingFeature)
- Commit your Changes (git commit -m 'Add some AmazingFeature')
- Push to the Branch (git push origin feature/AmazingFeature)
- Open a Pull Request
Distributed under the Unlicense. See LICENSE.txt for more information.
Erik Hoxhaj: linkedin - personal website - erik.hoxhaj@outlook.com
Project Link: github.com/erikhox/Bitcoin-Data-Pipeline-to-Kaggle
