This repository is a small ETL pipeline written in Scala with Spark that consumes a public API:
- Requests the following endpoint to download a GZIP file with the daily weather forecast for Mexico by municipality: https://smn.conagua.gob.mx/tools/GUI/webservices/?method=1
- Decompresses the GZIP into a JSON file
- Reads the data with Spark and writes it to Parquet
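The download and decompression steps above can be sketched with nothing but the JDK, called from Scala. This is a minimal sketch, not the repository's actual code: the object name `FetchForecast`, the file names `forecast.gz` and `forecast.json`, and the assumption that the endpoint returns a raw GZIP body are all hypothetical.

```scala
import java.net.URI
import java.nio.file.{Files, Path, Paths, StandardCopyOption}
import java.util.zip.GZIPInputStream

// Hypothetical helper object; not part of this repository.
object FetchForecast {
  // Endpoint from this README; assumed to return a GZIP-compressed payload.
  val ForecastUrl = "https://smn.conagua.gob.mx/tools/GUI/webservices/?method=1"

  /** Streams the raw response bytes to `target`, overwriting any existing file. */
  def download(url: String, target: Path): Path = {
    val in = URI.create(url).toURL.openStream()
    try Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING)
    finally in.close()
    target
  }

  /** Decompresses the GZIP file at `source` into `target` (e.g. a .json file). */
  def gunzip(source: Path, target: Path): Path = {
    val in = new GZIPInputStream(Files.newInputStream(source))
    try Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING)
    finally in.close()
    target
  }

  def main(args: Array[String]): Unit = {
    val gz = download(ForecastUrl, Paths.get("forecast.gz"))
    val json = gunzip(gz, Paths.get("forecast.json"))
    println(s"Wrote ${Files.size(json)} bytes to $json")
  }
}
```

From there, the Spark step would typically load the file with `spark.read.json(...)` and persist it with `df.write.parquet(...)`. Whether the JSON needs the `multiLine` read option depends on the payload's layout, which this sketch does not assume.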
It is pretty simple: you only need to check that sbt and Scala are properly installed.
To download the dependencies and compile the project:
sbt compile
To run a health check:
sbt "runMain etl.hello.Hello"
If everything works, run the pipeline:
sbt "runMain etl.Main"
To run the tests:
sbt test