| Key | Value |
|---|---|
| Services | Glue, S3, RDS |
| Integrations | AWS CLI |
| Categories | ETL, Analytics |
A demo application illustrating the use of the AWS Glue API to run local ETL (Extract, Transform, Load) jobs using LocalStack. The sample uploads a PySpark job script to S3, creates Glue databases and tables, and runs a Glue job to process data.
## Prerequisites

- A valid LocalStack for AWS license. Your license provides a `LOCALSTACK_AUTH_TOKEN` to activate LocalStack.
- Docker
- `localstack` CLI
- `awslocal` CLI

## Installation

1. Run `make check` to verify that the prerequisites are available.
2. Run `make install` to install the dependencies.
3. Export your auth token: `export LOCALSTACK_AUTH_TOKEN=<your-auth-token>`

## Running

Start LocalStack with `make start`, then run the sample with `make run`. The script uploads the PySpark job to S3, creates Glue databases and tables, starts the Glue job, and waits for it to complete.
Please refer to the `job.py` PySpark job file and the `run.sh` script for implementation details.
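The deployment steps above can be sketched in Python with boto3-style clients. This is a minimal illustration, not the sample's actual `run.sh`: the bucket and job names are taken from the sample output below, while the database name and IAM role ARN are placeholders (LocalStack does not validate the role). Passing the clients in as parameters lets the same function run against LocalStack (create them with `endpoint_url="http://localhost:4566"`) or against stubs.

```python
def deploy_and_run(s3, glue, script, bucket="glue-pyspark-test",
                   job_name="test-job1", database="demo-db"):
    """Upload the PySpark script to S3, create a Glue database,
    register the job, and start a run. Returns the job run ID.

    `s3` and `glue` are boto3-style clients; `database` and the
    role ARN below are illustrative placeholders.
    """
    # 1. Put the job script where Glue can fetch it.
    s3.create_bucket(Bucket=bucket)
    s3.put_object(Bucket=bucket, Key="job.py", Body=script)

    # 2. Create the Glue database the job will use.
    glue.create_database(DatabaseInput={"Name": database})

    # 3. Register the job, pointing its command at the uploaded script.
    glue.create_job(
        Name=job_name,
        Role="arn:aws:iam::000000000000:role/glue-role",  # placeholder role
        Command={"Name": "glueetl", "ScriptLocation": f"s3://{bucket}/job.py"},
    )

    # 4. Kick off a run and return its ID for status polling.
    run = glue.start_job_run(JobName=job_name)
    return run["JobRunId"]
```

The injected-client shape also makes the orchestration easy to unit-test without a running LocalStack instance.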
You should see output similar to:
```
$ make run
Putting PySpark script to test S3 bucket ...
make_bucket: glue-pyspark-test
upload: ./job.py to s3://glue-pyspark-test/job.py
Using local RDS database on port 4511 ...
Creating Glue databases and tables ...
Starting Glue job from PySpark script ...
{
    "Name": "test-job1"
}
Waiting for Glue job ID 'e4567287' to finish (current status: RUNNING) ...
Waiting for Glue job ID 'e4567287' to finish (current status: RUNNING) ...
Done - Glue job execution finished. Please check the LocalStack container logs for more details.
```
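The "Waiting for Glue job ID ..." lines in the output come from a status-polling loop; boto3's Glue client has no built-in waiter for job runs, so a sketch of such a loop looks like the following. The status getter is injected as a callable so the sketch is self-contained; against a live endpoint it would wrap `glue.get_job_run`.

```python
import time

# Glue JobRunState values that mean the run is over.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "STOPPED", "TIMEOUT", "ERROR"}

def wait_for_job_run(get_status, run_id, interval=2.0, sleep=time.sleep):
    """Poll until the job run reaches a terminal state; return that state.

    `get_status` maps a run ID to a JobRunState string, e.g.:
        lambda rid: glue.get_job_run(JobName="test-job1", RunId=rid)
                        ["JobRun"]["JobRunState"]
    """
    while True:
        state = get_status(run_id)
        if state in TERMINAL_STATES:
            return state
        print(f"Waiting for Glue job ID '{run_id}' to finish "
              f"(current status: {state}) ...")
        sleep(interval)
```

Injecting `sleep` keeps the loop testable without real delays.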
## License

This code is available under the Apache 2.0 license.