upgrade infrastructure and airflow for version 2.0 #24

Open · wants to merge 1 commit into master
106 changes: 15 additions & 91 deletions Dockerfile
@@ -1,109 +1,33 @@
# BUILD: docker build --rm -t airflow .
# ORIGINAL SOURCE: https://github.com/puckel/docker-airflow
FROM apache/airflow:2.1.3-python3.8

FROM python:3.8.5-slim
LABEL version="1.1"
LABEL maintainer="nicor88"

# Never prompts the user for choices on installation/configuration of packages
ENV DEBIAN_FRONTEND noninteractive
ENV TERM linux

# Airflow
# it's possible to use v1-10-stable, but it's a development branch
ARG AIRFLOW_VERSION=1.10.11
ENV AIRFLOW_HOME=/usr/local/airflow
ENV AIRFLOW_GPL_UNIDECODE=yes
# celery config
ARG CELERY_REDIS_VERSION=4.2.0
ARG PYTHON_REDIS_VERSION=3.2.0

ARG TORNADO_VERSION=5.1.1
ARG WERKZEUG_VERSION=0.16.0

# Define en_US.
ENV LANGUAGE en_US.UTF-8
ENV LANG en_US.UTF-8
ENV LC_ALL en_US.UTF-8
ENV LC_CTYPE en_US.UTF-8
ENV LC_MESSAGES en_US.UTF-8
ENV LC_ALL en_US.UTF-8

RUN set -ex \
&& buildDeps=' \
python3-dev \
libkrb5-dev \
libsasl2-dev \
libssl-dev \
libffi-dev \
build-essential \
libblas-dev \
liblapack-dev \
libpq-dev \
git \
' \
&& apt-get update -yqq \
&& apt-get upgrade -yqq \
&& apt-get install -yqq --no-install-recommends \
${buildDeps} \
sudo \
python3-pip \
python3-requests \
default-mysql-client \
default-libmysqlclient-dev \
apt-utils \
curl \
rsync \
netcat \
locales \
&& sed -i 's/^# en_US.UTF-8 UTF-8$/en_US.UTF-8 UTF-8/g' /etc/locale.gen \
&& locale-gen \
&& update-locale LANG=en_US.UTF-8 LC_ALL=en_US.UTF-8 \
&& useradd -ms /bin/bash -d ${AIRFLOW_HOME} airflow \
&& pip install -U pip setuptools wheel \
&& pip install --no-cache-dir pytz \
&& pip install --no-cache-dir pyOpenSSL \
&& pip install --no-cache-dir ndg-httpsclient \
&& pip install --no-cache-dir pyasn1 \
&& pip install --no-cache-dir typing_extensions \
&& pip install --no-cache-dir mysqlclient \
&& pip install --no-cache-dir apache-airflow[async,aws,crypto,celery,github_enterprise,kubernetes,jdbc,postgres,password,s3,slack,ssh]==${AIRFLOW_VERSION} \
&& pip install --no-cache-dir werkzeug==${WERKZEUG_VERSION} \
&& pip install --no-cache-dir redis==${PYTHON_REDIS_VERSION} \
&& pip install --no-cache-dir celery[redis]==${CELERY_REDIS_VERSION} \
&& pip install --no-cache-dir flask_oauthlib \
&& pip install --no-cache-dir SQLAlchemy==1.3.23 \
&& pip install --no-cache-dir Flask-SQLAlchemy==2.4.4 \
&& pip install --no-cache-dir psycopg2-binary \
&& pip install --no-cache-dir tornado==${TORNADO_VERSION} \
&& apt-get purge --auto-remove -yqq ${buildDeps} \
&& apt-get autoremove -yqq --purge \
&& apt-get clean \
&& rm -rf \
/var/lib/apt/lists/* \
/tmp/* \
/var/tmp/* \
/usr/share/man \
/usr/share/doc \
/usr/share/doc-base

USER root

#configs
COPY config/entrypoint.sh /entrypoint.sh
RUN chmod +x /entrypoint.sh
COPY config/airflow.cfg ${AIRFLOW_HOME}/airflow.cfg
COPY dags ${AIRFLOW_HOME}/dags
#COPY config/airflow.cfg ${AIRFLOW_HOME}/airflow.cfg

#plugins
COPY plugins ${AIRFLOW_HOME}/plugins

RUN chown -R airflow: ${AIRFLOW_HOME}
#initial dags
COPY dags /dags
RUN mkdir ${AIRFLOW_HOME}/dags

ENV PYTHONPATH ${AIRFLOW_HOME}
RUN chown -R airflow:airflow ${AIRFLOW_HOME}
RUN chmod 777 -R /dags

USER airflow

COPY requirements.txt .
#requirements
COPY config/requirements.txt .
RUN pip install --user --no-cache-dir -r requirements.txt


EXPOSE 8080 5555 8793

WORKDIR ${AIRFLOW_HOME}
ENTRYPOINT ["/entrypoint.sh"]

1 change: 1 addition & 0 deletions LICENSE
@@ -1,6 +1,7 @@
MIT License

Copyright (c) 2017 nicor88
Update: Copyright (c) 2021 cicerojmm

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
74 changes: 38 additions & 36 deletions README.md
@@ -1,10 +1,11 @@
# airflow-ecs
# Airflow 2.0 in AWS ECS Fargate
Setup to run Airflow in AWS ECS containers

## Requirements

### Local
* Docker
* Docker Compose

### AWS
* AWS IAM User for the infrastructure deployment, with admin permissions
@@ -34,48 +35,49 @@ Setup to run Airflow in AWS ECS containers
If everything runs correctly you can reach Airflow navigating to [localhost:8080](http://localhost:8080).
The current setup is based on [Celery Workers](https://airflow.apache.org/howto/executor/use-celery.html). You can monitor how many workers are currently active using Flower, visiting [localhost:5555](http://localhost:5555)
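For a local run, assuming the repository's `docker-compose.yml` (not shown in this diff), bringing the stack up is the usual Compose workflow:

```sh
# build the image and start all services in the background
docker-compose up -d --build

# tail the logs until the web UI answers on localhost:8080
docker-compose logs -f
```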

## Deploy Airflow on AWS ECS
To run Airflow in AWS we will use ECS (Elastic Container Service).
## Deploy Airflow 2.0 on AWS ECS
To run Airflow in AWS we will use ECS (Elastic Container Service) with the following AWS components (a CLI sketch for inspecting them after deployment follows the list):
* AWS ECS Fargate: runs all Airflow services (webserver, Flower, workers and scheduler);
* ElastiCache (Redis): communication between the Airflow services;
* RDS for Postgres: metadata database for the Airflow services;
* EFS: persistent storage for the Airflow DAGs;
* ELB: Application Load Balancer for Airflow webserver access;
* CloudWatch: logs for the container services and Airflow task runs;
* IAM: service permissions for the ECS containers;
* ECR: Docker image repository to store the Airflow images.
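Once deployed, one way to sanity-check these components from the CLI; the cluster name here is an assumption based on the `airflow-dev` environment name used below:

```sh
# list the ECS services running on the Fargate cluster
aws ecs list-services --cluster airflow-dev

# confirm the EFS file system and the RDS metadata DB exist
aws efs describe-file-systems --query 'FileSystems[].FileSystemId'
aws rds describe-db-instances --query 'DBInstances[].Endpoint.Address'
```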

### Deploy Infrastructure using Terraform
Run the following commands:
<pre>
make infra-init
make infra-plan
make infra-apply
</pre>

or alternatively
<pre>
cd infrastructure
terraform get
terraform init -upgrade
terraform plan
terraform apply
</pre>

Export the system variables:

```sh
export AWS_ACCOUNT=xxxxxxxxxxxxx
export AWS_DEFAULT_REGION=us-east-1
```
Then build all the infrastructure and upload the Docker image:

```sh
bash scripts/deploy.sh airflow-dev
```
By default the infrastructure is deployed in `us-east-1`.

When the infrastructure is provisioned (the RDS metadata DB will take a while), check if the ECR repository has been created, then run:
<pre>
bash scripts/push_to_ecr.sh airflow-dev
</pre>
By default the repo name created with Terraform is `airflow-dev`.
Without this step the ECS services will fail to fetch the `latest` image from ECR.
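For reference, the push amounts to the standard ECR build-tag-push sequence. A minimal sketch, assuming `AWS_ACCOUNT` and `AWS_DEFAULT_REGION` are exported as above and the repository is named `airflow-dev` (the actual `scripts/push_to_ecr.sh` may differ):

```sh
REPO=airflow-dev
ECR_URL="${AWS_ACCOUNT}.dkr.ecr.${AWS_DEFAULT_REGION}.amazonaws.com"

# authenticate Docker against ECR (AWS CLI v2)
aws ecr get-login-password --region "${AWS_DEFAULT_REGION}" \
  | docker login --username AWS --password-stdin "${ECR_URL}"

# build, tag and push the image the ECS services pull as :latest
docker build --rm -t "${REPO}" .
docker tag "${REPO}:latest" "${ECR_URL}/${REPO}:latest"
docker push "${ECR_URL}/${REPO}:latest"
```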
The file that starts all the Airflow services is `entrypoint.sh`, located in the `config` folder under the project root.
It is parameterized by the `command` set in each ECS task definition.
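A hypothetical sketch of how such an entrypoint dispatches on the task definition's command; the repository's actual `config/entrypoint.sh` may differ:

```sh
#!/usr/bin/env bash
# Illustrative only: dispatch on the first argument passed by the ECS
# task definition's command; not the repository's actual script.
set -e

case "$1" in
  webserver)
    airflow db upgrade        # create/upgrade the metadata DB schema
    exec airflow webserver
    ;;
  scheduler)
    exec airflow scheduler
    ;;
  worker)
    exec airflow celery worker
    ;;
  flower)
    exec airflow celery flower
    ;;
  *)
    exec "$@"                 # pass any other command straight through
    ;;
esac
```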

## Troubleshooting

If, when the Airflow containers start up, an error such as the following occurs:
`ResourceInitializationError: failed to invoke EFS utils commands to set up EFS volumes: stderr: b'mount.nfs4...`

### Deploy new Airflow application
To deploy an updated version of Airflow you need to push a new container image to ECR.
You can do that simply by running:
<pre>
./scripts/deploy.sh airflow-dev
</pre>
You will need to mount the EFS on an EC2 instance and perform the following steps:

The deployment script will take care of:
* pushing a new ECR image to your repository
* re-deploying the ECS services with the updated image
* mount the EFS on an EC2 instance in the same VPC;
* access the EFS and create the **/data/airflow** folder structure;
* give full recursive permissions on the root folder, something like **chmod 777 -R /data**;
* with this, the Airflow containers will be able to access the volume and create the necessary folders (see the command sketch after this list).
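Condensed into commands, the fix looks roughly like this, assuming an Amazon Linux EC2 instance and `fs-12345678` as a placeholder for your file system ID:

```sh
# on an EC2 instance in the same VPC as the ECS tasks
sudo yum install -y amazon-efs-utils          # EFS mount helper
sudo mkdir -p /mnt/efs
sudo mount -t efs fs-12345678:/ /mnt/efs      # fs-12345678 is a placeholder

# create the folder structure the containers expect and open permissions
sudo mkdir -p /mnt/efs/data/airflow
sudo chmod -R 777 /mnt/efs/data
```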

## TODO
* Create private subnets
* Move the ECS containers to private subnets
* Use ECS Private Links for the private subnets
* Improve the ECS task and service roles
* Refactor the Terraform code following best practices
* Use SSM Parameter Store to keep passwords secret
* Automatically update the task definition when uploading a new Airflow version