
Upgrade to Airflow 2+ (or even 3) #154

Open
@Lee-W

Description

Upgrade from the current version of Airflow 1.10.15 to version 2.0 or higher, and eventually to version 3.0.

Why

While the release of Airflow 3 approaches (alpha1 was released recently), we are still using Airflow 1.10.15, which was released four years ago. This could expose us to security issues and makes it harder for new contributors to join.

Previously, our main concern was that deploying Airflow 2.0+ required starting more machines. Thanks to Henry's awesome PR #143, we no longer need to worry about that.

The following are some of the benefits of upgrading Airflow.

Airflow 2.0 introduced the TaskFlow API, which lets us write tasks as plain Python functions and is easier for newcomers to pick up, as in the following tutorial example:

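```python
import json

import pendulum

from airflow.decorators import dag, task


@dag(
    # No schedule: the DAG only runs when triggered manually.
    # `schedule` needs Airflow 2.4+; earlier 2.x releases use `schedule_interval`.
    schedule=None,
    start_date=pendulum.datetime(2021, 1, 1, tz="UTC"),
    catchup=False,
    tags=["example"],
)
def tutorial_taskflow_api():
    @task()
    def extract():
        # Simulate reading order data from an upstream source.
        data_string = '{"1001": 301.27, "1002": 433.21, "1003": 502.22}'

        order_data_dict = json.loads(data_string)
        return order_data_dict  # returned values are pushed to XCom automatically

    @task(multiple_outputs=True)
    def transform(order_data_dict: dict):
        # multiple_outputs=True stores each key of the returned dict as its own XCom.
        total_order_value = 0

        for value in order_data_dict.values():
            total_order_value += value

        return {"total_order_value": total_order_value}

    @task()
    def load(total_order_value: float):
        print(f"Total order value is: {total_order_value:.2f}")

    # Calling the tasks like functions wires up extract >> transform >> load
    # and passes the data between them through XCom.
    order_data = extract()
    order_summary = transform(order_data)
    load(order_summary["total_order_value"])


tutorial_taskflow_api()
```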

Upgrading Airflow also lets us adopt newer development tools without being constrained by the old version (e.g., IIRC, ruff cannot be installed directly alongside the current (1.10.15) or the previous (1.10.9) Airflow version). It also gives volunteers an opportunity to learn state-of-the-art technology.

Possible Solution

Here are the steps I have in mind. Some of them are nice-to-have clean-up tasks that would make the migration easier.

Steps

1. Migrate to uv (?)

This one is more of a personal preference, but I think uv would make it easier for us and for new contributors to set up the environment. Even though we cannot pass a constraint file to uv add, we can make its pins part of the constraint group, which achieves the same goal.
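As a rough sketch of the constraint part, assuming we keep using Airflow's published constraint files (plain `name==version` lines with `#` comments), a small hypothetical helper could turn one into a `[tool.uv]` `constraint-dependencies` table for `pyproject.toml`. As I understand it, uv applies those entries only as version limits during resolution and does not install them on their own. The function names and sample pins below are made up for illustration.

```python
"""Hypothetical helper: turn an Airflow constraints file into a uv snippet.

Assumes plain pip syntax: one "name==version" per line, "#" for comments.
"""
from __future__ import annotations


def parse_constraints(text: str) -> list[str]:
    """Return the pinned requirements, dropping blank lines and comments."""
    pins = []
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()
        if line:
            pins.append(line)
    return pins


def to_uv_toml(pins: list[str]) -> str:
    """Render the pins as a [tool.uv] constraint-dependencies array."""
    entries = "\n".join(f'    "{pin}",' for pin in pins)
    return f"[tool.uv]\nconstraint-dependencies = [\n{entries}\n]\n"


if __name__ == "__main__":
    # Made-up sample in the style of a constraints file.
    sample = """\
# Example constraints
pendulum==2.1.2
requests==2.31.0
"""
    print(to_uv_toml(parse_constraints(sample)))
```

We would regenerate this snippet whenever we bump the Airflow (or Python) version, since each combination has its own constraint file.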

2. Upgrade Python version to 3.9+

My suggestion is to upgrade directly to 3.12 if possible, so we won't need to upgrade Python again for some time. AFAIK, recent Python upgrades have included many changes that can easily break things.
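Whatever version we settle on, it may help to make the floor explicit so that contributors on an older interpreter get a clear error instead of an obscure failure. A minimal sketch, where `MIN_PYTHON` and `check_python` are hypothetical names and 3.12 is just the suggestion above:

```python
"""Hypothetical guard: fail fast on an unsupported Python version."""
import sys

# Hypothetical project floor; the proposal is 3.9+, ideally 3.12.
MIN_PYTHON = (3, 12)


def check_python(version=None, minimum=MIN_PYTHON):
    """Raise RuntimeError if `version` (default: the running one) is too old."""
    if version is None:
        version = sys.version_info
    if tuple(version[:2]) < tuple(minimum):
        found = ".".join(str(part) for part in version[:3])
        wanted = ".".join(str(part) for part in minimum)
        raise RuntimeError(f"Python {wanted}+ is required, found {found}")
```

It could be called from something like a `conftest.py` or a setup script; setting `requires-python` in `pyproject.toml` would be the packaging-level equivalent.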

3. Remove unused dags

Not sure whether there are any dags we no longer use 🤔
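To answer that, a first step could be listing every file that looks like a DAG definition and reviewing the list together. A minimal sketch, assuming our dags live under a `dags/` folder (a hypothetical path); the "mentions both airflow and dag" check only loosely mirrors Airflow's DAG discovery safe mode:

```python
"""Hypothetical inventory: list files in a dags folder that look like DAGs."""
from __future__ import annotations

import sys
from pathlib import Path


def find_dag_files(dags_folder: str | Path) -> list[Path]:
    """Return .py files under `dags_folder` that probably define a DAG."""
    candidates = []
    for path in sorted(Path(dags_folder).rglob("*.py")):
        # Case-insensitive heuristic: the file mentions both "airflow" and "dag".
        content = path.read_text(encoding="utf-8", errors="ignore").lower()
        if "airflow" in content and "dag" in content:
            candidates.append(path)
    return candidates


if __name__ == "__main__":
    folder = sys.argv[1] if len(sys.argv) > 1 else "dags"
    for dag_file in find_dag_files(folder):
        print(dag_file)
```

Each listed file could then be checked against its recent runs in the UI before we delete anything.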

4. Clarify "airflow.cfg" (clean up task)

Instead of defining only the config we need, our airflow.cfg lists every possible option, which complicates maintenance. We can remove the options whose values are the same as the defaults.
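A rough sketch of how we could find those options automatically, assuming we have the default config for our Airflow version next to ours (1.10.x ships one as `airflow/config_templates/default_airflow.cfg`). The helper names are hypothetical, and `interpolation=None` keeps `%`-style log format strings from being expanded:

```python
"""Hypothetical helper: find airflow.cfg options that repeat the default."""
from __future__ import annotations

import configparser


def _load(text: str) -> configparser.ConfigParser:
    # interpolation=None compares values as raw text (e.g. "%%(asctime)s").
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return parser


def redundant_options(current_cfg: str, default_cfg: str) -> list[tuple[str, str]]:
    """Return (section, option) pairs whose value equals the default value."""
    current, default = _load(current_cfg), _load(default_cfg)
    redundant = []
    for section in current.sections():
        if not default.has_section(section):
            continue  # custom section: nothing to compare against
        for option in current.options(section):
            if (
                default.has_option(section, option)
                and current.get(section, option) == default.get(section, option)
            ):
                redundant.append((section, option))
    return redundant
```

Defaults containing placeholders such as `{AIRFLOW_HOME}` will never match our expanded paths, so those options still need a manual look.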

5. Simplify dependencies through PythonVirtualenvOperator

PythonVirtualenvOperator is already available in 1.10.15, so we could rewrite some of our dags to use it instead of relying on globally installed dependencies. I suspect this step will take some time.

6. Upgrade from 1.10.15 to 2.0

Follow the official "Upgrading from 1.10 to 2" guide.
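Part of that upgrade is updating import paths that Airflow 2 renamed. A minimal, hypothetical sketch that rewrites a handful of well-known core renames; the mapping is deliberately incomplete, and the official guide and its upgrade-check tooling cover far more:

```python
"""Hypothetical helper: rewrite a few 1.10 import paths to Airflow 2 ones."""
from __future__ import annotations

import re

# Old (1.10) module path -> new (2.x) module path. Deliberately incomplete.
RENAMED_MODULES = {
    "airflow.operators.bash_operator": "airflow.operators.bash",
    "airflow.operators.python_operator": "airflow.operators.python",
    "airflow.operators.dummy_operator": "airflow.operators.dummy",
    "airflow.sensors.base_sensor_operator": "airflow.sensors.base",
    "airflow.hooks.base_hook": "airflow.hooks.base",
}


def rewrite_imports(source: str) -> str:
    """Replace old module paths in `source` with their Airflow 2 equivalents."""
    for old, new in RENAMED_MODULES.items():
        # Require a non-identifier boundary on both sides of the dotted path.
        pattern = r"(?<![\w.])" + re.escape(old) + r"\b"
        source = re.sub(pattern, new, source)
    return source
```

Running it over each DAG file and reviewing the diff keeps the mechanical part of this step small.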

7. Replace some of the udfs with TaskFlow

We currently keep the Python functions (udfs) separate from the dags, which is probably no longer needed in most cases; we can replace them with TaskFlow tasks. We also no longer need to use contrib [1]
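As a starting point for dropping contrib, a hypothetical helper could list every `airflow.contrib` import in a DAG file. In Airflow 2 those operators and hooks live in separate `apache-airflow-providers-*` packages, so each hit needs a new import path and possibly a new dependency:

```python
"""Hypothetical helper: report airflow.contrib imports in a source file."""
from __future__ import annotations

import ast


def contrib_imports(source: str) -> list[tuple[int, str]]:
    """Return sorted (line number, module) pairs for airflow.contrib imports."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ImportFrom) and node.module:
            modules = [node.module]
        elif isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        else:
            continue
        for module in modules:
            if module == "airflow.contrib" or module.startswith("airflow.contrib."):
                found.append((node.lineno, module))
    return sorted(found)
```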

8. Gradually upgrade to 2.10

I think we could upgrade gradually in the following order and see how things work at each step.

  1. 2.2
  2. 2.4
  3. 2.7
  4. 2.10
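If we script the stops above (for example in CI), one detail to watch is version ordering: compared as strings, "2.10" sorts before "2.7". A minimal sketch with hypothetical names that compares versions as integer tuples instead:

```python
"""Hypothetical helper: remaining upgrade stops from the current version."""
from __future__ import annotations

# The stops proposed above; adjust as we learn more.
MILESTONES = ("2.2", "2.4", "2.7", "2.10")


def version_key(version: str) -> tuple[int, ...]:
    """Turn "2.10" into (2, 10) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def upgrade_path(current: str, target: str, milestones: tuple[str, ...] = MILESTONES) -> list[str]:
    """Return the milestones after `current`, up to and including `target`."""
    low, high = version_key(current), version_key(target)
    stops = [m for m in milestones if low < version_key(m) <= high]
    return sorted(stops, key=version_key)
```

For example, starting from 1.10.15 with 2.4 as the next target would give the stops 2.2 and 2.4.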

9. Upgrade to Airflow 3.0

Airflow 3.0 is expected to be released sometime in March. We'll see whether we have reached this step by then.

Additional context

Related Issue
