Airflow / Astronomer

About

Apache Airflow is an open-source platform, written in Python, to programmatically author, schedule, and monitor workflows. Astronomer offers managed Airflow services on the cloud of your choice, so you can run Airflow with less operational overhead.

Details

Airflow has a modular architecture and uses a message queue to orchestrate an arbitrary number of workers. Pipelines are defined in Python, allowing for dynamic pipeline generation and on-demand, code-driven pipeline invocation.
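As an illustration of dynamic pipeline generation, the following minimal sketch creates one DAG per entry in a list. The dataset names, the `dag_id_for` and `build_dags` helpers, and the placeholder task body are assumptions made for this example. It uses the TaskFlow decorators, and the `schedule` argument requires Airflow 2.4 or later. The Airflow import is deferred and guarded so that the helper functions can also be loaded and tested where Airflow is not installed.

```python
from datetime import datetime

# Hypothetical dataset names, used only for illustration.
DATASETS = ["orders", "customers", "payments"]


def dag_id_for(dataset: str) -> str:
    """Derive a unique DAG id from a dataset name."""
    return f"import_{dataset}"


def build_dags(datasets: list[str]) -> dict:
    """Create one DAG per dataset. Requires Apache Airflow 2.4 or later."""
    # Imported here so the module can still be loaded without Airflow.
    from airflow.decorators import dag, task

    dags = {}
    for dataset in datasets:

        @dag(
            dag_id=dag_id_for(dataset),
            schedule="@daily",
            start_date=datetime(2024, 1, 1),
            catchup=False,
        )
        def import_dataset():
            @task
            def load(name: str) -> str:
                # Placeholder for the real work, e.g. loading data into a database.
                return f"loaded {name}"

            # Called while the loop variable still refers to this dataset.
            load(dataset)

        dags[dag_id_for(dataset)] = import_dataset()
    return dags


try:
    # Airflow discovers DAG objects in the module's global namespace.
    globals().update(build_dags(DATASETS))
except ImportError:
    # Airflow is not installed; only the helper functions are available.
    pass
```

Adding an entry to `DATASETS` yields an additional DAG the next time Airflow parses the file, without any further code changes.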

Use Jinja templates to parameterize Airflow pipelines. To extend the system, you can define your own operators and extend libraries to fit the level of abstraction that suits your environment.
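To show what such parameterization amounts to, this sketch renders a SQL template directly with the `jinja2` library, outside of Airflow. Within Airflow, templated operator fields are rendered at run time, and variables such as `ds` (the logical date as `YYYY-MM-DD`) and `params` are supplied by the scheduler. The table name `doc.metrics`, the column `day`, and the `render_sql` helper are hypothetical.

```python
from jinja2 import Template

# A templated statement, as it might appear in a templated operator field.
SQL_TEMPLATE = "SELECT count(*) FROM {{ params.table }} WHERE day = '{{ ds }}'"


def render_sql(template: str, ds: str, table: str) -> str:
    """Render the template with the values Airflow would normally provide."""
    return Template(template).render(ds=ds, params={"table": table})


if __name__ == "__main__":
    print(render_sql(SQL_TEMPLATE, ds="2024-01-01", table="doc.metrics"))
```

Because the date is injected per run, the same statement can be reused for every scheduled interval.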

Managed Airflow

Astro is a managed Airflow service by Astronomer.

  • Astro runs on the cloud of your choice. It manages Airflow for you, so you can focus on your data, and it connects securely to any service in your network.

  • Create Airflow environments quickly.

  • Protect production DAGs with easy Airflow upgrades and custom high-availability configs.

  • Get visibility into what’s running with analytics views and easy interfaces for logs and alerts across environments.

  • Adopt Airflow best practices with support and timely upgrades.

Learn: Starter guides

Getting started with Apache Airflow

Define an Airflow DAG that downloads, processes, and stores data in CrateDB.

Import Parquet files

Define an Airflow DAG to import a Parquet file from S3 into CrateDB.

Automate the import of Parquet files with Apache Airflow

Load stock market data

Define an Airflow DAG to download, process, and store stock market data into CrateDB.

Automate stock market data updates with CrateDB and Apache Airflow

Learn: Advanced guides

Export to S3

Export data from CrateDB to S3 on a schedule.

Export data from CrateDB to S3 using Apache Airflow

Implement a data retention policy

Implement an effective retention policy for time-series data, that is, the practice of storing and managing data for a designated period of time.
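The core of such a policy can be sketched in plain Python: compute a cutoff date from the retention period, then build a parameterized statement that removes older rows. This is only an illustration under stated assumptions, not the approach of the linked guide. The table `doc.sensor_readings`, the column `day`, and both helper functions are hypothetical, and the `?` placeholder assumes a client that uses qmark-style parameters.

```python
from datetime import date, timedelta


def retention_cutoff(today: date, retention_days: int) -> date:
    """Return the oldest date that is still kept."""
    return today - timedelta(days=retention_days)


def build_delete_statement(table: str, column: str) -> str:
    """Build a parameterized DELETE; the cutoff is passed separately as an argument."""
    # Table and column names are hypothetical and must come from trusted config,
    # because identifiers cannot be passed as statement parameters.
    return f"DELETE FROM {table} WHERE {column} < ?"


if __name__ == "__main__":
    cutoff = retention_cutoff(date(2024, 3, 31), retention_days=30)
    statement = build_delete_statement("doc.sensor_readings", "day")
    print(statement, [cutoff.isoformat()])
```

In an Airflow DAG, a scheduled task would run the statement with the computed cutoff, so that expired data is removed on every run.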

Implement a data retention policy in CrateDB using Apache Airflow

Implement a hot and cold storage data retention policy

A hot/cold storage strategy is often motivated by a tradeoff between performance and cost-effectiveness.

Build a hot/cold storage data retention policy in CrateDB with Apache Airflow
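As a rough sketch of the decision behind a hot/cold strategy, the function below assigns a storage tier to a data partition based on its age. The thresholds, the tier names, and the `storage_tier` helper are assumptions for illustration. How data is actually moved between node types is described in the guide above and is not shown here.

```python
from datetime import date


def storage_tier(partition_day: date, today: date, hot_days: int, retention_days: int) -> str:
    """Classify a partition as 'hot', 'cold', or 'expired' by its age in days."""
    age = (today - partition_day).days
    if age > retention_days:
        return "expired"  # past the retention period, candidate for deletion
    if age > hot_days:
        return "cold"  # rarely queried, candidate for cheaper storage
    return "hot"  # recent data, kept on fast storage


if __name__ == "__main__":
    today = date(2024, 3, 31)
    for day in (date(2024, 3, 30), date(2024, 3, 1), date(2023, 12, 1)):
        print(day, storage_tier(day, today, hot_days=7, retention_days=90))
```

A scheduled DAG could evaluate this rule for each partition and then trigger the matching reallocation or deletion step.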