Airflow / Astronomer
About
Apache Airflow is an open-source platform, written in Python, for programmatically authoring, scheduling, and monitoring workflows. Astronomer offers managed Airflow services on the cloud of your choice, so you can run Airflow with less overhead.
Details
Airflow has a modular architecture and uses a message queue to orchestrate an arbitrary number of workers. Pipelines are defined in Python, allowing for dynamic pipeline generation and on-demand, code-driven pipeline invocation.
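To illustrate dynamic pipeline generation, the following minimal sketch creates one DAG per entry in a list. It assumes Airflow 2.4 or later (for the `schedule` argument). The source names and the helpers `dag_id_for`, `load_source`, and `build_dags` are hypothetical and exist only for this example.

```python
from datetime import datetime

# Hypothetical source names; in practice these might come from a config file.
SOURCES = ["sensors", "weather", "stocks"]


def dag_id_for(source: str) -> str:
    """Derive a DAG id from a source name."""
    return f"ingest_{source}"


def load_source(source: str) -> str:
    """Placeholder task body; a real task would fetch and store data."""
    return f"loaded {source}"


def build_dags(sources):
    """Create one DAG per source and return them keyed by DAG id."""
    # Imported here so the module can still be loaded without Airflow installed.
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    dags = {}
    for source in sources:
        with DAG(
            dag_id=dag_id_for(source),
            start_date=datetime(2024, 1, 1),
            schedule=None,  # no schedule: runs are triggered on demand
            catchup=False,
        ) as dag:
            PythonOperator(
                task_id="load",
                python_callable=load_source,
                op_args=[source],
            )
        dags[dag.dag_id] = dag
    return dags


try:
    # Airflow discovers DAG objects placed in the module's global namespace.
    globals().update(build_dags(SOURCES))
except ImportError:
    # Airflow is not installed; there is nothing to register.
    pass
```

Because the DAGs are built in a loop, adding a pipeline only requires a new entry in `SOURCES`.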
Use Jinja templates to parameterize Airflow pipelines. To extend the system, you can define your own operators and extend libraries to fit the level of abstraction that suits your environment.
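As a sketch of Jinja parameterization, the statement below uses `ds` (the logical date, formatted as `YYYY-MM-DD`) and `params`, both of which Airflow makes available when it renders templated operator fields. The query, the table name, and the `render_preview` helper are hypothetical. The helper renders the template directly with Jinja2, outside Airflow, only to preview the result.

```python
# Hypothetical query; Airflow supplies `ds` and `params` at render time.
DAILY_COUNT_SQL = "SELECT count(*) FROM {{ params.table }} WHERE day = '{{ ds }}'"


def render_preview(template: str, ds: str, params: dict) -> str:
    """Render a template outside Airflow to preview the final statement."""
    # Imported lazily; Jinja2 is installed as a dependency of Airflow.
    from jinja2 import Template

    return Template(template).render(ds=ds, params=params)
```

Calling `render_preview(DAILY_COUNT_SQL, "2024-01-01", {"table": "metrics"})` returns `SELECT count(*) FROM metrics WHERE day = '2024-01-01'`. Inside a DAG, the same string would typically be passed to a templated operator field, and Airflow would perform the rendering for each run.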
Managed Airflow
Astro is a managed Airflow service by Astronomer.
Astro runs on the cloud of your choice. It manages Airflow for you, so you can focus on your data, and it connects securely to any service in your network.
Create Airflow environments quickly.
Protect production DAGs with easy Airflow upgrades and custom high-availability configs.
Get visibility into what’s running with analytics views and easy interfaces for logs and alerts across environments.
Adopt Airflow best practices with support and timely upgrades.
Learn: Starter guides
Define an Airflow DAG that downloads, processes, and stores data in CrateDB.
Define an Airflow DAG to import a Parquet file from S3 into CrateDB.
Define an Airflow DAG to download, process, and store stock market data into CrateDB.
Learn: Advanced guides
Export data from CrateDB to S3 on a schedule.
Implement an effective retention policy for time-series data, that is, store and manage data for a designated period of time.
Implement a hot/cold storage strategy, which is often motivated by a tradeoff between performance and cost-effectiveness.
See also
Repository: https://github.com/crate/cratedb-airflow-tutorial
Product: CrateDB and Apache Airflow
Web:
ETL with Astro and CrateDB Cloud in 30min - fully up in the cloud
ETL pipeline using Apache Airflow with CrateDB (Source)
Run an ETL pipeline with CrateDB and data quality checks

