OpenAI runs 7,000 pipelines on it. GitHub's Copilot, deployed by 90% of the Fortune 100, depends on it to aggregate engagement metrics, quality indicators, and user feedback. Airflow started as an internal Airbnb tool for scheduling ETL jobs in 2014. Twelve years later, it is the default infrastructure layer for data orchestration. With version 3.0, it is making a serious bid to own AI workflows too.

But "default" is not the same as "best fit." Airflow's dominance means teams adopt it by inertia as often as by evaluation. The managed vendor landscape has fractured into four distinct options, each with different pricing models, compliance postures, and scaling ceilings. The 3.0 release rewrites core architecture in ways that create real migration cost. And competitors like Prefect and Dagster have taught the market what modern orchestration looks like.

Global workflow orchestration market size. 2025-2026 figures from Research and Markets (2026); 2022-2024 estimated from reported 13.3% CAGR.

What Changed in 3.0 (and Why It Matters)

Airflow is open source under the Apache 2.0 license, maintained by 3,600+ contributors (more than Apache Spark or Apache Kafka), with 30 million+ monthly PyPI downloads. You define workflows as DAGs in Python; Airflow handles execution, retries, dependencies, scheduling, and observability. That model has not changed. What changed in 3.0 is the architecture underneath it.

A basic Airflow 3 DAG using the TaskFlow API
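from datetime import datetime

# Airflow 3 exposes the authoring API under airflow.sdk; the older
# airflow.decorators path still imports but is deprecated.
from airflow.sdk import dag, task


@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
def daily_etl():
    @task
    def extract() -> dict:
        # fetch_from_api is a placeholder for your own client code.
        return fetch_from_api("api/v2/users")

    @task
    def transform(raw: dict) -> dict:
        # Assumes every value in the payload is a string.
        return {k: v.strip() for k, v in raw.items()}

    @task
    def load(cleaned: dict):
        # write_to_warehouse is a placeholder for your own loader.
        write_to_warehouse(cleaned)

    raw = extract()
    cleaned = transform(raw)
    load(cleaned)


daily_etl()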

Airflow 3.0, released in April 2025, is the most significant architectural change in the project's history. The headline features:

Task Execution Interface. A client-server architecture that decouples task execution from the Airflow core. Tasks can now run in remote environments, different clouds, or edge devices without requiring the full Airflow stack. This enables multi-cloud and hybrid deployments where the scheduler runs centrally but tasks execute wherever the data lives.

Event-driven scheduling with Data Assets. Datasets evolved into Data Assets with Watchers that react to external events. DAGs can now trigger based on messages arriving in AWS SQS, files landing in S3, or data updates in upstream systems, not just cron schedules. This moves Airflow from purely time-driven to event-driven orchestration.

Multi-language Task SDKs. The Task Execution Interface enables writing tasks in languages beyond Python. The Python SDK ships first with full backward compatibility. Golang and additional language SDKs are planned, which matters for organizations where data engineering teams use Python but ML teams use Java or Go.

Edge Executor. A new executor that runs tasks on edge devices, regional clusters, or remote data centers. For IoT pipelines, financial streaming, or any workflow where computation needs to happen close to the data source, this eliminates the round-trip to a central Airflow cluster.

Founded at Airbnb in 2014 and open-sourced in 2015, Airflow has the largest contributor base of any data orchestration project: 3,600+ unique contributors, more than Spark or Kafka. Astronomer's State of Airflow 2026 survey drew responses from 5,800 practitioners across 122 countries, making it the largest data engineering survey ever conducted.

The Managed Vendor Landscape

Airflow's unique position in the orchestrator market is that you have five ways to run it, self-hosting plus four managed services, each with a fundamentally different operational model, pricing structure, and compliance posture. No other orchestrator offers this range of deployment options, and that is both an advantage and a source of decision complexity.

Vendor | Pricing model | Airflow version | Scaling | Compliance | Multi-tenancy | Lock-in surface
Self-hosted OSS | Infra cost only | Any (you control) | Manual (Celery/K8s) | You manage | You configure | None
Astronomer Astro | Deployment-hour + worker-hour | 3.x (first to market) | Scale-to-zero workers | SOC 2 Type II, HIPAA | Workspace isolation | Deployment config, dedicated clusters
Cloud Composer 3 | DCU-hour ($0.06/DCU-hr) | 3.x | GKE-backed auto-scaling | GCP compliance (ISO, SOC, HIPAA BAA) | GCP project isolation | GCP ecosystem (IAM, networking)
Amazon MWAA | Environment-hour + worker-hour | 3.2 (April 2026) | Fargate auto-scaling | AWS compliance (SOC, HIPAA BAA, FedRAMP) | Separate environments | AWS ecosystem (IAM, VPC, S3)
Azure Data Factory | Orchestration runs + compute hours | 2.x (no 3.x yet) | Azure-managed scaling | Azure compliance (SOC, HIPAA BAA, ISO) | ADF workspace isolation | Azure ecosystem (AAD, VNET)

Astronomer Astro is the cloud-agnostic managed service built by the company most invested in Airflow's success. Astronomer is the largest contributor to the Apache Airflow project and was first to market with Airflow 3.x support. Astro's differentiator is scale-to-zero workers and hibernating deployments: you pay only when tasks are running, which matters for teams with bursty workloads. The trade-off is a separate procurement cycle. If your company already has an AWS or GCP enterprise agreement, Astronomer requires its own vendor relationship.

Google Cloud Composer 3 is fully integrated into the GCP ecosystem. Billing uses Data Compute Units (DCUs), an abstract unit blending vCPU and RAM. The advantage is seamless IAM, networking, and billing integration with your existing GCP infrastructure. The trade-off is real: Composer environments are heavier to start (minimum DCU allocations are higher than Astronomer's smallest deployment) and tied entirely to GCP.

Amazon MWAA runs Airflow on AWS Fargate with auto-scaling workers. MWAA shipped Airflow 3.2 support in April 2026. For AWS-native teams, MWAA is often pre-approved through existing enterprise agreements, eliminating procurement overhead. The micro environment class (launched late 2024) reduced the entry cost for development and testing. The lock-in is AWS-specific: DAGs are stored in S3, logs in CloudWatch, secrets in AWS Secrets Manager.

Azure Data Factory offers managed Airflow as a feature within its broader orchestration platform. The integration is tighter with the Azure ecosystem (Active Directory, VNET, Azure Monitor) but lagging on Airflow version support, still on 2.x with no announced 3.x timeline as of June 2026. For Azure-native teams already using ADF, the managed Airflow option avoids a separate procurement process. For teams that need Airflow 3.x features, Astronomer on Azure infrastructure is the practical alternative.

Vendor lock-in migration risk. Moving between managed vendors is a deployment configuration change, not a DAG rewrite. Your DAGs are standard Python files that run identically on any Airflow instance. The migration surface is infrastructure glue: connection configurations, secrets backend, log storage, and CI/CD pipelines. Moving from MWAA to Astronomer means reconfiguring where logs go and how secrets are injected, not rewriting business logic. This is a weekend of platform engineering work at 50 DAGs, and a multi-week project at 500+, because the volume of connection objects, variables, and environment-specific configuration grows linearly.

Pricing: What You Actually Pay

Every vendor uses a different billing model, making direct comparison difficult. Here is what the same workload costs across each option at three scales.

Estimated monthly infrastructure costs for a growth-stage workload (8 engineers, 200 DAGs, ~6 hours active daily compute). Self-hosted figure is infrastructure only and excludes platform engineering labor (~$75K/year for a half-time engineer at this scale). Figures based on published pricing as of June 2026; actual costs vary by region, worker size, and task duration.

Small (startup, 4 engineers, 50 DAGs):

  • Self-hosted on existing K8s: ~$150/mo infrastructure (scheduler pod, webserver pod, metadata DB). Zero vendor cost, but someone on the team spends 10-15% of their time on Airflow operations.
  • MWAA micro: ~$200-300/mo. The micro environment class is purpose-built for development and small production workloads.
  • Astronomer Developer: ~$250-350/mo. Deployments start at $0.35/hr, workers at $0.13/hr. Scale-to-zero means you pay nothing when tasks are not running.
  • Cloud Composer 3: ~$350-450/mo. Composer's minimum DCU allocation makes it the most expensive option at small scale.

Medium (growth, 8 engineers, 200 DAGs):

  • Self-hosted: ~$350/mo infrastructure, but add a half-time platform engineer ($75K/year). True annual cost: ~$79K.
  • MWAA small: ~$580/mo ($0.49/hr environment + additional worker auto-scaling). Annual: ~$7K.
  • Astronomer Team: ~$520/mo with moderate worker usage. Annual: ~$6.2K.
  • Composer 3: ~$650/mo at moderate DCU consumption. Annual: ~$7.8K.

Large (enterprise, 20+ engineers, 1,000+ DAGs):

  • Self-hosted: ~$1,500/mo infrastructure, but three dedicated platform engineers ($450K/year fully loaded). Factor in on-call rotation, incident response, version upgrades, and security patching, and the true cost approaches $600K/year.
  • MWAA large: ~$2,500-5,000/mo depending on worker scaling. Annual: $30-60K, but constrained to AWS.
  • Astronomer Business/Enterprise: custom pricing, typically $100-200K/year. Replaces 2-3 platform engineer headcount.
  • Composer 3: ~$3,000-6,000/mo at high DCU consumption. Annual: $36-72K.

The breakeven point where a managed service becomes cheaper than self-hosting, after accounting for platform engineer time, is typically 100-200 DAGs.
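
The medium-tier annual figures above follow from simple arithmetic: twelve months of infrastructure plus whatever labor the option carries. A minimal sketch of that calculation, using only the monthly estimates quoted in this section; the dictionary keys are illustrative names, not vendor SKUs.

```python
# Monthly infrastructure estimates for the medium tier, in USD,
# copied from the figures quoted in this section.
MONTHLY_INFRA = {
    "self_hosted": 350,
    "mwaa_small": 580,
    "astronomer_team": 520,
    "composer_3": 650,
}

# Only self-hosting carries the half-time platform engineer ($75K/year).
ANNUAL_LABOR = {"self_hosted": 75_000}


def annual_cost(option: str) -> int:
    """Return 12 months of infrastructure plus any attributed labor."""
    return MONTHLY_INFRA[option] * 12 + ANNUAL_LABOR.get(option, 0)


if __name__ == "__main__":
    for option in MONTHLY_INFRA:
        print(f"{option}: ${annual_cost(option):,}/year")
```

The labor term dominates the self-hosted line. Changing the assumed engineer fraction moves the comparison far more than any infrastructure price does.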

Security, Compliance, and Multi-Tenancy

For any enterprise evaluation, security and compliance requirements gate the vendor shortlist before pricing or features matter.

Astronomer Astro: SOC 2 Type II certified, HIPAA-ready (BAA available on Business and Enterprise tiers). SAML SSO enforcement on Business+, SCIM provisioning on Enterprise. Audit logging: 7 days on Team, 90 days on Business, custom retention on Enterprise. Custom RBAC with role-based permissions at the workspace and deployment level. IP access lists on Enterprise.

Amazon MWAA: inherits the full AWS compliance portfolio: SOC 1/2/3, HIPAA BAA (via AWS BAA), FedRAMP High (GovCloud), PCI DSS, ISO 27001. IAM-based access control integrates with existing AWS identity infrastructure. Encryption at rest (KMS) and in transit (TLS). Audit logging through CloudTrail. No native multi-workspace concept; isolation requires separate MWAA environments, each with its own cost.

Cloud Composer 3: inherits GCP compliance: SOC 1/2/3, HIPAA BAA, ISO 27001, FedRAMP. IAM integration with GCP roles and service accounts. VPC Service Controls for network isolation. Audit logging through Cloud Audit Logs. Multi-tenancy maps to GCP project isolation, which is powerful but tightly coupled to GCP's identity model.

Multi-tenancy across 12 teams. This is where the vendors diverge sharply. Astronomer provides workspace isolation: each team gets its own workspace with separate deployments, RBAC, and audit trails within a single Astronomer organization. MWAA achieves isolation through separate environments, each billed independently, which is clean but expensive at scale (12 teams x ~$580/mo per environment, the medium-tier estimate above, comes to roughly $7K/mo just for environment overhead). Composer uses GCP project boundaries, which integrate with existing GCP organizational structure but require GCP-level administration for changes.

Self-hosted Airflow supports DAG-level RBAC since Airflow 2.0, and role-based access through the Flask-AppBuilder security model. However, configuring and maintaining RBAC, SSO integration, and audit logging on self-hosted requires dedicated security engineering effort that managed vendors handle natively.

Upgrading to 3.0: What Breaks and What It Costs

Astronomer's State of Airflow 2026 survey found that 26% of users have already upgraded to Airflow 3. Most of the remaining 74% report plans to upgrade, which puts the combined share of users who have upgraded or intend to at 84%. But intent is not execution. Roughly three-quarters of the installed base is still running 2.x in production because the upgrade is meaningful, not trivial.

What breaks:

  • Import path changes. Several core imports moved. Standard operators such as PythonOperator moved from airflow.operators.python to airflow.providers.standard.operators.python, deprecated operators were removed, and provider packages need version bumps. At 50 DAGs, this is an afternoon of find-and-replace. At 1,200 DAGs, it is a multi-day effort with testing.
  • Dataset to Asset renaming. The Dataset concept was renamed to Asset in 3.0. Existing DAGs using Dataset triggers need updating. The rename is mechanical but touches every DAG that uses data-aware scheduling.
  • Executor configuration changes. The Task Execution Interface changes how executors interact with the scheduler. Custom executor configurations, especially heavily customized Celery or Kubernetes setups, may need reworking.
  • Database migration. The metadata database schema changed. Airflow provides migration scripts, but large metadata databases (100GB+) can take hours to migrate, requiring a maintenance window.
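
To illustrate how mechanical the import and rename changes are, here is a minimal sketch of a source rewriter. It is not an official migration tool, and the mapping is a small illustrative subset rather than a complete list of moved modules. The function name rewrite_imports is hypothetical.

```python
import re

# Illustrative subset of 2.x -> 3.x module moves; not a complete mapping.
MODULE_MOVES = {
    "airflow.operators.python": "airflow.providers.standard.operators.python",
    "airflow.operators.bash": "airflow.providers.standard.operators.bash",
}


def rewrite_imports(source: str) -> str:
    """Rewrite moved module paths and the Dataset -> Asset rename in DAG source."""
    for old, new in MODULE_MOVES.items():
        # Match the module path only where it stands as a whole dotted name.
        source = re.sub(rf"(?<![\w.]){re.escape(old)}(?![\w.])", new, source)
    # Dataset was moved and renamed: airflow.datasets.Dataset -> airflow.sdk.Asset.
    source = re.sub(
        r"from airflow\.datasets import Dataset\b",
        "from airflow.sdk import Asset",
        source,
    )
    # Rename remaining constructor calls of the old class name.
    source = re.sub(r"\bDataset\(", "Asset(", source)
    return source
```

A regex pass like this only covers the straightforward cases. Anything involving removed operators or changed behavior still needs a human to read the diff and run the DAG.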

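The migration path: the Airflow 3 upgrade guide pairs Ruff's Airflow lint rules (the AIR301 and AIR302 rule sets), which scan DAG code for removed or moved APIs and can auto-fix many of them, with the airflow config lint command, which flags outdated configuration keys. The airflow upgrade-check command that some teams remember belongs to the 1.10-to-2.0 migration, not this one. The majority of changes are mechanical: import renames, deprecated API replacements, and configuration key updates. The effort that requires genuine rethinking is limited to teams using advanced features like custom executors, deeply customized security backends, or DAGs that relied on deprecated behavior.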

Scale factor: at 50 DAGs with standard operators, budget 1-2 days. At 200 DAGs with a mix of custom and standard operators, budget 1-2 weeks including testing. At 1,000+ DAGs across multiple teams, budget 4-8 weeks with a phased rollout, testing each team's DAGs incrementally.

Managed vendor support: Astronomer was first to market with 3.x and provides automated upgrade tooling. MWAA shipped 3.2 support in April 2026. Cloud Composer 3 runs on 3.x natively. If you are starting fresh on any managed vendor today, you are on 3.x by default.

Where Airflow Hits Walls (Honest Operational Pain)

Airflow has more production mileage than any other orchestrator. That means it also has the most documented failure modes. These are the pain points that teams running 200+ DAGs encounter, and whether 3.0 addresses them.

Scheduler bottleneck. Airflow re-parses every DAG file on a configurable interval (default: 30 seconds). In 2.x this work runs inside the scheduler by default; in 3.0 it runs in a separate DAG processor component. At 500+ DAGs, parsing consumes enough CPU to delay task scheduling. At 1,000+ DAGs, teams report scheduling lag of 30-60 seconds between a task becoming ready and actually starting. Airflow 3.0 improves this with DAG versioning and more efficient parsing, but the fundamental architecture of file-based DAG discovery means parsing overhead scales linearly with DAG count. The practical mitigation is splitting DAGs across multiple DAG directories or reducing parse frequency.

Metadata database growth. Every task instance, every log entry, every XCom value gets written to the metadata database. Production Airflow instances running 500+ DAGs for 12+ months routinely reach 100-200GB of metadata. Without proactive cleanup (the airflow db clean command or custom retention policies), the database becomes a performance bottleneck and a backup headache. Managed vendors handle this with automatic retention policies, but self-hosted teams must build their own.
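
A retention policy for self-hosted teams can be as small as a scheduled call to airflow db clean with a rolling cutoff. The sketch below only builds the command line; the 90-day window and the function name db_clean_command are assumptions for illustration, and the dry-run default is deliberate.

```python
from datetime import datetime, timedelta, timezone


def db_clean_command(retention_days: int, now: datetime, dry_run: bool = True) -> list[str]:
    """Build an `airflow db clean` invocation that purges rows older than the cutoff."""
    cutoff = now - timedelta(days=retention_days)
    cmd = ["airflow", "db", "clean", "--clean-before-timestamp", cutoff.isoformat()]
    if dry_run:
        cmd.append("--dry-run")  # report what would be deleted without deleting
    else:
        cmd.append("--yes")  # skip the interactive confirmation prompt
    return cmd


if __name__ == "__main__":
    print(" ".join(db_clean_command(90, datetime.now(timezone.utc))))
```

Running the dry-run form first on a copy of the database gives a sense of how many rows a given window would remove before anything is deleted.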

DAG parsing lag. Complex DAGs with dynamic task generation, heavy imports, or database queries at parse time can individually take seconds to parse. When 50 DAGs each take 2 seconds to parse, the scheduler spends nearly 2 minutes just reading DAG definitions before scheduling any tasks. The fix is DAG authoring discipline: keep parse-time logic minimal, use lazy imports, and avoid database calls in DAG definitions. This is a cultural problem as much as a technical one.
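
The "nearly 2 minutes" figure is a back-of-envelope estimate, and it helps to see what it assumes. A minimal sketch follows; the function is hypothetical, it treats per-file parse time as constant, and its parsing_processes parameter is modeled on the Airflow setting of the same name.

```python
import math


def parse_loop_seconds(dag_files: int, seconds_per_file: float, parsing_processes: int = 1) -> float:
    """Estimate wall-clock time for one full pass over all DAG files.

    Assumes files are spread evenly across parser processes and that
    parse time per file is constant; real deployments vary.
    """
    files_per_process = math.ceil(dag_files / parsing_processes)
    return files_per_process * seconds_per_file


if __name__ == "__main__":
    print(parse_loop_seconds(50, 2.0))     # single parser process
    print(parse_loop_seconds(50, 2.0, 2))  # two parser processes
```

Adding parser processes divides the loop time, but cutting per-file parse time through lighter top-level code helps every deployment regardless of how many processes are available.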

Worker queue silent failures. With the Celery executor, tasks can be lost if a worker process dies between accepting a task and starting execution. The task appears "queued" indefinitely in the UI while no worker is processing it. Airflow 3.0's Task Execution Interface improves task lifecycle tracking, but teams running Celery at scale should monitor for stuck tasks with automated alerting.
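
Automated alerting for stuck tasks comes down to one check: which task instances have sat in the queued state longer than they plausibly should? The sketch below runs that check over a hypothetical snapshot type. TaskSnapshot is not Airflow's own model, and the 15-minute threshold is an assumption to tune per deployment.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class TaskSnapshot:
    """Hypothetical view of a task instance; not Airflow's own model."""
    dag_id: str
    task_id: str
    state: str
    queued_at: Optional[datetime]


def find_stuck_queued(tasks, now, max_queued=timedelta(minutes=15)):
    """Return tasks that have sat in the 'queued' state longer than max_queued."""
    return [
        t for t in tasks
        if t.state == "queued"
        and t.queued_at is not None
        and now - t.queued_at > max_queued
    ]
```

In practice the snapshots would come from the metadata database or the REST API, and the result would feed whatever pager or chat channel the team already uses.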

Long-running task limitations. Airflow's task model assumes tasks complete in minutes to low hours. For workflows that run for hours or days, such as large batch ML training, multi-step approval processes, or extended data migrations, the task heartbeat and timeout model becomes awkward. Tasks need artificially long timeouts, heartbeat intervals must be adjusted, and the scheduler tracks long-running tasks as "running" for their entire duration. For genuinely long-running workflows, Temporal is the purpose-built alternative. Airflow is a scheduler that executes tasks; Temporal is a durable execution engine that persists workflow state across arbitrary durations. Know the difference before choosing.

What 3.0 fixes: improved scheduler performance, better task lifecycle management, the Task Execution Interface for cleaner remote execution. What persists: file-based DAG parsing overhead at scale, metadata database maintenance, and the fundamental task-model assumptions about execution duration. Managed vendors mitigate the operational pain but do not eliminate the architectural constraints.

Scaling: From 10 DAGs to 10,000

Airflow's scaling model depends entirely on the executor. The executor determines how tasks are distributed across compute resources.

Local Executor: runs tasks as subprocesses on the scheduler machine. Fine for development and small production workloads (up to ~50 DAGs). No external dependencies beyond the metadata database.

Celery Executor: distributes tasks to a pool of worker machines via a message broker (Redis or RabbitMQ). The workhorse for mid-scale deployments. Scales to hundreds of concurrent tasks by adding workers. The operational cost is managing the broker and worker fleet.

Kubernetes Executor: launches each task as a separate Kubernetes pod. Provides resource isolation (a memory-intensive model training task does not compete with a lightweight API call) and elastic scaling. The trade-off is pod startup latency (5-30 seconds per task) and the requirement for Kubernetes expertise on the team.

Edge Executor (3.0): runs tasks on remote edge workers outside the central cluster. Purpose-built for IoT data collection, regional data processing, and scenarios where computation must happen close to the data source. Experimental in 3.0 but architecturally significant.
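
The pod startup latency quoted for the Kubernetes Executor matters most for short tasks. As a rough illustration, the share of wall-clock time lost to startup can be estimated as below; the function name is hypothetical, and the model ignores image pulls and scheduling queues.

```python
def startup_overhead_share(task_seconds: float, pod_startup_seconds: float) -> float:
    """Fraction of a task's wall-clock time spent waiting for its pod to start."""
    return pod_startup_seconds / (pod_startup_seconds + task_seconds)


if __name__ == "__main__":
    # A 10-second API call versus a 10-minute transformation, both at 30 s startup.
    print(f"{startup_overhead_share(10, 30):.0%}")
    print(f"{startup_overhead_share(600, 30):.0%}")
```

When most tasks are short, the overhead share is high, which is one reason teams keep Celery workers for lightweight tasks and reserve per-task pods for heavy ones.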

OpenAI standardized on Airflow in 2023 and runs approximately 7,000 pipelines with near-universal usage across the company. CloudThat documented a production deployment automating 200+ critical business workflows with 70% reduced manual effort and 99.5% execution reliability. These are not vendor claims about theoretical capacity. They are production deployments at meaningful scale.

With the Kubernetes Executor on a properly sized cluster, Airflow handles thousands of DAGs and tens of thousands of daily task instances. At that scale, the bottleneck is the scheduler, not the executor (see the operational pain section above). Running multiple schedulers (supported since Airflow 2.0) and HA scheduler mode in managed services push this ceiling higher.

Learning Curve and Documentation

Airflow's learning curve is steeper than Prefect's Python-native decorators but shallower than Dagster's asset-centric model. The concepts to internalize:

Editorial estimate of where new Airflow users spend initial learning time, based on recurring patterns in Airflow Slack, Stack Overflow, and GitHub discussions. Not a formal survey.

DAGs and tasks are straightforward for Python developers. The TaskFlow API (introduced in Airflow 2.0, refined in 3.0) lets you write DAGs as decorated Python functions. Writing your first DAG takes hours, not days.

Connections and operators are where complexity appears. Airflow's operator model requires understanding which operator to use for each external system, how to configure connections (credentials, endpoints, authentication), and how to pass data between tasks via XCom. The connection UI is functional but not intuitive, and connection configuration mistakes are the most common source of "my DAG works locally but fails in production."

Deployment and infrastructure is the hardest part. Understanding executors, worker configuration, scheduler tuning, and environment-specific setup takes 2-4 weeks for experienced engineers. Managed vendors reduce this significantly: Astronomer and MWAA abstract infrastructure entirely, turning deployment into "push code, it runs."

The practical timeline: expect a first working DAG in a day, a production-ready pipeline in 1-3 weeks, and comfortable proficiency with the full platform in 2-3 months. This is longer than Prefect (hours to first flow, 1-3 months to production comfort) but comparable to Dagster (2-3 weeks to first useful pipeline, 4-6 weeks to comfort).

Documentation and community. Airflow's official documentation is comprehensive but dense and can overwhelm newcomers. The community compensates: 62,000+ Slack members, extensive Stack Overflow coverage, and a mature ecosystem of tutorials and conference talks. Astronomer's learning resources are often more accessible than the Apache project docs and serve as the practical on-ramp for most new users.

Integrations and the Provider Ecosystem

This is where Airflow's first-mover advantage is most visible. No other orchestrator comes close to the breadth of pre-built integrations.

Airflow ships 80+ official provider packages containing 1,000+ operators, hooks, sensors, and transfer operators. The critical integrations:

  • Cloud providers: apache-airflow-providers-amazon (S3, ECS, Lambda, Redshift, Glue, SageMaker, Bedrock), apache-airflow-providers-google (BigQuery, Cloud Storage, Dataproc, Vertex AI), apache-airflow-providers-microsoft-azure (Blob Storage, Data Factory, Synapse).
  • Data warehouses: Snowflake, Databricks, PostgreSQL, MySQL, MSSQL, Oracle, Trino.
  • Data transformation: dbt (the most commonly paired tool, at 44% adoption per the State of Airflow survey), Spark, Flink.
  • Ingestion, ELT, and data quality: Fivetran, Airbyte, Great Expectations.
  • ML and AI: SageMaker, Vertex AI, MLflow, and the new Common AI Provider for LLM workflows.
  • Notifications: Slack, PagerDuty, email, Microsoft Teams, Opsgenie.

Production DAG: Snowflake load, dbt transform, validation, Slack alerting
from airflow.sdk import dag, task
from airflow.providers.common.sql.operators.sql import SQLExecuteQueryOperator
from airflow.providers.slack.notifications.slack import send_slack_notification
from cosmos import DbtTaskGroup, ProjectConfig, ProfileConfig


@dag(
    schedule="@daily",
    on_failure_callback=send_slack_notification(
        slack_conn_id="slack_data_team",
        text="Daily pipeline failed: {{ ds }}",
    ),
)
def daily_data_pipeline():
    # SnowflakeOperator was removed from recent Snowflake provider releases;
    # SQLExecuteQueryOperator with a Snowflake connection replaces it.
    load_raw = SQLExecuteQueryOperator(
        task_id="load_raw_events",
        sql="COPY INTO raw.events FROM @stage/events/",
        conn_id="snowflake_prod",
    )

    transform = DbtTaskGroup(
        group_id="dbt_transform",
        project_config=ProjectConfig("dbt/"),
        profile_config=ProfileConfig(
            profile_name="snowflake",
            target_name="prod",
            # Assumed location; Cosmos needs a profiles.yml path or a profile mapping.
            profiles_yml_filepath="dbt/profiles.yml",
        ),
    )

    @task
    def validate_row_counts():
        # query_snowflake is a placeholder that returns a single integer.
        count = query_snowflake("SELECT COUNT(*) FROM analytics.orders")
        if count < 1000:
            raise ValueError(f"Row count {count} below threshold")

    load_raw >> transform >> validate_row_counts()


daily_data_pipeline()

The integration model trades depth of individual connectors for breadth of ecosystem coverage. Any Python library becomes an Airflow integration by wrapping it in a @task decorator or writing a custom operator. This means the effective integration count is unlimited, but the quality varies. First-party providers maintained by the Apache project (AWS, GCP, Snowflake) are production-grade. Community-maintained providers range from excellent to unmaintained. Check the last commit date before depending on a community provider.

Snowflake (36.6%), Databricks (34.7%), and BigQuery (27.8%) are the most commonly used data platforms among Airflow users, according to the State of Airflow 2026 survey. If your stack includes any combination of these, Airflow's integrations are battle-tested at scale.

Team Workflow and CI/CD at Scale

The harder question is not "can one engineer write a DAG?" but "can 12 teams ship DAG changes to a shared Airflow instance without breaking each other's pipelines?"

The monorepo pattern. Most Airflow deployments use a single repository for all DAGs. Teams own specific directories (dags/analytics/, dags/ml/, dags/ingestion/), and CI validates that new DAG code parses correctly, passes linting, and does not introduce import errors. Loading the DAG folder into a DagBag and failing the build on any import error validates DAG structure without executing tasks; airflow dags test, by contrast, executes a full run of a single DAG. At 8+ teams, enforce CODEOWNERS files so each team reviews only their own DAG directories.

Branch-based deployment depends on the vendor. Astronomer provides deployment rollbacks and environment promotion (dev to staging to production). MWAA uses the AWS CLI to sync DAGs from S3, which maps cleanly to CI/CD pipelines that push to different S3 paths per branch. Cloud Composer syncs DAGs from Cloud Storage buckets, with similar branch-based routing. None of these provide Dagster-style branch deployments where a PR automatically spins up an isolated copy of the entire environment for testing. The workaround is maintaining separate development and staging environments, which adds cost but provides the isolation.

CI/CD for MWAA: the Airflow-specific step is DAG parse validation before deploy
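# GitHub Actions: validate, test, and deploy Airflow DAGs
# .github/workflows/deploy-dags.yml
name: Deploy DAGs
on:
  push:
    branches: [main, staging]
    paths: ['dags/**']
jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pip install apache-airflow==3.0.6
      # Byte-compile every DAG file recursively; exits non-zero on a syntax error.
      - run: python -m compileall -q dags/
      # Import every DAG through a DagBag and fail on any import error.
      # DAGs that read Variables or Connections at import time may also
      # need an initialized metadata database in CI.
      - run: |
          python -c "
          import sys
          from airflow.models import DagBag
          bag = DagBag('dags/', include_examples=False)
          print(bag.import_errors)
          sys.exit(1 if bag.import_errors else 0)
          "
  deploy:
    needs: validate
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Assumes AWS credentials are already configured for the runner.
      - run: |
          if [ "$GITHUB_REF" = "refs/heads/main" ]; then
            aws s3 sync dags/ s3://mwaa-prod/dags/
          else
            aws s3 sync dags/ s3://mwaa-staging/dags/
          fi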

GitOps at scale. When multiple teams merge DAG changes daily, the risk is a broken import in one team's DAG crashing the scheduler's DAG parser for all DAGs. The mitigation is CI-level DAG parsing validation (the validate job in the workflow above) and, for critical production environments, a staged rollout pattern where DAGs deploy to staging first, run for a cycle, and promote to production only after verification.
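
A cheap first gate that runs before Airflow is even installed is a pure syntax check over the DAG tree. The sketch below uses only the standard library, and find_syntax_errors is a hypothetical helper name. It catches syntax errors only: a missing provider package or a bad import still passes, so it complements an Airflow-aware import check rather than replacing it.

```python
import pathlib


def find_syntax_errors(dag_root: str) -> dict[str, str]:
    """Compile every .py file under dag_root; return {path: error} for failures."""
    failures = {}
    for path in sorted(pathlib.Path(dag_root).rglob("*.py")):
        try:
            # compile() parses the source without executing it.
            compile(path.read_text(encoding="utf-8"), str(path), "exec")
        except SyntaxError as exc:
            failures[str(path)] = f"line {exc.lineno}: {exc.msg}"
    return failures


if __name__ == "__main__":
    import sys

    errors = find_syntax_errors("dags")
    for file_path, message in errors.items():
        print(f"{file_path}: {message}")
    sys.exit(1 if errors else 0)
```

Because it reports every failing file in one pass, a check like this also tells each team which of their directories broke, which fits the CODEOWNERS split described earlier.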

Airflow in the AI Era

Airflow 3.0 shipped with the Common AI Provider: a single package that adds LLM and AI agent capabilities to any Airflow deployment. Built on Pydantic AI, it supports 20+ model providers (OpenAI, Anthropic, Google, Azure, AWS Bedrock, Ollama, and more) through a single install.

Airflow feature coverage by workflow type. Assessment based on native capabilities as of Airflow 3.2 (2026). Streaming and human-in-the-loop scores reflect improvements in 3.0/3.1 but acknowledge these are newer capabilities.

The three operators that matter most:

  • @task.llm: single LLM calls with typed structured output. Call any supported model, get back a Pydantic model, not raw text. This is the operator most teams will use first.
  • @task.agent: multi-step AI agents with tool access and iterative reasoning. The agent selects tools, executes them, and loops until it produces an answer.
  • @task.llm_branch: LLM-powered workflow branching. The model decides which downstream path the DAG takes based on input analysis, which fundamentally changes how DAGs can be structured.

Three additional operators handle text-to-SQL generation (@task.llm_sql), multimodal file analysis (@task.llm_file_analysis), and cross-database schema drift detection (@task.llm_schema_compare).

Five toolsets extend agent capabilities: SQLToolset for database operations, HookToolset that wraps any of Airflow's 350+ hooks as agent tools, MCPToolset for Model Context Protocol servers, DataFusionToolset for SQL over object storage, and the full library of existing Airflow hooks.

Practical AI workflow: LLM-powered data quality analysis with severity routing
from datetime import datetime

from airflow.sdk import dag, task


@dag(schedule="@daily", start_date=datetime(2024, 1, 1))
def data_quality_with_llm():
    @task
    def extract_metrics(ds=None) -> dict:
        # query_db is a placeholder for your own query helper.
        # ds is injected from the task context; a Jinja string such as
        # "{{ ds }}" would not be rendered inside a Python return value.
        return {
            "row_count": query_db("SELECT COUNT(*) FROM orders"),
            "null_rate": query_db(
                "SELECT AVG(CASE WHEN email IS NULL THEN 1.0 ELSE 0.0 END) FROM orders"
            ),
            "avg_value": query_db("SELECT AVG(total) FROM orders"),
            "date": ds,
        }

    @task.llm(model="claude-sonnet-4-6")
    def analyze_quality(metrics: dict) -> str:
        """Analyze these daily metrics for anomalies.
        Flag any metric that deviates more than 2 standard deviations
        from expected values. Return a structured summary with
        severity (low/medium/high) and recommended action."""
        return metrics

    @task
    def route_alert(analysis: str):
        # send_pagerduty_alert and send_slack_message are placeholders.
        if "high" in analysis.lower():
            send_pagerduty_alert(analysis)
        elif "medium" in analysis.lower():
            send_slack_message("#data-quality", analysis)

    metrics = extract_metrics()
    analysis = analyze_quality(metrics)
    route_alert(analysis)


data_quality_with_llm()
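
One caveat on the routing step above: matching the substring "high" in free text is fragile, because a summary such as "null rate is not high" would page someone. A sturdier pattern is to have the model return a typed severity field and route on that. The sketch below shows the routing half only, using standard-library types. QualityReport and choose_channel are hypothetical names, and how the LLM task produces the structured result depends on the provider's structured-output support.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(str, Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


@dataclass(frozen=True)
class QualityReport:
    """Hypothetical structured result an LLM task could be asked to return."""
    severity: Severity
    summary: str


def choose_channel(report: QualityReport) -> str:
    """Map severity to a destination; low severity is logged only."""
    if report.severity is Severity.HIGH:
        return "pagerduty"
    if report.severity is Severity.MEDIUM:
        return "slack"
    return "log"
```

Routing on an enum means the summary text can say anything without changing who gets alerted, and an unexpected severity value fails at parse time instead of silently falling through.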

Human-in-the-loop. Airflow 3.1 added human review as a first-class DAG element. Workflows can pause for human approval, feedback, or decision-making, then resume based on the response. This is critical for AI workflows where automated quality gates need human override capability. The implementation is newer than Prefect's mature pause_flow_run API, but the direction is clear.

Honest gaps. Airflow is still DAG-first. Every workflow must be expressible as a directed acyclic graph defined before execution. Dynamic task generation exists (dynamic task mapping in 2.3+), but the fundamental model is "define the graph, then execute it." Prefect's Python-native control flow, where while loops, runtime branching, and conditional logic are the execution model rather than workarounds, is genuinely more natural for AI agent loops that determine the next step based on the previous step's output. If your primary use case is agentic AI with unpredictable execution paths, evaluate Prefect alongside Airflow. If your primary use case is batch ETL and ML pipelines with AI enhancement, Airflow's Common AI Provider integrates into the ecosystem you already run.

Production Operations: Where Airflow Is Strong and Where It Falls Short

Where Airflow excels: observability and backfill. Airflow exposes StatsD metrics natively (task duration, scheduler heartbeat, pool usage, executor queue depth), which feed into Prometheus, Datadog, or Grafana without custom instrumentation. Lateness monitoring flags late DAGs and can page through PagerDuty or Opsgenie; note that the 2.x SLA feature was removed in 3.0, with Deadline Alerts in 3.1 as its replacement. Backfilling, re-running pipelines for historical date ranges, is a first-class operation via airflow backfill create in 3.x (airflow dags backfill in 2.x). This is a genuine architectural advantage. Most orchestrators treat backfill as a workaround; Airflow treats it as a core workflow.

Where Airflow is adequate: secrets and alerting. External secrets backends (AWS Secrets Manager, GCP Secret Manager, HashiCorp Vault) read credentials at runtime, keeping production secrets out of the metadata database. Alerting via on_failure_callback (and, on 2.x, sla_miss_callback) is functional and configurable per DAG. Neither of these is a differentiator, but both work reliably in production.

Where Airflow falls short: testing. The testing story is primitive compared to Dagster's resource injection model. airflow dags test executes a single DAG run locally from the command line, and DagBag validates that DAGs parse correctly. But there is no built-in mechanism for swapping production resources with test doubles at the framework level. Testing a DAG against mock data requires manual setup: patching connections, overriding variables, and running tasks in isolation. Teams that invest in CI pipeline DAG validation catch parse errors early, but testing business logic requires discipline that the framework does not enforce.

Who Should Use Which Airflow

By team size and cloud:

| Situation | Recommended | Why |
| --- | --- | --- |
| <50 DAGs, AWS-native | MWAA micro | Lowest entry cost, pre-approved on AWS EA, no procurement overhead |
| <50 DAGs, GCP-native | Cloud Composer 3 | Seamless GCP integration, but higher minimum cost than MWAA micro |
| 50-200 DAGs, multi-cloud | Astronomer Team | Cloud-agnostic, scale-to-zero workers, strongest AI operator support |
| 200+ DAGs, strong DevOps | Self-hosted on K8s | Maximum control, lowest per-unit cost, but requires dedicated platform engineers |
| 1,000+ DAGs, compliance needs | Astronomer Enterprise | Custom RBAC, SCIM, 90+ day audit logs, replaces 2-3 platform engineer headcount |

By use case:

  • Batch ETL/ELT: Airflow is the default for a reason. The largest integration ecosystem, the most production battle-testing, and the strongest community support.
  • ML pipelines with model retraining: strong fit. SageMaker, Vertex AI, and MLflow integrations handle the ML lifecycle. The Common AI Provider adds LLM capabilities.
  • LLM/agent workflows with dynamic branching: evaluate Prefect alongside Airflow. Prefect's Python-native control flow handles unpredictable execution paths more naturally.
  • Asset-centric governance with lineage: evaluate Dagster. Dagster's asset model provides lineage and freshness tracking that Airflow's task-centric model does not match.
  • Long-running workflows (hours/days): evaluate Temporal. Airflow's task model assumes bounded execution; Temporal is purpose-built for durable, long-running workflows.
  • Non-Python shop: Airflow 3.0 introduced multi-language Task SDKs, but the ecosystem, documentation, and operator library are overwhelmingly Python. If your team does not write Python, expect a much steeper learning curve.

By enterprise signal:

  • SOC 2 + HIPAA required: Astronomer Enterprise or MWAA (via AWS BAA). Both provide the compliance artifacts your security team needs.
  • Multi-team isolation needed: Astronomer workspaces (clean, purpose-built) or separate MWAA environments (clean but expensive at scale).
  • Existing AWS Enterprise Agreement: MWAA. Pre-approved, no separate procurement, integrated billing.
  • Existing GCP commitment: Composer 3. Same IAM, same billing, same support contract.

Airflow earned its position by being the tool 3,600 engineers kept building. Version 3.0 is the first release that takes the lessons competitors taught the market, including event-driven scheduling, dynamic task execution, and AI-native operators, and applies them at Airflow scale. For teams already in the ecosystem, the upgrade is worth the migration cost. For teams starting fresh, the vendor choice matters more than the platform choice. Get that wrong, and you pay the Airflow tax twice: once for the learning curve, and again when you outgrow the wrong managed service.

Sources

  1. Astronomer (2026). "State of Airflow 2026." 5,800 respondents across 122 countries; adoption statistics, tool pairing data, career impact findings.
  2. Apache Airflow (2026). "Apache Airflow 3 is Generally Available." Task Execution Interface, event-driven Assets, Edge Executor, multi-language SDKs.
  3. Apache Airflow (2026). "Introducing the Common AI Provider." 6 operators, 5 toolsets, 20+ model providers.
  4. Research and Markets (2026). "Workflow Orchestration Market Report." Market valued at $21.9B, 13.3% CAGR.
  5. Technavio (2026). "AI Workflow Orchestration Market." $20.75B growth at 35.3% CAGR, 2025-2030.
  6. Astronomer (2026). "Astro Pricing." Developer, Team, Business, Enterprise tiers with deployment and worker hourly rates.
  7. AWS (2026). "Amazon MWAA Pricing." Environment-hour and worker-hour billing by size class.
  8. Google Cloud (2026). "Cloud Composer Pricing." DCU-hour billing model.
  9. CloudThat (2026). "Automating 200+ Critical Business Workflows with Apache Airflow." 70% reduced manual effort, 99.5% execution reliability.
  10. Astronomer (2026). "airflow-ai-sdk." GitHub repository for LLM and AI agent integration with Airflow.
  11. apache/airflow. GitHub repository. 45,200+ stars, 3,600+ contributors (June 2026).
  12. DataCamp (2026). "Apache Airflow 3.0 Is Here." Feature overview and architectural changes.
