November 26th, 2021

Apache Airflow – Management

by in Microblog, Technology


Apache Airflow (AA) is open-source software, and like most open-source software it comes with common DevSecOps challenges in operational management.

The AA open-source project is mainly concerned with the workflow engine. It leaves the deployment strategy, security, and maintenance to the companies that want to use AA as their workflow manager.

To provide production-ready AA cloud deployment, Translucent Computing uses two different management approaches: self-managed and cloud-managed.

Before we look at each one, here’s how we define both:

Self-managed: managing a system within a Kubernetes cluster

Cloud-managed: managing a system through a cloud provider-managed service

Self-managed Airflow

Simplified diagram of Translucent’s Kubernetes Airflow dev environment

Self-managed AA assumes that your organization has the resources and skills to handle AA’s secure deployment and maintenance. Beyond security, and depending on the industry, deployment is further complicated by data and privacy regulations. To meet robust security standards and compliance and regulatory requirements, Translucent adds further technologies to support AA, including:

  • HashiCorp Vault to manage static and dynamic secrets, encryption as a service, and certificate management (PKI)
  • HashiCorp Consul to secure communication between resources and to serve as a backend for Vault
  • Keycloak to manage identity and access for the Airflow UI and other tools, including the Grafana monitoring tools
  • Grafana stack, including Grafana for monitoring dashboards, Prometheus for metrics, Loki for logs, and Jaeger for tracing
  • immudb, an immutable database, to track AA DAG activities and user activities for compliance
  • OpenSearch to support ELT and additional security analysis of web application firewall (WAF) data
  • Falco to act as a runtime security scanner
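To make the compliance value of an immutable store more concrete, here is a minimal sketch of a tamper-evident, hash-chained log of DAG and user events, written with the Python standard library only. It is an illustration of the idea, not immudb’s API and not how Translucent integrates immudb (immudb provides this guarantee with its own, more sophisticated cryptographic verification). The `AuditLog` class, its fields, and the sample event values (`example_etl`, `alice`) are hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass, field

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


@dataclass
class AuditLog:
    """Hypothetical append-only log: each entry commits to the hash of the previous one."""
    entries: list = field(default_factory=list)

    @staticmethod
    def _digest(prev_hash: str, event: dict) -> str:
        # Canonical JSON (sorted keys) so the same event always hashes the same way
        payload = json.dumps(event, sort_keys=True)
        return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()

    def append(self, event: dict) -> str:
        prev_hash = self.entries[-1]["hash"] if self.entries else GENESIS
        digest = self._digest(prev_hash, event)
        self.entries.append({"event": event, "prev": prev_hash, "hash": digest})
        return digest

    def verify(self) -> bool:
        # Recompute the whole chain; any edited or reordered entry breaks it
        prev_hash = GENESIS
        for entry in self.entries:
            if entry["prev"] != prev_hash:
                return False
            if entry["hash"] != self._digest(prev_hash, entry["event"]):
                return False
            prev_hash = entry["hash"]
        return True


if __name__ == "__main__":
    log = AuditLog()
    log.append({"type": "dag_run", "dag_id": "example_etl", "state": "success"})
    log.append({"type": "user_action", "user": "alice", "action": "trigger_dag"})
    print(log.verify())  # True
    log.entries[0]["event"]["state"] = "failed"  # simulate tampering with history
    print(log.verify())  # False
```

Because every hash depends on all earlier entries, an auditor only needs the latest hash to detect that a past DAG run or user action was altered after the fact.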

All these tools must be deployed and configured. Translucent found that the most efficient way to deploy and configure the cloud infrastructure AA requires is a CI/CD pipeline that uses infrastructure as code (IaC).

Cloud-managed Airflow

Managing a secure, production-ready AA deployment puts a strain on DevSecOps teams. With Google Cloud Composer, we now have the option of a cloud-managed service that reduces this burden. Cloud Composer marries Google Cloud Platform with AA.

Video: Flexible, Easy Data Pipelines on Google Cloud with Cloud Composer (Cloud Next ’18)

Google Cloud Platform provides enterprise features, including security, deployment, and AA management. Cloud Composer manages the AA metadata database and web server and provides security and observability tooling, while using Kubernetes to execute AA workflows.

Cloud Composer 1 private IP architecture (diagram: https://cloud.google.com/composer/docs/images/composer-1-private-ip-architecture.svg)

Should You Self-manage?

The right AA management approach depends on your use cases and your DevSecOps capacity. While self-managing AA gives you complete control, the complexity of a secure, production-ready system requires an upfront investment in DevSecOps and a continuing investment in maintaining the system.

For Translucent, however, the main benefit of self-managing AA is cloud independence: because the stack is built on Kubernetes, we can provide a secure, production-ready workflow manager on any cloud or on-premises.

At the same time, Cloud Composer is attractive for most Translucent use cases because of its deployment speed, managed environment, predictable cost, and the portability of code from other AA deployments.
