On Monday, Intezer malware analyst Nicole Fishbein and cybersecurity researcher Ryan Robinson said the instances, vulnerable to data theft, belong to industries including IT, cybersecurity, health, energy, finance, and manufacturing, among other sectors.
Apache Airflow, available on GitHub, is an open source platform designed for scheduling, managing, and monitoring workflows. The modular software is also used to process data in real time, with work pipelines configured as code.
Apache Airflow version 2.0.0 was released in December 2020 and implemented a number of security enhancements including a new REST API that enforced operational authentication, as well as a shift to explicit value settings, rather than default options.
While examining active, older versions of the workflow software, the cybersecurity firm found a number of unprotected instances that exposed credentials for business and financial services including Slack, PayPal, AWS, Stripe, Binance, MySQL, Facebook, and Klarna.
"They [instances] are typically hosted on the cloud to provide increased accessibility and scalability," Intezer noted. "On the flip side, misconfigured instances that allow internet-wide access make these platforms ideal candidates for exploitation by attackers."
The most common security issue causing these leaks was the use of hardcoded passwords embedded in the instances' Python DAG code.
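To illustrate the hardcoded-password problem, here is a minimal sketch of a check that flags string literals assigned to secret-looking variable names in DAG source code. It is not Intezer's method. The function name and the list of name fragments are assumptions made for illustration, and the check only covers simple assignments, not keyword arguments or dictionaries.

```python
import ast

# Hypothetical name fragments that suggest a secret; adjust for your codebase.
SUSPECT_NAMES = ("password", "passwd", "secret", "token", "api_key")


def find_hardcoded_secrets(source: str) -> list[tuple[int, str]]:
    """Return (line number, variable name) pairs for string literals
    assigned directly to secret-looking variable names."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Assign):
            continue
        value = node.value
        # Only flag non-empty string literals on the right-hand side.
        if not (isinstance(value, ast.Constant)
                and isinstance(value.value, str)
                and value.value):
            continue
        for target in node.targets:
            if isinstance(target, ast.Name) and any(
                hint in target.id.lower() for hint in SUSPECT_NAMES
            ):
                findings.append((node.lineno, target.id))
    return sorted(findings)


if __name__ == "__main__":
    sample_dag = (
        'import os\n'
        'db_password = "hunter2"\n'              # literal in code: flagged
        'api_token = os.environ["API_TOKEN"]\n'  # read at runtime: not flagged
    )
    print(find_hardcoded_secrets(sample_dag))
```

Reading the value from the environment or a secrets store at runtime, as the second sample line does, keeps the credential out of the DAG file itself.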
In addition, the researchers discovered that the Airflow "variables" feature was a source of credential leaks. Variable values can be set across all DAG scripts within an instance, but if the feature is not configured properly, this can lead to exposed passwords.
The team also found misconfigurations in Airflow's "Connections" feature, which provides the link between the software and a user's environment. Credentials are not always entered properly and can end up in the "extra" field, the team says, rather than in the secure, encrypted portion of Connections. As a result, credentials can be exposed in plaintext.
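As a rough sketch of how an administrator might audit for this, the snippet below inspects the text of a connection's "extra" field and reports keys whose names look like credentials. It assumes the field holds a JSON object, which is common but not guaranteed. The function name and the list of key hints are hypothetical, and the code does not call Airflow itself.

```python
import json

# Hypothetical hints for credential-like key names; tune for your environment.
SENSITIVE_KEY_HINTS = ("password", "secret", "token", "key")


def risky_extra_keys(extra: str) -> list[str]:
    """Return top-level keys in a connection's extra JSON whose names
    suggest they hold a credential."""
    if not extra:
        return []
    try:
        data = json.loads(extra)
    except json.JSONDecodeError:
        # Not JSON: nothing to inspect with this simple check.
        return []
    if not isinstance(data, dict):
        return []
    return sorted(
        key for key in data
        if any(hint in key.lower() for hint in SENSITIVE_KEY_HINTS)
    )


if __name__ == "__main__":
    example = '{"aws_secret_access_key": "abc123", "region_name": "us-east-1"}'
    print(risky_extra_keys(example))
```

Any key this reports is a candidate for moving into the connection's dedicated password field or an external secrets store.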
"Many Airflow instances contain sensitive information," the researchers explained. "When these instances are exposed to the internet the information becomes accessible to everyone since the authentication is disabled. In versions prior to v1.10 of Airflow, there is a feature that lets users run Ad Hoc database queries and get results from the database. While this feature can be handy, it is also very dangerous because on top of there being no authentication, anyone with access to the server can get information from the database."
Intezer has notified the owners of the vulnerable instances through responsible disclosure.
It is recommended that Apache Airflow users upgrade their builds to the latest version and check user privilege settings to make sure no unauthorized users can obtain access to their instances.