Background
Over the past decade, cloud computing has received attention for its potential to enable and simplify high-performance computing (HPC) applications. Additionally, modeling user communities can benefit greatly from real-time access to cloud-ready, reproducible workflows that include complex models and large datasets.
Developed by the EPA and supported by the Community Modeling and Analysis System (CMAS) Center, the Community Multiscale Air Quality (CMAQ) model is a local-to-hemispheric scale numerical air quality modeling system that is widely used by the global research community. Researchers use the model to simulate and understand complex air quality processes, and to support computational environmental fate and transport studies as well as climate and health impact studies. CMAQ is now available as a fully tested, publicly available technology stack for two major cloud service providers. These resources allow modelers to rapidly bring together CMAQ, cloud-hosted datasets, and visualization and evaluation tools on ephemeral clusters that can be deployed quickly and reliably worldwide.
This study addresses a growing need to leverage advances in cloud computing and to ease the learning curve for new users by creating a reproducible method to run the CMAQ model on cloud services.
Key Findings
For this study, researchers ran CMAQ air quality simulations efficiently and affordably on the cloud, using existing, publicly available datasets and popular vendor-provided resources. Their findings indicate:
- Simulations are compute-intensive and run in parallel on hundreds of cores on HPC clusters. They are also data-intensive, requiring terabytes of meteorological and emissions input data and producing large output files.
- The model is also difficult to compile: it depends on many libraries and on physics and chemistry modules written in Fortran, and it has various configurable run-time options and conditional compilation requirements.
- Researchers often recompile the model for custom applications.
Conclusions
Based on this research, the study provides a reproducible method to run the CMAQ model on cloud services. Key conclusions and benefits include:
- HPC clusters on the cloud are a viable alternative to traditional on-premises HPC. In many cases they can be cost-effective, with a few caveats, including a cloud learning curve for some users and the need to use cloud resources efficiently and on demand.
- CMAQ tutorials on Amazon Web Services (AWS) and Azure are excellent resources for new users, both for training purposes and for research or regulatory applications.
- Users develop additional skills, such as understanding HPC cloud cluster design, by working with emerging architectures (networking, storage, and the latest GPUs and multi-core CPUs).
- As the scientific rigor of air quality modeling (AQM) improves and the computing burden grows, users need access to faster compute resources to answer an increasing number of scientific questions at higher resolutions.
- Users can take advantage of advances in HPC cloud computing (faster, more energy-efficient cores, networking, and storage) as they become available and at a competitive price.
- Users can create HPC clusters on the cloud to run the CMAQ benchmark and analyses of several sensitivity studies, with performance that matches or exceeds traditional on-premises HPC and with scalability and availability that are not limited by competition for resources.