Conference Paper

Accelerating parallel CFD codes on modern vector processors using blockettes


A. Yildirim, C. A. Mader, and J. R. R. A. Martins


Proceedings of the Platform for Advanced Scientific Computing Conference, (11), 2021



The performance and scalability of computational fluid dynamics (CFD) solvers are essential for many applications, including multidisciplinary design optimization. With the evolution of high-performance computing resources such as Intel’s Knights Landing and Skylake architectures in the Stampede2 cluster, CFD solver performance can be improved by modifying how the core computations are performed while keeping the mathematical formulation unchanged. In this work, we introduce a cache-blocking method to improve memory-bound CFD codes that use structured grids. The overall idea is to split computational blocks into smaller, fixed-size blockettes that are small enough to fit entirely into the cache available to each core on a given architecture. With this approach, we can take full advantage of modern vector instruction sets such as AVX2 and AVX512 on these architectures. Using this method, we have achieved a speedup of up to 3.27 times in the core routines of the open-source CFD solver ADflow.
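The blockette idea described in the abstract can be illustrated with a minimal sketch in C. This is not ADflow's implementation: the two-stage computation (a nonlinear per-cell quantity followed by a five-point stencil), the function names `residual_full` and `residual_blockettes`, and the blockette edge length `BB` are all assumptions chosen for illustration. The sketch contrasts two full sweeps over a structured block, which need a full-size intermediate array, with a version that copies each blockette plus one halo layer into small fixed-size arrays and runs both stages there.

```c
#include <assert.h>
#include <stddef.h>

#define BB 8          /* hypothetical blockette edge length in cells */
#define BH (BB + 2)   /* edge length including one halo layer per side */

/* Stage 1: a stand-in nonlinear per-cell quantity (illustrative only). */
static double stage1(double w) { return 0.5 * w * w; }

/* Reference: two sweeps over the whole ni-by-nj block. The intermediate
   array p has ni*nj entries and is streamed through memory twice. */
void residual_full(const double *w, double *p, double *res,
                   size_t ni, size_t nj)
{
    for (size_t k = 0; k < ni * nj; ++k)
        p[k] = stage1(w[k]);
    for (size_t j = 1; j + 1 < nj; ++j)
        for (size_t i = 1; i + 1 < ni; ++i) {
            size_t c = j * ni + i;
            res[c] = p[c - 1] + p[c + 1] + p[c - ni] + p[c + ni]
                   - 4.0 * p[c];
        }
}

/* Blockette version: interior cells are processed in tiles of at most
   BB-by-BB cells. Each tile and its halo are copied into fixed-size
   local arrays, so the intermediate data stay small enough for cache. */
void residual_blockettes(const double *w, double *res,
                         size_t ni, size_t nj)
{
    double lw[BH][BH] = {{0.0}};  /* local states, zero-padded */
    double lp[BH][BH];            /* local intermediate values */

    assert(ni >= 3 && nj >= 3);
    for (size_t j0 = 1; j0 + 1 < nj; j0 += BB) {
        /* number of interior rows covered by this blockette */
        size_t bj = (nj - 1 - j0 < BB) ? nj - 1 - j0 : BB;
        for (size_t i0 = 1; i0 + 1 < ni; i0 += BB) {
            size_t bi = (ni - 1 - i0 < BB) ? ni - 1 - i0 : BB;

            /* Copy the blockette and its halo from the block. */
            for (size_t j = 0; j < bj + 2; ++j)
                for (size_t i = 0; i < bi + 2; ++i)
                    lw[j][i] = w[(j0 - 1 + j) * ni + (i0 - 1 + i)];

            /* Stage 1 over compile-time-constant bounds; entries beyond
               the copied region hold padding or stale data and are
               never read by stage 2. */
            for (size_t j = 0; j < BH; ++j)
                for (size_t i = 0; i < BH; ++i)
                    lp[j][i] = stage1(lw[j][i]);

            /* Stage 2: stencil on local data, written back to the block. */
            for (size_t j = 1; j <= bj; ++j)
                for (size_t i = 1; i <= bi; ++i)
                    res[(j0 - 1 + j) * ni + (i0 - 1 + i)] =
                        lp[j][i - 1] + lp[j][i + 1]
                      + lp[j - 1][i] + lp[j + 1][i]
                      - 4.0 * lp[j][i];
        }
    }
}
```

Both functions write the same interior residuals and leave boundary entries of `res` untouched. The fixed bounds in the stage 1 loop are the part a compiler can most readily vectorize. The halo cells shared by neighboring blockettes are recomputed, which trades a small amount of redundant arithmetic for better cache reuse. A suitable value of `BB` would depend on the per-core cache size of the target machine.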