Optimizing Data Movement and Achieving Performance Portability with Brick Data Layouts
Applications running on current and future architectures are mostly performance-limited by the cost of data movement, both vertically through the memory hierarchy of a node and horizontally across nodes. Moreover, the profound and growing architectural diversity of high-end supercomputers poses a performance portability challenge, as architecture-specific optimization of parallelism and data movement leads to code that is not portable across systems. In this talk, we discuss the role of data layout in optimizing data movement and achieving performance portability. Our recent research on brick data layouts shows a dramatic reduction in vertical data movement, specifically TLB and cache misses, compared with a similarly optimized tiled implementation. Moreover, this data layout aids in reducing the costs of horizontal data movement. Architecture-specific code generation and tuning of the data layout make this approach performance portable across CPUs and GPUs.
Mary Hall is a professor in the School of Computing at the University of Utah. Her research focuses on compiler optimizations that target current and future high-performance architectures, applied to real-world applications. Professor Hall is an ACM Distinguished Scientist and ACM’s representative on the Computing Research Association Board of Directors.