"Combining software and hardware to address challenges in the many-core era" by Dr. Christian Fensch
Date: October 7, 2009
Time: 10:00 am – 11:00 am
Where: SB 4.01.20 (CS conference room)
"Combining software and hardware to address challenges in the many-core era"
by Dr. Christian Fensch (University of Edinburgh)
Current processor designs favour increasing core counts over improving the performance of a single core. This change in direction means that parallel programming is more important than ever before, since it will no longer be required solely for HPC but for general-purpose computing as well. As such, it is necessary to develop cost-effective hardware mechanisms to support parallel applications, as well as tools that allow programmers to express the parallelism in the application more easily. Preferably, we want solutions where the former supports the latter and vice versa.
This talk consists of two parts. First, I will describe my earlier work in the area of cache coherency schemes for many-core architectures. In order to scale many-core designs to the envisioned number of tens (or even hundreds) of cores, interconnect mechanisms other than buses or crossbars are required. Tiled designs with light-weight point-to-point networks offer a solution, but increase the complexity required to maintain cache coherence. I will present a novel, cost-effective mechanism to support shared-memory parallel applications that forgoes hardware-maintained cache coherence. The mechanism is based on two key ideas: the mapping of lines to physical caches is performed at the page level with OS support, and the hardware supports remote cache accesses. Remote cache accesses become feasible in a many-core architecture because communication latencies between cores are much lower than in a multi-node system.
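As a rough illustration of the two key ideas above, the following Python sketch models a tiled system in which an OS-managed table assigns each page to a single home cache and any core may access that cache remotely. It is a toy model under stated assumptions, not the mechanism presented in the talk: the class `TiledSystem`, the first-touch placement policy, the page and line sizes, and the one-value-per-line simplification are all hypothetical.

```python
PAGE_SIZE = 4096  # bytes per page (assumed value)
LINE_SIZE = 64    # bytes per cache line (assumed value)


class TiledSystem:
    """Toy model: every page has exactly one home cache, chosen by the 'OS'."""

    def __init__(self, num_tiles):
        self.page_home = {}                            # page number -> home tile
        self.caches = [{} for _ in range(num_tiles)]   # per-tile cache: line -> value
        self.memory = {}                               # backing store: line -> value
        self.local_accesses = 0
        self.remote_accesses = 0

    def _home_tile(self, core, addr):
        page = addr // PAGE_SIZE
        # Hypothetical first-touch policy: the first core to touch a page
        # becomes its home. A real OS could use any placement policy.
        if page not in self.page_home:
            self.page_home[page] = core
        return self.page_home[page]

    def _locate(self, core, addr):
        home = self._home_tile(core, addr)
        if home == core:
            self.local_accesses += 1
        else:
            self.remote_accesses += 1   # served by another tile's cache
        line = addr - addr % LINE_SIZE
        cache = self.caches[home]
        if line not in cache:           # miss: fill from the backing store
            cache[line] = self.memory.get(line, 0)
        return cache, line

    def read(self, core, addr):
        cache, line = self._locate(core, addr)
        return cache[line]

    def write(self, core, addr, value):
        # Simplification: one value per line instead of per byte.
        cache, line = self._locate(core, addr)
        cache[line] = value
```

Because a line can only ever reside in its home cache, there is never a second copy to invalidate or update, which is why this toy model needs no coherence protocol. For example, after `system.write(0, 0x1000, 42)` on a four-tile system, `system.read(3, 0x1000)` is served remotely from tile 0's cache.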
The second part of the talk focuses on my current research in exploiting additional information available in parallel programs. Current hardware cache coherence schemes assume unstructured programs. While this allows correct behaviour in all cases, it increases complexity and verification effort. Similarly, programmers are seldom able to express the high-level structure of parallel tasks explicitly. Instead, they use collections of auxiliary constructs such as locks and condition variables to represent these relationships implicitly. This representation often obfuscates the original problem and places an unnecessary burden on compiler and hardware. By analysing the communication patterns of two benchmark suites, I was able to show that there is still a significant number of regular sharing and communication patterns. These results suggest that exploiting these patterns should make it possible to simplify the hardware mechanisms currently used, in particular if the origin of these patterns can be identified in the source program.
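To make the contrast between implicit and explicit structure concrete, the sketch below uses a barrier as an illustrative case; the abstract does not name a specific construct, so this choice is an assumption. `LockBarrier` (a hypothetical name) encodes the "everyone waits for everyone" relationship implicitly with a lock and a condition variable, while Python's standard `threading.Barrier` states it directly. The helper `run_phases` is likewise hypothetical and only exists to exercise both.

```python
import threading


class LockBarrier:
    """Barrier expressed implicitly through a lock and a condition variable."""

    def __init__(self, parties):
        self.parties = parties
        self.count = 0
        self.generation = 0                 # distinguishes successive barrier rounds
        self.cond = threading.Condition()   # condition variable with its own lock

    def wait(self):
        with self.cond:
            generation = self.generation
            self.count += 1
            if self.count == self.parties:
                # Last arrival: start a new round and release everyone.
                self.generation += 1
                self.count = 0
                self.cond.notify_all()
            else:
                # Loop guards against spurious wake-ups.
                while generation == self.generation:
                    self.cond.wait()


def run_phases(barrier, num_threads):
    """Each thread logs phase 1, waits at the barrier, then logs phase 2."""
    log = []
    log_lock = threading.Lock()

    def worker(index):
        with log_lock:
            log.append(("phase1", index))
        barrier.wait()
        with log_lock:
            log.append(("phase2", index))

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return log


# Implicit: the barrier semantics are buried in lock/condition bookkeeping.
implicit_log = run_phases(LockBarrier(4), 4)
# Explicit: the intent is stated directly, so a runtime could recognise it.
explicit_log = run_phases(threading.Barrier(4), 4)
```

Both versions behave the same from the program's point of view, but only the second exposes the synchronisation pattern by name; in the first, a compiler or hardware mechanism would have to infer it from the counter and the wake-up logic.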
Chris Fensch currently works as a research associate at the School of Informatics at the University of Edinburgh. He received his PhD in 2008 from the University of Edinburgh. Before his current post, he worked in the Computer Laboratory at the University of Cambridge. Prior to that, he received his MS from the Friedrich Schiller University in Jena, Germany and undertook graduate studies in advanced compiler design and just-in-time compilation at the University of California at Irvine.
Chris' research interests revolve around parallel architectures and compilers, including techniques such as thread-level speculation and transactional memory. In particular, he is interested in new opportunities and challenges offered by many-core systems over traditional multi-node systems. Additionally, he is investigating ways to make parallel programming and architectures simpler and more accessible by utilising application knowledge to simplify hardware mechanisms (e.g. cache coherence) or by providing hardware extensions that allow programmers to express their requirements more directly (e.g. instead of implementing a barrier using locks, requesting it directly).