Session 4 - High-Level Synthesis

Time: Wednesday, 2019-04-10, 10:30AM - 12:00PM

Room: Wilhelm-Köhler-Saal, S1|03/283

Session chair: Florian Stock

Evaluating LULESH Kernels on OpenCL FPGA

Zheming Jin, Hal Finkel

FPGAs are becoming promising heterogeneous computing components for high-performance computing. In this paper, we evaluate the resource utilizations, performance, and performance per watt of our implementations of the LULESH kernels in OpenCL on an Arria10-based FPGA platform. LULESH is a complex proxy application in the CORAL benchmark suite. We choose two representative kernels “CalcFBHourglassForceForElems” and “EvalEOSForElems” from the application in our study. Compared with the baseline implementations, our optimizations improve the performance by a factor of 1.65X and 2.96X for the two kernels on the FPGA, respectively. Using directives for accelerator programming, we also evaluate the performance of the kernels on an Intel Xeon 16-core CPU and an Nvidia K80 GPU. We find that the FPGA, constrained by the memory bandwidth, can perform 1.05X to 3.4X better than the CPU and GPU for small problem sizes. For the first kernel, the performance per watt on the FPGA is 1.59X and 7.1X higher than that on an Intel Xeon 16-core CPU and an Nvidia K80 GPU, respectively. For the second kernel, the performance per watt on the GPU is 1.82X higher than that on the FPGA. However, the performance per watt on the FPGA is 1.77X higher than that on the CPU.

The TaPaSCo Open-Source Toolflow for the Automated Composition of Task-Based Parallel Reconfigurable Computing Systems

Jens Korinth, Jaco Hofmann, Carsten Heinz, Andreas Koch

In this paper we present TaPaSCo (the Task Parallel Systems Composer), an open-source toolflow and software framework for automated construction of System-on-Chip FPGA designs for task parallel computation. TaPaSCo aims to increase the scalability and portability of FPGA designs by performing the construction of heterogeneous many-core architectures from custom processing elements, and providing a simple, uniform programming interface to utilize spatially parallel computation on FPGAs. A key feature of TaPaSCo is automated design space exploration, which can be performed in parallel on a computing cluster. This greatly simplifies scaling hardware designs, facilitating iterative growth and portability across FPGA devices and families.

Graph-based Code Restructuring Targeting HLS for FPGAs

Afonso Canas Ferreira, Joao M.P. Cardoso

High-level synthesis (HLS) is of paramount importance to enable software developers to map critical computations to FPGA-based hardware accelerators. However, in order to generate efficient hardware accelerators one needs to apply significant code transformations and adequately use the directive-driven approach that is part of most HLS tools. The code restructuring and directives needed depend not only on the characteristics of the input code but also on the HLS tools and target FPGAs. These aspects require a deep knowledge of the subjects involved and tend to exclude software developers. This paper presents our recent approach for automatic code restructuring targeting HLS tools. Our approach uses an unfolded graph representation, which can be generated from program execution traces, and graph-based optimizations, such as folding, to generate suitable HLS C code. In this paper, we describe the approach and the new optimizations proposed. We evaluate the approach with a number of representative kernels, and the results show its capability to generate efficient hardware implementations otherwise achievable only through manual restructuring of the input software code and manual insertion of adequate HLS directives.
