TY - GEN
T1 - HLS Portability from Intel to Xilinx
T2 - 2021 IEEE High Performance Extreme Computing Conference, HPEC 2021
AU - Xiao, Zhili
AU - Chamberlain, Roger D.
AU - Cabrera, Anthony M.
N1 - Publisher Copyright:
© 2021 IEEE.
PY - 2021
Y1 - 2021
N2 - Field-programmable gate arrays (FPGAs) are a hardware accelerator option that is growing in popularity. However, FPGAs are notoriously hard to program. To this end, high-level synthesis (HLS) tools have been developed to allow programmers to design hardware accelerators with FPGAs using familiar software languages. The two largest FPGA vendors, Intel and Xilinx, support both C/C++ and OpenCL C to construct kernels. However, little is known about the portability of designs between these two platforms.In this work, we evaluate the portability and performance of Intel and Xilinx kernels. We conduct a case study, porting the Needleman-Wunsch application from the Rodinia benchmark suite written in Intel OpenCL C to Xilinx platforms. We use OpenCL C kernels optimized for Intel FPGA platforms as a starting point and first perform a minimum effort port to a Xilinx FPGA, also using OpenCL C. We find that simply porting one-To-one optimizations is not enough to enable portable performance. We then seek to improve the performance of those kernels using Xilinx C/C++. With rewriting the kernel for burst transfer and other optimizations, we are able to reduce the execution time from an initial 294 s to 2.2 s.
AB - Field-programmable gate arrays (FPGAs) are a hardware accelerator option that is growing in popularity. However, FPGAs are notoriously hard to program. To this end, high-level synthesis (HLS) tools have been developed to allow programmers to design hardware accelerators with FPGAs using familiar software languages. The two largest FPGA vendors, Intel and Xilinx, support both C/C++ and OpenCL C to construct kernels. However, little is known about the portability of designs between these two platforms.In this work, we evaluate the portability and performance of Intel and Xilinx kernels. We conduct a case study, porting the Needleman-Wunsch application from the Rodinia benchmark suite written in Intel OpenCL C to Xilinx platforms. We use OpenCL C kernels optimized for Intel FPGA platforms as a starting point and first perform a minimum effort port to a Xilinx FPGA, also using OpenCL C. We find that simply porting one-To-one optimizations is not enough to enable portable performance. We then seek to improve the performance of those kernels using Xilinx C/C++. With rewriting the kernel for burst transfer and other optimizations, we are able to reduce the execution time from an initial 294 s to 2.2 s.
UR - https://www.scopus.com/pages/publications/85123490933
U2 - 10.1109/HPEC49654.2021.9622785
DO - 10.1109/HPEC49654.2021.9622785
M3 - Conference contribution
AN - SCOPUS:85123490933
T3 - 2021 IEEE High Performance Extreme Computing Conference, HPEC 2021
BT - 2021 IEEE High Performance Extreme Computing Conference, HPEC 2021
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 20 September 2021 through 24 September 2021
ER -