PLD: Making FPGAs Compatible with Modern Incremental-Refinement Software Development
Publications
[1] pld_asplos2022
[2] hlink_fccm2021
Introduction
FPGA-based accelerators offer significant gains in absolute performance and energy efficiency over general-purpose CPUs. Although FPGA computations can now be described in standard programming languages such as C, developing FPGA accelerators remains tedious and inaccessible to modern software engineers. Slow compiles (potentially taking tens of hours) and arcane programming environments inhibit the rapid, incremental refinement of designs that is the hallmark of modern software engineering.
To address this challenge, I developed a software-engineer-friendly FPGA programming environment that offers several compilation options, letting users choose the trade-off between compile time and performance that suits each stage of development.
Option 0: I design and pre-implement a cluster of customized RISC-V cores on the FPGA, so that a C/C++ program can be compiled and running within seconds. To keep this path compatible with hardware compilation, I extend the open-source RISC-V software toolchain with new libraries that implement my universal interface in software. Option 0 offers the fastest compilation while still delivering respectable performance for software programmers.
Option 1: I bring separate compilation and linking into the FPGA design flow, giving faster design turns that feel more like software development. To realize this flow, I developed abstractions, compiler options, and a compiler flow that allow the same C source code to be compiled to individual FPGA regions in minutes. This raises the level of FPGA programming and standardizes the programming experience, bringing FPGA-based accelerators into a software platform ecosystem that is familiar to software engineers.
Option 2: for peak performance, I offer a compilation path akin to a pure hardware implementation, which delivers the best area and speed.
The PLD framework also has an embedded-systems version. Instead of RISC-V cores in the FPGA fabric, it uses the four ARM CPU cores of the embedded chip; I developed a virtual interface and a time-multiplexed scheme for executing applications on these cores.
In every case, software programmers develop in familiar C/C++, and the backend provides equivalent software and hardware implementations. They accept sub-optimal performance during quick compiles and gain improved performance from longer compiles, without needing to understand the underlying details.
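The key to running the same source on a soft core and in hardware is that operators talk only through a common stream interface. The Python model below illustrates that idea under stated assumptions: `Stream`, `scale_op`, `sum_op`, and `run_pipeline` are hypothetical names, and the FIFO backing corresponds to a software (Option 0 style) build. It is a sketch of the concept, not PLD's actual library or HLS code.

```python
from collections import deque


class Stream:
    """Hypothetical point-to-point stream, backed here by a software FIFO.

    A hardware build would bind the same read()/write() calls to an on-chip
    stream instead, so operator code stays unchanged.
    """

    def __init__(self):
        self._q = deque()

    def write(self, value):
        self._q.append(value)

    def read(self):
        if not self._q:
            raise RuntimeError("read from empty stream")
        return self._q.popleft()


def scale_op(inp, out, n, k):
    """Hypothetical operator: multiply each of n words by k and forward it."""
    for _ in range(n):
        out.write(inp.read() * k)


def sum_op(inp, out, n):
    """Hypothetical operator: reduce n words to their sum."""
    acc = 0
    for _ in range(n):
        acc += inp.read()
    out.write(acc)


def run_pipeline(data, k):
    """Wire the operators through streams and run them to completion in order.

    Running operators one after another suffices for this acyclic graph;
    on an FPGA the operators would run concurrently.
    """
    src, mid, dst = Stream(), Stream(), Stream()
    for x in data:
        src.write(x)
    scale_op(src, mid, len(data), k)
    sum_op(mid, dst, len(data))
    return dst.read()


print(run_pipeline([1, 2, 3, 4], 3))  # prints 30
```

Because `scale_op` and `sum_op` never touch the FIFO directly, swapping the backend (soft core, separately compiled region, or monolithic hardware) would not require editing them; that separation is what the universal interface is meant to provide.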
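Separate compilation and linking shorten design turns because an edit to one operator only recompiles that operator, while the link step just connects already-compiled pieces. The model below is a minimal sketch of that bookkeeping, assuming operators are identified by name, change detection uses a content hash, and "linking" maps each logical stream endpoint to a (region, port) pair. `IncrementalBuild`, `fingerprint`, the region indices, and the port names are all hypothetical; the real flow compiles hardware regions, which this code only logs.

```python
import hashlib


def fingerprint(source):
    """Content hash of an operator's source, used to detect edits."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


class IncrementalBuild:
    """Hypothetical model of separate compilation plus a cheap link step."""

    def __init__(self):
        self.cache = {}     # operator name -> fingerprint of last compiled source
        self.compiled = []  # log of operators actually recompiled

    def compile_operators(self, sources):
        """Recompile only operators whose source changed since the last build."""
        for name, src in sources.items():
            fp = fingerprint(src)
            if self.cache.get(name) != fp:
                self.compiled.append(name)  # stands in for a per-region compile
                self.cache[name] = fp

    def link(self, placement, edges):
        """Resolve (producer, port) -> (consumer, port) edges to physical regions."""
        routes = {}
        for (src_op, src_port), (dst_op, dst_port) in edges.items():
            routes[(placement[src_op], src_port)] = (placement[dst_op], dst_port)
        return routes


sources = {"scale": "v1 of scale", "sum": "v1 of sum"}
placement = {"scale": 0, "sum": 1}  # hypothetical region indices
edges = {("scale", "out"): ("sum", "in")}

build = IncrementalBuild()
build.compile_operators(sources)
print(build.compiled)               # ['scale', 'sum']

sources["sum"] = "v2 of sum"        # edit a single operator
build.compiled.clear()
build.compile_operators(sources)
print(build.compiled)               # ['sum']
print(build.link(placement, edges)) # {(0, 'out'): (1, 'in')}
```

The design choice being illustrated is that linking depends only on placement and connectivity, not on operator internals, so it stays cheap no matter which operator was edited.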
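The embedded version runs applications on four ARM cores with a time-multiplexed scheme. One plausible reading, sketched below as an assumption rather than PLD's actual runtime, is cooperative scheduling: each operator is a Python generator that yields whenever it has done a step or its input is empty, operators are statically assigned to cores by index, and each core round-robins its own queue. `run_time_multiplexed`, `producer`, `doubler`, `collector`, and the `None` end-of-stream marker are hypothetical, and the cores are simulated by interleaving one step per core per tick.

```python
from collections import deque

NUM_CORES = 4  # the embedded version uses four ARM cores


def producer(out, n):
    """Hypothetical operator: emit 0..n-1, then an end-of-stream marker (None)."""
    for i in range(n):
        out.append(i)
        yield  # give up the core after each item
    out.append(None)


def doubler(inp, out):
    """Hypothetical operator: double values until the end marker arrives."""
    while True:
        while not inp:
            yield  # input empty: yield instead of busy-waiting on the core
        v = inp.popleft()
        out.append(None if v is None else 2 * v)
        if v is None:
            return
        yield


def collector(inp, result):
    """Hypothetical operator: gather values into a list until the end marker."""
    while True:
        while not inp:
            yield
        v = inp.popleft()
        if v is None:
            return
        result.append(v)
        yield


def run_time_multiplexed(operators, num_cores=NUM_CORES):
    """Assign operator i to core i % num_cores, then round-robin each core's
    queue, one step per core per tick, until every operator finishes."""
    cores = [deque() for _ in range(num_cores)]
    for i, op in enumerate(operators):
        cores[i % num_cores].append(op)
    while any(cores):
        for q in cores:
            if not q:
                continue
            op = q.popleft()
            try:
                next(op)      # run the operator until it yields
                q.append(op)  # still alive: back of this core's queue
            except StopIteration:
                pass          # operator finished; drop it


a, b = deque(), deque()
result = []
run_time_multiplexed([producer(a, 5), doubler(a, b), collector(b, result)])
print(result)  # [0, 2, 4, 6, 8]
```

Yielding on an empty input lets more operators than cores share the hardware without blocking one another, which is the property a time-multiplexed scheme needs when an application has more operators than the four available cores.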