PRflow: Reducing FPGA compile time with separate compilation for FPGA building blocks

To automate computation in electrical and computer engineering, we write programs in a human-readable language for tasks such as matrix multiplication and linear algebra transformations. Converting this code into a format that a computer can execute is called software compilation; converting it into a hardware design that can run on an FPGA or be used by semiconductor manufacturers to produce chips is called hardware compilation. A hardware implementation can outperform a software implementation by 10-1000X. However, hardware compilation is slow and can take hours to days.

To address long hardware compilation times, we apply divide and conquer: split a large problem into smaller ones, solve the parts in parallel, and combine the partial results into the final output. This approach is widely used in software, but it is rarely used in hardware compilation because all parts of a hardware design are tightly coupled and are normally optimized globally.

To bring divide and conquer to hardware compilation, we partition the target FPGA chip into distinct physical regions and connect them with a universal network that carries data between regions. We refactor the hardware description code into discrete functions that connect to one another only through universal interfaces. We then use existing Electronic Design Automation (EDA) tools to compile these functions into their separate regions in parallel, which greatly reduces the size of each compilation job. Because every region uses the same universal interface, the separately compiled regions communicate seamlessly once they are combined. In publication [1], this method reduced compilation time from 3 hours to 18 minutes, a 10X reduction compared with the traditional flow.

In publication [2], we increase the bandwidth of the universal network by using custom wires for communication between the compiled regions, which improves inter-region bandwidth by 1.5-10X over [1]. In publication [3], to improve coding efficiency, we let developers place marks in the program that direct the tools to split the hardware description code automatically into separate functions, each of which can be mapped and compiled to a distinct physical region on the chip. This approach significantly improves development efficiency and productivity.
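The partition, compile, and combine flow described above can be sketched in Python. This is a minimal, hypothetical sketch, not the PRflow implementation: `Operator`, `RegionImage`, `compile_region`, `separate_compile`, `link`, and the 32-bit `STREAM_WIDTH` interface are illustrative names introduced here, and `compile_region` only hashes the source instead of invoking a real EDA tool. It illustrates the key idea: because every function targets the same fixed interface, each region can be compiled on its own and in parallel, and the results are combined afterward.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
import hashlib

# Hypothetical universal interface: every region has one input and one output
# stream of this width, so a region can be compiled without seeing the others.
STREAM_WIDTH = 32


@dataclass(frozen=True)
class Operator:
    """One discrete function refactored out of the hardware description."""
    name: str
    source: str      # hardware description source for this function only
    in_width: int    # width of the input stream, in bits
    out_width: int   # width of the output stream, in bits


@dataclass(frozen=True)
class RegionImage:
    region: int
    operator: str
    digest: str      # stands in for the compiled result of one region


def compile_region(region: int, op: Operator) -> RegionImage:
    """Placeholder for running an EDA tool on a single region.

    A real flow would synthesize, place, and route this function inside its
    own region only; here we hash the source so the sketch runs anywhere.
    """
    if op.in_width != STREAM_WIDTH or op.out_width != STREAM_WIDTH:
        raise ValueError(
            f"{op.name} does not match the {STREAM_WIDTH}-bit region interface")
    digest = hashlib.sha256(f"{region}:{op.source}".encode()).hexdigest()[:16]
    return RegionImage(region, op.name, digest)


def separate_compile(ops: list[Operator], max_jobs: int = 4) -> list[RegionImage]:
    """Map operator i to region i and compile all regions concurrently."""
    with ThreadPoolExecutor(max_workers=max_jobs) as pool:
        futures = [pool.submit(compile_region, i, op) for i, op in enumerate(ops)]
        # result() re-raises any interface error raised in a worker thread
        return [f.result() for f in futures]


def link(images: list[RegionImage]) -> dict[int, str]:
    """Combine independently compiled regions into one configuration map."""
    return {img.region: img.digest for img in images}


if __name__ == "__main__":
    design = [
        Operator("load", "// source of load", 32, 32),
        Operator("matmul", "// source of matmul", 32, 32),
        Operator("store", "// source of store", 32, 32),
    ]
    config = link(separate_compile(design))
    for region, digest in sorted(config.items()):
        print(region, digest)
```

A thread pool is sufficient in this sketch because a real `compile_region` would spend most of its time waiting on an external tool process; the default `max_jobs=4` is an arbitrary assumption. An operator whose stream widths differ from the shared interface is rejected before compilation, which mirrors why a common interface lets the compiled regions be combined without a global recompile.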