The winding down of Moore’s law and the end of Dennard scaling have created a demand for specialized accelerators, including field-programmable gate arrays (FPGAs), in cloud and high-performance computing to fuel highly demanding workloads such as machine learning and artificial intelligence. At the same time, compute resources are increasingly consumed through cloud offerings to save costs and avoid the maintenance of sparsely utilized on-premises hardware. Despite their advantages in performance, adaptability, and energy efficiency, FPGAs are not yet being deployed at scale because of their difficult tool support. Therefore, this thesis intends to contribute to a wider adoption of FPGAs in the future with improvements on three levels:
First, a system architecture for managing a large number of disaggregated network-attached FPGAs in an efficient, flexible and scalable way is presented. To ensure the integrity of the infrastructure, partial reconfiguration is used to separate the non-privileged user logic from the privileged system logic.
The resulting combination of traditional CPU servers and FPGA nodes, all connected via the same network, leads to heterogeneous clusters for which no established programming model exists. As the second level of this thesis, the programming models proposed for such clusters are revisited, and it is argued that the Message Passing Interface (MPI) is suitable for programming CPU-FPGA clusters.
Finally, as the third level of this thesis, a framework for mapping deep neural network (DNN) models to distributed disaggregated FPGAs is developed. After assessing the current state of the art in compilation frameworks for DNNs, the concepts of a meta-compiler and operation set architectures are presented and implemented. This meta-compiler, called DOSA, enables the evaluation, selection, and combination of existing but restricted DNN-to-FPGA tools to leverage previous research and to generate more efficient solutions.