Seminar Abstract:

In recent years, data centers have shifted significantly towards heterogeneity to accommodate ever-growing workloads. Specialized hardware, particularly FPGAs, is widely deployed for the efficiency of custom circuits and for its reconfigurability. Despite this potential, the development of distributed FPGA-accelerated applications is hindered by the lack of suitable communication infrastructures and abstractions. To bridge this gap, this talk introduces a suite of open-source communication infrastructures tailored for hardware accelerators. These infrastructures support a variety of protocols, including TCP, RDMA, and MPI collectives, making them versatile across different platforms. With these infrastructures, specialized hardware can be used both as smartNICs, offloading networking tasks from the CPU, and as distributed accelerators that collectively handle large-scale applications. We will highlight the practical benefits and capabilities of these infrastructures through a case study on distributing deep learning recommendation model inference across a heterogeneous cluster.


About the Speaker:

Zhenhao is a final-year PhD candidate in the Systems Group at ETH Zurich, where he also completed his master’s degree after obtaining his bachelor’s degree from Tongji University. His research focuses on enhancing data processing systems for large-scale workloads by leveraging heterogeneous hardware, distributed computing, and advanced data center networking. He develops specialized networking abstractions, including TCP, RDMA, and MPI, tailored for hardware accelerators to efficiently orchestrate heterogeneous clusters with smartNICs and in-network processors.