With the widespread adoption of deep neural networks (DNNs) across applications, there is a growing demand for DNN deployment solutions that can seamlessly support multi-tenant execution. This involves simultaneously running multiple DNN workloads on heterogeneous architectures with domain-specific accelerators. However, existing accelerator interfaces directly bind the accelerator’s physical resources to user threads, without an efficient mechanism to adaptively re-partition available resources. This leads to high programming complexities and performance overheads due to sub-optimal resource allocation, making scalable many-accelerator deployment impractical.To address this challenge, we propose AuRORA, a novel accelerator integration methodology that enables scalable accelerator deployment for multi-tenant workloads. In particular, AuRORA supports virtualized accelerator orchestration via co-designing the hardware-software stack of accelerators to allow adaptively binding current workloads onto available accelerators. We demonstrate that AuRORA achieves 2.02× higher overall SLA satisfaction, 1.33× overall system throughput, and 1.34× overall fairness compared to existing accelerator integration solutions with less than 2.7% area overhead.CCS CONCEPTS• Computer systems organization → Multicore architectures; Distributed architectures; Neural networks; • Hardware → Communication hardware, interfaces and storage; Application-specific VLSI designs.
Abstract:
Publication date:
February 6, 2024
Publication type:
Journal Article