AuRORA: Virtualized Accelerator Orchestration for Multi-Tenant Workloads

Abstract:

With the widespread adoption of deep neural networks (DNNs) across applications, there is a growing demand for DNN deployment solutions that can seamlessly support multi-tenant execution. This involves simultaneously running multiple DNN workloads on heterogeneous architectures with domain-specific accelerators. However, existing accelerator interfaces directly bind the accelerator’s physical resources to user threads, without an efficient mechanism to adaptively re-partition available resources. This leads to high programming complexities and performance overheads due to sub-optimal resource allocation, making scalable many-accelerator deployment impractical.

To address this challenge, we propose AuRORA, a novel accelerator integration methodology that enables scalable accelerator deployment for multi-tenant workloads. In particular, AuRORA supports virtualized accelerator orchestration via co-designing the hardware-software stack of accelerators to allow adaptively binding current workloads onto available accelerators. We demonstrate that AuRORA achieves 2.02× higher overall SLA satisfaction, 1.33× overall system throughput, and 1.34× overall fairness compared to existing accelerator integration solutions with less than 2.7% area overhead.

CCS Concepts:

• Computer systems organization → Multicore architectures; Distributed architectures; Neural networks; • Hardware → Communication hardware, interfaces and storage; Application-specific VLSI designs.

Authors:

Seah Kim

Borivoje Nikolić

Sophia Shao

Publication date:

February 6, 2024

Publication type:

Journal Article
