A new publication has been accepted at the 40th IEEE International Conference on Computer Design (ICCD 2022). The conference takes place in Lake Tahoe, USA, from October 23 to 26, 2022.
Given the diminishing returns from technology scaling and power density limitations, modern processors use dedicated hardware to accelerate different application domains efficiently. Some accelerators, such as Vector Processing Units (VPUs), provide a high degree of flexibility, while others focus on a specific subset of operations. A Systolic Array (SA) is a specialized accelerator for the General Matrix Multiplication (GEMM) kernel, which is at the heart of machine learning, big data, and scientific computing applications. Modern high-performance architectures integrate both a VPU and an SA in the same system, leading to a significant area overhead that is infeasible for smaller devices, such as edge or IoT devices. There is a need to improve resource efficiency and utilization to enable the adoption of deep learning (DL) in edge and IoT systems.
Targeting this issue, we have proposed VSA, a hybrid Vector-Systolic Architecture that extends a VPU with the functionality of an SA. VSA can thus support Single Instruction Multiple Data (SIMD) instruction extensions through its VPU functionality and efficiently compute GEMM through its SA functionality. Furthermore, we have implemented VSA as a RISC-V co-processor. As a result, we have seen speedups of up to 3.5x for well-known CNNs, such as ResNet50, and considerable reductions in energy consumption with minimal area overhead.
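
For readers unfamiliar with the GEMM kernel mentioned above, the following minimal Python sketch contrasts a plain triple-loop GEMM with a cycle-by-cycle model of a generic output-stationary systolic array. It is only an illustration of the general technique and does not describe the VSA microarchitecture: the function names, the output-stationary dataflow, and the skewed input schedule are assumptions chosen for this example.

```python
def gemm_reference(a, b):
    """Plain triple-loop GEMM: C = A x B for lists of lists."""
    m, k, n = len(a), len(b), len(b[0])
    c = [[0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            for p in range(k):
                c[i][j] += a[i][p] * b[p][j]
    return c


def gemm_systolic(a, b):
    """Illustrative model of an m x n output-stationary systolic array.

    Assumption (not from the paper): rows of A enter from the left and
    columns of B enter from the top, each skewed by one cycle per row or
    column, so processing element (i, j) sees A[i][p] and B[p][j] at
    cycle t = i + j + p and accumulates their product locally.
    """
    m, k, n = len(a), len(b), len(b[0])
    c = [[0] * n for _ in range(m)]
    total_cycles = m + n + k - 2  # last MAC happens at cycle m + n + k - 3
    for t in range(total_cycles):
        for i in range(m):
            for j in range(n):
                p = t - i - j  # index of the operand pair reaching PE (i, j)
                if 0 <= p < k:
                    c[i][j] += a[i][p] * b[p][j]
    return c, total_cycles


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
result, cycles = gemm_systolic(a, b)
print(result, cycles)
```

In this model every processing element performs at most one multiply-accumulate per cycle and operands are reused as they move between neighbours, which is the property that makes systolic arrays efficient for GEMM.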