Optimizing Deep Learning Inference via Global Analysis and Tensor Expressions

Chunwei Xia, Jiacheng Zhao*, Qianqi Sun, Zheng Wang, Yuan Wen, Teng Yu, Xiaobing Feng, Huimin Cui

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding (Published conference contribution)


Abstract

Optimizing deep neural network (DNN) execution is important but becomes increasingly difficult as DNN complexity grows. Existing DNN compilers cannot effectively exploit optimization opportunities across operator boundaries, leaving room for improvement. To address this challenge, we present Souffle, an open-source compiler that optimizes DNN inference across operator boundaries. Souffle creates a global tensor dependency graph using tensor expressions, traces data flow and tensor information, and partitions the computation graph into subprograms based on dataflow analysis and resource constraints. Within a subprogram, Souffle performs local optimization via semantic-preserving transformations, finds an optimized program schedule, and improves instruction-level parallelism and data reuse. We evaluated Souffle using six representative DNN models on an NVIDIA A100 GPU. Experimental results show that Souffle consistently outperforms six state-of-the-art DNN optimizers, delivering a geometric mean speedup of up to 3.7× over TensorRT and 7.8× over TensorFlow XLA.
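To make the partitioning step concrete, here is a deliberately simplified sketch of building a tensor dependency graph and splitting it into subprograms under a resource budget. It is not Souffle's algorithm. It assumes each operator writes exactly one output tensor and carries a single integer `cost` standing in for resource constraints such as on-chip memory, and it partitions only by topological order and that budget, omitting the dataflow information (for example, data reuse across operators) that the abstract says Souffle also uses. All names (`TensorOp`, `build_dependency_graph`, `topo_order`, `partition`, `budget`) are hypothetical.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


# Hypothetical simplification: NOT Souffle's implementation.
@dataclass
class TensorOp:
    """One operator node that computes a single output tensor from input tensors."""
    name: str
    inputs: list   # names of tensors this op reads
    output: str    # name of the tensor this op writes
    cost: int      # hypothetical resource cost (e.g. bytes of on-chip memory)


def build_dependency_graph(ops):
    """Return (successors, in-degree): an edge producer -> consumer per shared tensor."""
    producer = {op.output: op.name for op in ops}
    succs = defaultdict(list)
    indeg = {op.name: 0 for op in ops}
    for op in ops:
        for t in op.inputs:
            if t in producer:  # graph inputs and weights have no producing op
                succs[producer[t]].append(op.name)
                indeg[op.name] += 1
    return succs, indeg


def topo_order(ops, succs, indeg):
    """Kahn's algorithm: order ops so every producer precedes its consumers."""
    remaining = dict(indeg)
    queue = deque(op.name for op in ops if remaining[op.name] == 0)
    order = []
    while queue:
        n = queue.popleft()
        order.append(n)
        for s in succs[n]:
            remaining[s] -= 1
            if remaining[s] == 0:
                queue.append(s)
    if len(order) != len(ops):
        raise ValueError("dependency graph contains a cycle")
    return order


def partition(ops, budget):
    """Greedily pack ops, in topological order, into subprograms within the budget.

    An op whose own cost exceeds the budget still gets a subprogram of its own.
    """
    by_name = {op.name: op for op in ops}
    succs, indeg = build_dependency_graph(ops)
    parts, current, used = [], [], 0
    for name in topo_order(ops, succs, indeg):
        c = by_name[name].cost
        if current and used + c > budget:
            parts.append(current)  # close the current subprogram
            current, used = [], 0
        current.append(name)
        used += c
    if current:
        parts.append(current)
    return parts


# Illustrative graph: matmul -> bias add -> ReLU -> matmul.
ops = [
    TensorOp("matmul", ["x", "w"], "y", 4),
    TensorOp("bias", ["y", "b"], "z", 1),
    TensorOp("relu", ["z"], "r", 1),
    TensorOp("matmul2", ["r", "w2"], "o", 4),
]
```

With a budget of 6, `partition(ops, 6)` keeps the first matmul, the bias add, and the ReLU in one subprogram and places the second matmul in a new one, because adding it would exceed the budget. A real compiler would weigh which intermediate tensors can stay on chip rather than summing a single scalar cost.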
Original language: English
Title of host publication: The ACM International Conference on Architectural Support for Programming Languages and Operating Systems
Publisher: ACM
Pages: 286-301
Number of pages: 15
DOIs
Publication status: Published, 17 Apr 2024

Bibliographical note

We thank our shepherd, Vinod Grover, and the anonymous reviewers for their constructive feedback.
For the purpose of open access, the authors have applied a Creative Commons Attribution (CC BY) license to any Author Accepted Manuscript version arising from this submission.

