Optimizing Deep Learning Inference via Global Analysis and Tensor Expressions

Chunwei Xia, Jiacheng Zhao*, Qianqi Sun, Zheng Wang, Yuan Wen, Teng Yu, Xiaobing Feng, Huimin Cui

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Published conference contribution

Abstract

Optimizing deep neural network (DNN) execution is important but becomes increasingly difficult as DNN complexity grows. Existing DNN compilers cannot effectively exploit optimization opportunities across operator boundaries, leaving room for improvement. To address this challenge, we present Souffle, an open-source compiler that optimizes DNN inference across operator boundaries. Souffle creates a global tensor dependency graph using tensor expressions, traces data flow and tensor information, and partitions the computation graph into subprograms based on dataflow analysis and resource constraints. Within a subprogram, Souffle performs local optimization via semantic-preserving transformations, finds an optimized program schedule, and improves instruction-level parallelism and data reuse. We evaluated Souffle using six representative DNN models on an NVIDIA A100 GPU. Experimental results show that Souffle consistently outperforms six state-of-the-art DNN optimizers, delivering a geometric mean speedup of up to 3.7× over TensorRT and 7.8× over TensorFlow XLA.
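
To make the idea of cross-operator optimization via tensor expressions concrete, below is a minimal sketch, not Souffle's actual implementation, written with TVM-style tensor expressions (the legacy `te` API), one common way such dependency graphs are expressed. It defines a matrix multiplication followed by a ReLU as two tensor expressions in a single graph, then schedules the producer inside the consumer's loop nest so the full intermediate tensor is never materialized. The shapes and operator names (N, M, K, matmul, relu) are illustrative assumptions.

```python
# Minimal illustrative sketch, not Souffle's implementation.
# Two operators expressed as tensor expressions share one dependency graph,
# letting a single schedule fuse them across the operator boundary.
# Assumes a TVM version that still provides the legacy te schedule API.
import tvm
from tvm import te

N, M, K = 128, 128, 128

A = te.placeholder((N, K), name="A")
B = te.placeholder((K, M), name="B")
k = te.reduce_axis((0, K), name="k")

# Operator 1: matrix multiplication as a tensor expression.
matmul = te.compute(
    (N, M),
    lambda i, j: te.sum(A[i, k] * B[k, j], axis=k),
    name="matmul",
)

# Operator 2: element-wise ReLU consuming the matmul result.
relu = te.compute(
    (N, M),
    lambda i, j: te.max(matmul[i, j], tvm.tir.const(0.0, "float32")),
    name="relu",
)

# Because both operators live in one tensor-expression graph, the schedule
# can compute each matmul element inside the ReLU loop nest, so the full
# intermediate matmul tensor is never written to memory -- a simple
# instance of cross-operator fusion.
s = te.create_schedule(relu.op)
s[matmul].compute_at(s[relu], relu.op.axis[1])

print(tvm.lower(s, [A, B, relu], simple_mode=True))
```

Souffle's global analysis generalizes well beyond this two-operator case: it partitions an entire model's tensor-expression graph into subprograms before searching for an optimized schedule within each one.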
Original language: English
Title of host publication: The ACM International Conference on Architectural Support for Programming Languages and Operating Systems
Publisher: ACM
Number of pages: 15
Publication status: Accepted/In press - 2 Aug 2023

Bibliographical note

We thank our shepherd, Vinod Grover, and the anonymous reviewers for their constructive feedback. This work was supported in part by the National Key R&D Program of China under grant agreement 2021ZD0110101, the National Natural Science Foundation of China (NSFC) under grant agreements T2222026, 22003073, 62232015, and 62090024, the Innovation Funding of ICT CAS under grant agreement E361010, a Beijing Nova Program, and the UK Engineering and Physical Sciences Research Council (EPSRC) under grant agreement EP/X018202/1. For the purpose of open access, the authors have applied a Creative Commons Attribution (CC BY) license to any Author Accepted Manuscript version arising from this submission.
