This book provides a structured treatment of the key principles and techniques for enabling efficient processing of deep neural networks (DNNs). DNNs are currently widely used for many artificial intelligence (AI) applications, including computer vision, speech recognition, and robotics. While DNNs deliver state-of-the-art accuracy on many AI tasks, this accuracy comes at the cost of high computational complexity. Therefore, techniques that enable efficient processing of deep neural networks to improve key metrics (such as energy efficiency, throughput, and latency) without sacrificing accuracy or increasing hardware costs are critical to enabling the wide deployment of DNNs in AI systems.
The book includes background on DNN processing; a description and taxonomy of hardware architectural approaches for designing DNN accelerators; key metrics for evaluating and comparing different designs; features of DNN processing that are amenable to hardware/algorithm co-design to improve energy efficiency and throughput; and opportunities for applying new technologies. Readers will find a structured introduction to the field as well as formalization and organization of key concepts from contemporary work that provide insights that may spark new ideas.
Preface.- Acknowledgments.- Introduction.- Overview of Deep Neural Networks.- Key Metrics and Design Objectives.- Kernel Computation.- Designing DNN Accelerators.- Operation Mapping on Specialized Hardware.- Reducing Precision.- Exploiting Sparsity.- Designing Efficient DNN Models.- Advanced Technologies.- Conclusion.- Bibliography.- Authors' Biographies.
Vivienne Sze received the B.A.Sc. (Hons.) degree in electrical
engineering from the University of Toronto, Toronto, ON, Canada, in
2004, and the S.M. and Ph.D. degrees in electrical engineering from
the Massachusetts Institute of Technology (MIT), Cambridge, MA, in
2006 and 2010, respectively. In 2011, she received the Jin-Au Kong
Outstanding Doctoral Thesis Prize in Electrical Engineering at MIT.
She is an Associate Professor at MIT in the Electrical Engineering
and Computer Science Department. Her research interests include
energy-aware signal processing algorithms and low-power circuit and
system design for portable multimedia applications, including
computer vision, deep learning, autonomous navigation, image
processing, and video compression. Prior to joining MIT, she was a
Member of the Technical Staff in the Systems and Applications
R&D Center at Texas Instruments (TI), Dallas, TX, where she
designed low-power algorithms and architectures for video coding.
She also represented TI in the JCT-VC committee of ITU-T and
ISO/IEC standards body during the development of High Efficiency
Video Coding (HEVC), which received a Primetime Engineering Emmy
Award. Within the committee, she was the primary coordinator of the
core experiment on coefficient scanning and coding, and she
chaired/vice-chaired several ad hoc groups on entropy coding. She
is a co-editor of High Efficiency Video Coding (HEVC): Algorithms
and Architectures (Springer, 2014). Prof. Sze is a recipient of the
inaugural ACM-W Rising Star Award, the 2019 Edgerton Faculty
Achievement Award at MIT, the 2018 Facebook Faculty Award, the 2018
& 2017 Qualcomm Faculty Award, the 2018 & 2016 Google Faculty
Research Award, the 2016 AFOSR Young Investigator Research Program
(YIP) Award, the 2016 3M Non-Tenured Faculty Award, the 2014 DARPA
Young Faculty Award, and the 2007 DAC/ISSCC Student Design Contest
Award; and she is a co-recipient of the 2018 VLSI Best Student
Paper Award, the 2017 CICC Outstanding Invited Paper Award, the
2016 IEEE Micro Top Picks Award, and the 2008 A-SSCC Outstanding
Design Award. She currently serves on the technical program
committee for the International Solid-State Circuits Conference
(ISSCC) and the SSCS Advisory Committee (AdCom). She has served on
the technical program committees for VLSI Circuits Symposium,
Micro, and the Conference on Machine Learning and Systems (MLSys);
as a guest editor for the IEEE Transactions on Circuits and Systems
for Video Technology (TCSVT); and as a Distinguished Lecturer for
the IEEE Solid-State Circuits Society (SSCS). Prof. Sze was the
Systems Program Chair of MLSys in 2020.
Tien-Ju Yang received the B.S. degree in electrical engineering
from National Taiwan University (NTU), Taipei, Taiwan, in 2010, and
the M.S. degree in electronics engineering from NTU in 2012.
Between 2012 and 2015, he worked as an engineer in the Intelligent
Vision Processing Group, MediaTek Inc., Hsinchu, Taiwan. He is
currently a Ph.D. candidate in Electrical Engineering and Computer
Science at the Massachusetts Institute of Technology, Cambridge, MA,
working on energy-efficient deep neural network design. His
research interests span the areas of computer vision, machine
learning, image/video processing, and VLSI system design. He won
first place in the 2011 National Taiwan University Innovation
Contest and co-taught a tutorial on “Efficient Image Processing
with Deep Neural Networks” at ICIP 2019.
Joel S. Emer received the
B.S. (Hons.) and M.S. degrees in electrical engineering from Purdue
University, West Lafayette, IN, USA, in 1974 and 1975,
respectively, and the Ph.D. degree in electrical engineering from
the University of Illinois at Urbana-Champaign, Champaign, IL, USA,
in 1979. He is currently a Senior Distinguished Research Scientist
with Nvidia’s Architecture Research Group, Westford, MA, USA, where
he is responsible for exploration of future architectures and
modeling and analysis methodologies. He is also a Professor of the
Practice at the Massachusetts Institute of Technology, Cambridge,
MA, USA. Previously he was with Intel, where he was an Intel Fellow
and the Director of Microarchitecture Research. At Intel, he led
the VSSAD Group, which he had previously been a member of at Compaq
and the Digital Equipment Corporation. Over his career, he has held
various research and advanced development positions investigating
processor micro-architecture and developing performance modeling
and evaluation techniques. He has made architectural contributions
to a number of VAX, Alpha, and X86 processors and is recognized as
one of the developers of the widely employed quantitative approach
to processor performance evaluation. He has been recognized for his
contributions in the advancement of simultaneous multithreading
technology, processor reliability analysis, cache organization,
pipelined processor organization, and spatial architectures for
deep learning. Dr. Emer is a Fellow of the ACM and IEEE and a
member of the NAE. He has been a recipient of numerous public
recognitions. In 2009, he received the Eckert-Mauchly Award for
lifetime contributions in computer architecture. He received the
Purdue University Outstanding Electrical and Computer Engineer
Alumni Award and the University of Illinois Electrical and Computer
Engineering Distinguished Alumni Award in 2010 and 2011,
respectively. His 1996 paper on simultaneous multithreading
received the ACM/SIGARCH-IEEE-CS/TCCA Most Influential Paper Award
in 2011. He was named to the International Symposium on Computer
Architecture (ISCA) and International Symposium on
Microarchitecture (MICRO) Halls of Fame in 2005 and 2015,
respectively. He has had six papers selected for IEEE Micro’s
Top Picks in Computer Architecture in 2003, 2004, 2007, 2013, 2015,
and 2016. He was the Program Chair of ISCA in 2000 and MICRO in
2017.