The Ohio State University
Early Experience in Benchmarking Edge AI Processors with Object Detection Workloads
1 Department of Computer Science and Engineering,
The Ohio State University
{hui.82, lu.932}@osu.edu
2 NovuMind Inc.
jlien@novumind.com
Bench 2019
Yujie Hui 1, Jeffrey Lien 2, and Xiaoyi Lu 1
[Figure: machine learning pipeline stages (Data, Features, Training, Evaluation, Inference); platforms shown: Datacenter (e.g., GPU) and Edge Devices]
Machine Learning Use Cases in Facebook
❖ Recommendation: 2%
❖ RNN ASR: 10%
❖ RNN Translator: 6%
❖ Image Classification: 42%
❖ Object Detection: 34%
❖ Object Segmentation: 3%
❖ Face ID: 3%
Wu et al., Machine Learning at Facebook: Understanding Inference at the Edge, HPCA-2019
Low-latency, high-accuracy inference needs high-performance edge devices!
https://coral.withgoogle.com/products/dev-board
https://developer.nvidia.com/embedded/jetson-agx-xavier-developer-kit
[Figure: NovuTensor's 3D operation (tensor convolution); labels: Weight Tensor, Data Tensor, Output Tensor]
[1] https://patentscope.wipo.int/search/en/detail.jsf?docId=US225521272&tab=NATIONALBIBLIO
https://pjreddie.com/darknet/yolov2/
Darknet-19
YOLO9000: Better, Faster, Stronger. Joseph Redmon, Ali Farhadi
Microsoft COCO Dataset Examples
http://cocodataset.org/#home
Microsoft COCO: Common Objects in Context. Lin et al.
❖ Modify the weights of the first convolutional layer
❖ DarkFlow [1], Post-Training Integer Quantization [2], EdgeTPU compiler
❖ NVIDIA's DeepStream reference applications [3], TensorRT 5.0.3, 15-watt and 30-watt modes
❖ NovuSDK
[1] https://github.com/thtrieu/darkflow
[2] https://medium.com/tensorflow/tensorflow-model-optimization-toolkit-post-training-integer-quantization-b4964a1ea9ba
[3] https://github.com/NVIDIA-AI-IOT/deepstream_reference_apps
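The post-training integer quantization step can be pictured with a small sketch. This is not the TensorFlow toolkit's implementation; it assumes a generic per-tensor affine scheme that maps floats to unsigned 8-bit integers, and the function names (`quantize_params`, `quantize`, `dequantize`) and the sample weights are hypothetical.

```python
from typing import List, Tuple


def quantize_params(values: List[float], num_bits: int = 8) -> Tuple[float, int]:
    """Derive an affine scale and zero point covering the observed float range."""
    qmin, qmax = 0, 2 ** num_bits - 1
    # Extend the range to include 0.0 so that zero is exactly representable.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) if hi > lo else 1.0
    zero_point = int(round(qmin - lo / scale))
    return scale, max(qmin, min(qmax, zero_point))


def quantize(values: List[float], scale: float, zero_point: int,
             num_bits: int = 8) -> List[int]:
    """Map floats to integers in [0, 2**num_bits - 1], clamping out-of-range values."""
    qmax = 2 ** num_bits - 1
    return [max(0, min(qmax, int(round(v / scale)) + zero_point)) for v in values]


def dequantize(q: List[int], scale: float, zero_point: int) -> List[float]:
    """Recover approximate float values from the integer representation."""
    return [scale * (x - zero_point) for x in q]


# Hypothetical weights, used only to exercise the functions.
weights = [-1.0, -0.5, 0.0, 0.25, 1.0]
scale, zero_point = quantize_params(weights)
q = quantize(weights, scale, zero_point)
restored = dequantize(q, scale, zero_point)
```

Each restored value differs from the original by at most one quantization step (`scale`), which is the precision traded for the smaller integer model.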
Mean Average Precision: $AP = \int_{0}^{1} p(r)\,dr$, $mAP = \frac{1}{N}\sum_{i=1}^{N} AP_i$
Execution time:
Energy efficiency: number of input images that can be fully processed per unit of power
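The metrics above can be sketched in a few lines. This is a minimal illustration under stated assumptions: AP is approximated by a rectangle sum over recall steps (benchmark suites such as COCO use interpolated variants), and energy efficiency assumes one image per inference so that throughput is the reciprocal of latency. The function names and sample numbers are hypothetical, not measurements from the talk.

```python
from typing import Sequence


def average_precision(recalls: Sequence[float], precisions: Sequence[float]) -> float:
    """Approximate AP = integral of p(r) dr as a sum of precision * recall step."""
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(aps: Sequence[float]) -> float:
    """mAP = (1/N) * sum of per-class AP values."""
    return sum(aps) / len(aps)


def energy_efficiency(latency_ms: float, power_watts: float) -> float:
    """Images per second per watt, assuming one image per inference."""
    images_per_sec = 1000.0 / latency_ms
    return images_per_sec / power_watts


# Hypothetical values chosen for easy arithmetic.
ap_example = average_precision([0.5, 1.0], [1.0, 0.5])
map_example = mean_average_precision([ap_example, 0.25])
eff_example = energy_efficiency(latency_ms=50.0, power_watts=10.0)
```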
Performance running YOLOv2 and Tiny-YOLO with 416x416 input images
[Figure: bar chart; platforms: Edge TPU, Xavier 15w, Xavier MAXW, NovuTensor, 1080Ti+TensorRT, 1080Ti; series: Tiny-YOLO, YOLOv2]
Performance running YOLOv2 and Tiny-YOLO with 416x416 input images
[Figure: Latency (ms); platforms: Edge TPU, Xavier 15w, Xavier MAXW, NovuTensor, 1080Ti+TensorRT, 1080Ti; series: Tiny-YOLO, YOLOv2]
❖ EdgeTPU is 9.5X and 14.79X slower than the GPU when running Tiny-YOLO and YOLOv2
❖ NovuTensor and Xavier are 4.66X - 6.08X slower than the GPU
❖ Xavier is 2X and 5.28X faster than EdgeTPU in the max power mode
❖ NovuTensor is 2.04X and 3.8X faster than EdgeTPU for YOLOv2 and Tiny-YOLO
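Ratios such as "9.5X slower" are read here as latency ratios against a baseline. The sketch below shows that arithmetic, assuming latency in milliseconds per image; the helper names and the latencies are hypothetical placeholders, not the measured values behind the bullets.

```python
def slowdown(latency_ms: float, baseline_latency_ms: float) -> float:
    """How many times slower a device is than the baseline (ratio of latencies)."""
    return latency_ms / baseline_latency_ms


def relative_speedup(slowdown_a: float, slowdown_b: float) -> float:
    """How many times faster device B is than device A, given each device's
    slowdown against the same baseline."""
    return slowdown_a / slowdown_b


# Hypothetical latencies (ms) for a baseline GPU and two edge devices.
gpu_ms, device_a_ms, device_b_ms = 20.0, 120.0, 60.0
a_vs_gpu = slowdown(device_a_ms, gpu_ms)        # device A vs. the GPU
b_vs_gpu = slowdown(device_b_ms, gpu_ms)        # device B vs. the GPU
b_vs_a = relative_speedup(a_vs_gpu, b_vs_gpu)   # device B vs. device A
```

Comparing two devices through a shared baseline only holds when both slowdowns were measured against the same GPU configuration and the same model.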
Performance running YOLOv2 and Tiny-YOLO with 416x416 input images
[Figure: Energy Efficiency (image/sec/watt); platforms: Edge TPU, Xavier 15w, Xavier MAXW, NovuTensor, 1080Ti+TensorRT, 1080Ti; series: Tiny-YOLO, YOLOv2]
❖ All edge AI processors have higher energy efficiency than the GPU due to their low power consumption
❖ EdgeTPU delivers 2.9X and 1.13X higher energy efficiency than Xavier, and 1.96X and 1.04X higher than NovuTensor
Performance running YOLOv2 and Tiny-YOLO with 1024X1024 input images
[Figure: (a) Latency (ms), (b) Energy Efficiency (image/sec/watt); platforms: Xavier 15w, Xavier MAXW, NovuTensor, 1080Ti+TensorRT, 1080Ti]
Comparison of factors running YOLOv2 and Tiny-YOLO with 416x416 input images
[Figure: (a) YOLOv2, (b) Tiny-YOLO; axes: Accuracy, Performance, Energy Efficiency; platforms: Edge TPU, Xavier 15w, Xavier MAXW, NovuTensor, 1080Ti+TensorRT, 1080Ti]
① EdgeTPU provides the best energy efficiency
② Xavier and NovuTensor provide comparable latency and energy efficiency performance
③ TensorRT optimizes and accelerates GPU platforms
❖ [...] Edge TPU, NVIDIA Xavier, and NovuTensor) from three dimensions
❖ [...] efficiency, Edge TPU consumes less energy but is much slower for inference
❖ [...] AI platforms