SLIDE 1

3D Multi-Object Tracking for Autonomous Driving

Xinshuo Weng, Kris Kitani

June 15, 2020

SLIDE 2


3D multi-object tracking is an important perception task for autonomous driving

SLIDE 3

Standard 3D MOT Pipeline

Pipeline: Sensor Data → 3D Object Detection → Data Association → Evaluation

SLIDE 4

Standard 3D MOT Pipeline

Pipeline: Sensor Data (LiDAR, RGB) → 3D Object Detection → Data Association → Evaluation

SLIDE 5

Standard 3D MOT Pipeline

Pipeline: Sensor Data → 3D Object Detection (detection results) → Data Association → Evaluation

SLIDE 6

Standard 3D MOT Pipeline

Pipeline: Sensor Data → 3D Object Detection → Data Association (tracking results) → Evaluation

SLIDE 7

Standard 3D MOT Pipeline

Pipeline: Sensor Data → 3D Object Detection → Data Association → Evaluation (also important!)

Evaluation:

  • MOTA: MOT accuracy
  • MOTP: MOT precision
  • IDS: # of identity switches
  • FRAG: # of trajectory fragments
  • ……
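For concreteness, MOTA combines the error counts above into a single number. This is the standard CLEAR-MOT definition, sketched here for illustration (not KITTI's actual evaluation code):

```python
def mota(num_fn, num_fp, num_ids, num_gt):
    """Multi-Object Tracking Accuracy: 1 - (FN + FP + IDS) / GT, where GT is
    the total number of ground-truth objects summed over all frames."""
    return 1.0 - (num_fn + num_fp + num_ids) / num_gt

# Example: 10 missed objects, 5 false positives, 2 identity switches, 100 GT boxes
print(mota(10, 5, 2, 100))  # ≈ 0.83
```

Note that IDS and FRAG are counted during matching; the function above only aggregates the totals.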
SLIDE 8

Standard 3D MOT Pipeline

Pipeline: Sensor Data → 3D Object Detection → Data Association → Evaluation

SLIDE 9

What is the state of the art?

SLIDE 10

State of the Art (3D MOT)

Better models from better (bigger) data!

[Chart: dataset sizes, a 150x increase! *Mined trajectory data not counted for the Argo dataset]

SLIDE 11

State of the Art (3D MOT)

[Chart: Monocular 3D Detection AP on KITTI, a 15x increase in 3 years]

Image credit to Patrick Langechuan Liu, https://towardsdatascience.com/monocular-3d-object-detection-in-autonomous-driving-2476a3c7f57e

SLIDE 12

State of the Art (3D MOT)

[Chart: LiDAR-based 3D Detection on KITTI, a 27% increase in 2 years]

SLIDE 13

State of the Art (3D MOT)

[Chart: 2D MOT on KITTI, an 18% increase in 4 years. *3D methods compared using 2D evaluation on KITTI]

SLIDE 14

State of the Art (3D MOT)

Recent trend: feature extraction and optimization are jointly optimized.

  • D. Frossard, R. Urtasun. End-to-End Learning of Multi-Sensor 3D Tracking by Detection. ICRA 2018.
  • Zhang et al. Robust Multi-Modality Multi-Object Tracking. ICCV 2019.

SLIDE 15

State of the Art (3D MOT)

What are open problems in 3D MOT?

SLIDE 16

Some Open Problems (3D MOT)

  • Many large-scale datasets, but sensor suites and annotations are not unified
  • 3D detection performance is improving but doesn't take sensor physics into account
  • The context of the multi-level optimization problem (sensors, forecasting, control) is not taken into account
  • The representation doesn't take the context of other objects and the scene into account
  • Weak 3D MOT evaluation datasets and metrics
  • Sensor optimization and redundancy should also be taken into account
  • Detection and tracking should be coupled more tightly

SLIDE 17

Some Open Problems (3D MOT)

  • Many large-scale datasets, but sensor suites and annotations are not unified
  • 3D detection performance is improving but doesn't take sensor physics into account
  • The context of the multi-level optimization problem (sensors, forecasting, control) is not taken into account
  • The representation doesn't take the context of other objects and the scene into account (this talk)
  • Weak 3D MOT evaluation datasets and metrics (this talk)
  • Sensor optimization and redundancy should also be taken into account
  • Detection and tracking should be coupled more tightly

SLIDE 18

Recent Work on Evaluation

SLIDE 19

What are the Issues of Evaluation?

  • IoU (intersection over union)
  • For the pioneering 3D MOT dataset KITTI, evaluation is done in 2D
  • IoU is computed on the 2D image plane (not in 3D)
  • The common practice for evaluating 3D MOT methods is:
  • First project the 3D trajectories onto the image plane
  • Then run the 2D evaluation code provided by KITTI

IoU in 2D space vs. IoU in 3D space (image credit to Xu et al: 3D-GIoU). Bp: the predicted box; Bg: the ground-truth box; Bc: the smallest enclosing box; I2D, I3D: the intersection.
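As a simplified illustration of the gap between the two metrics, here is an axis-aligned IoU sketch. The simplifications are mine: real KITTI boxes have a heading angle, and true image projection is perspective rather than just dropping the depth axis.

```python
import numpy as np

def iou_axis_aligned(lo_a, hi_a, lo_b, hi_b):
    """IoU of two axis-aligned boxes given by min/max corner arrays.
    Works in any dimension (2D image boxes or 3D boxes; heading ignored)."""
    lo = np.maximum(lo_a, lo_b)
    hi = np.minimum(hi_a, hi_b)
    inter = np.prod(np.clip(hi - lo, 0.0, None))
    union = np.prod(hi_a - lo_a) + np.prod(hi_b - lo_b) - inter
    return inter / union

# Schematic boxes with coordinates (x, y, depth): the prediction has the same
# image footprint (x, y) as the ground truth but the wrong extent along depth.
gt_lo,   gt_hi   = np.array([0.0, 0.0, 10.0]), np.array([2.0, 2.0, 14.0])
pred_lo, pred_hi = np.array([0.0, 0.0, 10.0]), np.array([2.0, 2.0, 12.0])

iou_3d = iou_axis_aligned(gt_lo, gt_hi, pred_lo, pred_hi)                  # 0.5
iou_2d = iou_axis_aligned(gt_lo[:2], gt_hi[:2], pred_lo[:2], pred_hi[:2])  # 1.0
print(iou_3d, iou_2d)
```

The 3D IoU penalizes the wrong depth extent (0.5), while the 2D projection looks perfect (1.0), which is exactly the failure mode the next slide describes.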

SLIDE 20

What are the Issues of Evaluation?

  • Why is it not good to evaluate 3D MOT methods in 2D space?
  • It cannot demonstrate the strength of 3D MOT methods
  • It throws away the extra information (e.g., depth value, length of the object, heading orientation)
  • It cannot fairly compare 3D MOT methods. Why?
  • A method is not penalized for wrong predicted depth, length, or heading as long as the 2D projection is good
  • Which predicted box is better, blue or green?
  • Conclusion: 2D metrics should not be used to evaluate 3D MOT methods

Blue: predicted box 1; Green: predicted box 2; Red: the ground-truth box

SLIDE 21

Our Solution: Upgrade the Metrics Using 3D IoU

  • X. Weng, K. Kitani. A Baseline for 3D Multi-Object Tracking. arXiv 2019.
  • Replace the 2D IoU in the KITTI evaluation code with 3D IoU
  • https://github.com/xinshuoweng/AB3DMOT (~800 stars)
  • Work with nuTonomy collaborators to adopt our 3D metrics in the nuScenes evaluation
  • https://www.nuscenes.org/

Our released new evaluation code; nuScenes 3D MOT evaluation with our metrics
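The AB3DMOT baseline that ships with this evaluation code pairs a constant-velocity 3D Kalman filter with IoU-based Hungarian matching. A minimal constant-velocity predict step, as my own sketch rather than the repo's implementation:

```python
import numpy as np

# State: [x, y, z, vx, vy, vz] — constant-velocity motion over one frame (dt = 1).
F = np.eye(6)
F[:3, 3:] = np.eye(3)  # position += velocity * dt

def predict(state):
    """Predict a track's state at the next frame under constant velocity."""
    return F @ state

track = np.array([10.0, 2.0, 0.0, 1.0, 0.0, 0.0])  # moving +1 m/frame along x
print(predict(track)[:3])  # → [11.  2.  0.]
```

In the full filter this prediction is then corrected by the matched detection via the Kalman update; the sketch shows only the motion model.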

SLIDE 22

What are the Issues of Evaluation?

  • Are we done with evaluation? Can we further improve the current metrics?
  • E.g., MOTA (multi-object tracking accuracy) measures performance at a single recall point
  • Common practice:
  • Select a confidence threshold, e.g., 0.9
  • Filter out detections with lower confidence
  • Perform data association on the remaining detections

[Figure: MOTA over Recall curve]

SLIDE 23

What are the Issues of Evaluation?

  • Why is it not good to evaluate at a single recall point?
  • Consequences:
  • The confidence threshold needs to be carefully tuned, a non-trivial effort
  • We cannot understand the full spectrum of accuracy and precision of a MOT system
  • Which MOT system is better, blue or orange?
  • The orange one has higher MOTA at its best recall point (r = 0.9)
  • The blue one has higher MOTA overall, at many recall points
  • Ideally, we want high performance at all recall points

[Figure: MOTA over Recall curves for 3D MOT system 1 and 3D MOT system 2, recall 0.1 to 1.0]

SLIDE 24

Our Solution: Integral Metrics

  • MOTA does not take the detection confidence into account
  • What do we do to improve the evaluation?
  • Compute integral metrics as the area under the MOTA-over-recall curve, e.g., average MOTA (AMOTA)
  • Analogous to average precision (AP) in object detection
  • This models the full spectrum of MOT accuracy
  • X. Weng, K. Kitani. A Baseline for 3D Multi-Object Tracking. arXiv 2019.
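The integral can be approximated by averaging MOTA over a fixed grid of recall thresholds. This is a simplified sketch; the official AMOTA (e.g., in the nuScenes evaluation) additionally rescales the per-recall MOTA values:

```python
def amota(motas_at_recalls):
    """Average MOTA over L evenly spaced recall thresholds (e.g., 0.1 ... 1.0),
    approximating the area under the MOTA-over-recall curve."""
    return sum(motas_at_recalls) / len(motas_at_recalls)

# Illustrative numbers for the two systems on the previous slide:
# orange peaks at one recall point, blue is higher across most of the curve.
orange = [0.1, 0.2, 0.2, 0.3, 0.3, 0.4, 0.4, 0.5, 0.8, 0.0]
blue   = [0.4, 0.5, 0.5, 0.6, 0.6, 0.6, 0.5, 0.5, 0.4, 0.1]
print(amota(orange), amota(blue))  # blue wins despite its lower peak
```

This mirrors how AP summarizes a precision-recall curve for detectors: one number for the whole operating range, so no threshold tuning is needed.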

SLIDE 25

Recent Work on Improving Feature Learning for 3D MOT

SLIDE 26

What are the Issues of Feature Learning?

  • X. Weng, Y. Wang, Y. Man, K. Kitani. GNN3DMOT: Graph Neural Network for 3D Multi-Object Tracking with 2D-3D Multi-Feature Learning. CVPR 2020.
  • Goal: learn discriminative features for different objects
  • Issues in feature learning:
  • Feature extraction for each object is independent of the other objects
  • Why is this not good? There is no communication between objects, so the context information is ignored
  • Features come from only one or two modalities
  • E.g., 2D appearance, or 2D motion, or 3D motion, or 3D appearance
  • Why is this not good? It does not utilize all of the complementary information

Pipeline from prior work: objects in frame t and frame t+1 → 2D (or 3D) feature extractor → affinity matrix → Hungarian algorithm
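The affinity-matrix-plus-Hungarian step shared by these pipelines can be sketched directly with SciPy (toy affinity values for illustration; in practice the matrix comes from learned features or IoU):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Affinity between 2 tracks in frame t and 3 detections in frame t+1.
affinity = np.array([
    [0.9, 0.1, 0.0],   # track 0 matches detection 0 strongly
    [0.2, 0.8, 0.1],   # track 1 matches detection 1
])

# The Hungarian algorithm minimizes total cost, so negate the affinity.
rows, cols = linear_sum_assignment(-affinity)
matches = [(int(r), int(c)) for r, c in zip(rows, cols)]
print(matches)  # → [(0, 0), (1, 1)]
```

Detection 2 is left unmatched here and would typically start a new track.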

SLIDE 27

Improve Feature Learning for 3D MOT

  • X. Weng, Y. Wang, Y. Man, K. Kitani. GNN3DMOT: Graph Neural Network for 3D Multi-Object Tracking with 2D-3D Multi-Feature Learning. CVPR 2020.
  • How can we address these two issues?
  • Shouldn't features depend on the context of other objects?
  • We propose a novel feature interaction mechanism
  • How can we utilize the information from all the modalities?
  • Extract multi-modal features that are complementary to each other
  • i.e., 2D motion + 2D appearance + 3D motion + 3D appearance

Pipeline from our work: objects in frame t and frame t+1 → 2D + 3D feature extractor → feature interaction (applied iteratively) → affinity matrix → Hungarian algorithm
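The feature-interaction idea can be sketched as message passing: each object's feature is refined by aggregating the features of the other objects, so the representation depends on scene context rather than on each object in isolation. This is my own drastic simplification of the GNN3DMOT mechanism (no learned weights, a plain mean aggregator):

```python
import numpy as np

def feature_interaction(feats, num_layers=3):
    """GNN-style refinement: mix each object's feature with the mean feature
    of all other objects, repeated for num_layers rounds."""
    f = feats.copy()
    n = len(f)
    for _ in range(num_layers):
        total = f.sum(axis=0, keepdims=True)
        others_mean = (total - f) / (n - 1)  # context message per object
        f = 0.5 * f + 0.5 * others_mean      # residual-style update
    return f

feats = np.random.rand(3, 4)        # three objects, 4-dim features
refined = feature_interaction(feats)
print(refined.shape)  # → (3, 4)
```

In the actual model, each round uses learned node and edge functions, and the refined features feed the affinity matrix for matching.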

SLIDE 28

Improve Feature Learning for 3D MOT

  • X. Weng, Y. Wang, Y. Man, K. Kitani. GNN3DMOT: Graph Neural Network for 3D Multi-Object Tracking with 2D-3D Multi-Feature Learning. CVPR 2020
  • Is encoding the multi-modal features really useful?
  • Answer: yes
  • We should encode different features so that they complement each other

[Table: A = appearance feature, M = motion feature. Using features from multiple modalities increases performance over a single modality.]

SLIDE 29

Improve Feature Learning for 3D MOT

  • X. Weng, Y. Wang, Y. Man, K. Kitani. GNN3DMOT: Graph Neural Network for 3D Multi-Object Tracking with 2D-3D Multi-Feature Learning. CVPR 2020
  • Is feature interaction using a GNN useful for 3D MOT?
  • Answer: yes
  • We should let objects communicate and encode the context information

Performance increases greatly with 3 GNN layers vs. 0!

SLIDE 30

Improve Feature Learning for 3D MOT

  • X. Weng, Y. Wang, Y. Man, K. Kitani. GNN3DMOT: Graph Neural Network for 3D Multi-Object Tracking with 2D-3D Multi-Feature Learning. CVPR 2020
  • For more details on our CVPR work, our poster session is as follows:
  • Date: Wednesday, June 17
  • Q&A Time: 12:00–14:00 Pacific Time
  • Session: Poster 2.2 — Face, Gesture, and Body Pose; Motion and Tracking; Representation Learning
  • Link: http://cvpr20.com/event/gnn3dmot-graph-neural-network-for-3d-multi-object-tracking-with-2d-3d-multi-feature-learning/

SLIDE 31

Moving Forward

End-to-End Perception and Prediction Pipeline

SLIDE 32

End-to-End Perception and Prediction Pipeline

  • So far, only data association is jointly optimized
  • What is next? Can we go further?
  • End-to-end MOT and detection?
  • End-to-end MOT and trajectory forecasting?
  • End-to-end MOT with both detection and forecasting?

Pipeline: sensor data → 3D object detector → 3D detections → feature extractor → pairwise affinity matrix → optimizer → 3D object trajectories (feature extractor and optimizer jointly optimized); past object trajectories → trajectory forecasting

SLIDE 33

Joint 3D MOT and Trajectory Forecasting

  • Prior work separates 3D MOT and trajectory forecasting
  • Why is it not good to separate the two?
  • Optimization of the entire pipeline is impossible, leading to sub-optimal performance
  • Inference is slow due to the separate modular design; each network takes time
  • What can we do?
  • X. Weng, Y. Ye, K. Kitani. Joint 3D Tracking and Forecasting with Graph Neural Network and Diversity Sampling. arXiv 2020

Pipeline from prior work (separate stages): detected objects in the current frame → feature extraction → 3D MOT head → object trajectories up to the current frame; object trajectories in the past H frames → feature extraction → trajectory forecasting head → predicted trajectories in the future T frames

SLIDE 34

Joint 3D MOT and Trajectory Forecasting

  • Parallelize MOT and forecasting
  • Share the feature learning process
  • Use GNN3DMOT as part of our network for tracking
  • Add a multi-modal trajectory forecasting head
  • X. Weng, Y. Ye, K. Kitani. Joint 3D Tracking and Forecasting with Graph Neural Network and Diversity Sampling. arXiv 2020

Joint 3D tracking and forecasting pipeline: detected objects in the current frame and object trajectories in the past H frames → shared feature extraction → GNN for feature interaction (node and edge features) → 3D MOT head, and → diversity sampling → trajectory forecasting head → predicted trajectories in the future T frames
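The shared-feature, two-head layout can be sketched abstractly. All shapes and weights below are hypothetical placeholders of mine (the actual model uses GNN layers and learned heads), but they show how one feature tensor serves both tasks:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: N objects, D-dim shared features, T future timesteps.
N, D, T = 5, 16, 10

shared = rng.standard_normal((N, D))           # output of shared feature learning

# Two task heads on top of the same features (random weights for illustration).
W_track = rng.standard_normal((D, D))          # 3D MOT head: per-object embedding
W_fore  = rng.standard_normal((D, T * 2))      # forecasting head: T future (x, y)

track_emb = shared @ W_track                   # used to build the affinity matrix
future_xy = (shared @ W_fore).reshape(N, T, 2) # predicted future trajectories
print(track_emb.shape, future_xy.shape)  # → (5, 16) (5, 10, 2)
```

Because both heads backpropagate into `shared`, training either task shapes the features the other task consumes, which is the mechanism behind the mutual gains reported on the next two slides.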

SLIDE 35

Joint 3D MOT and Trajectory Forecasting

  • Is joint optimization useful?
  • Adding forecasting is useful to tracking
  • How does adding forecasting affect 3D MOT?
  • Joint optimization with forecasting improves tracking performance
  • X. Weng, Y. Ye, K. Kitani. Joint 3D Tracking and Forecasting with Graph Neural Network and Diversity Sampling. arXiv 2020

Compared with 3D MOT evaluation without the forecasting module: improvement on 5 out of 6 entries!

SLIDE 36

Joint 3D MOT and Trajectory Forecasting

  • Is joint optimization useful?
  • Adding forecasting is useful to tracking
  • Adding MOT is useful to forecasting
  • How does adding 3D MOT affect trajectory forecasting?
  • Joint optimization with 3D MOT improves forecasting performance
  • X. Weng, Y. Ye, K. Kitani. Joint 3D Tracking and Forecasting with Graph Neural Network and Diversity Sampling. arXiv 2020

Compared with forecasting evaluation without 3D MOT: performance improved after adding MOT!

SLIDE 37

Joint 3D MOT and Trajectory Forecasting

  • Is joint optimization useful?
  • Yes. Joint optimization is useful to both modules!
  • For more details on this arXiv work, scan the QR code for the paper
  • X. Weng, Y. Ye, K. Kitani. Joint 3D Tracking and Forecasting with Graph Neural Network and Diversity Sampling. arXiv 2020

SLIDE 38

Joint MOT and Object Detection

  • Now we have a method for joint MOT and forecasting
  • Can we do joint detection and MOT?
  • Use GNN3DMOT as part of our network for tracking
  • Add a detection head to classify/regress objects
  • Can possibly be extended to BEV and 3D detection and MOT
  • Will be released soon
  • Y. Wang, X. Weng, K. Kitani. Joint Detection and Multi-Object Tracking with Graph Neural Networks and Complete Feature Learning. arXiv 2020

Joint detection and tracking pipeline: anchors in the current frame and object trajectories in the past H frames → feature extraction → GNN for feature interaction (node and edge features) → object detection head and 3D MOT head → trajectories up to the current frame

SLIDE 39

Joint MOT, Detection and Forecasting

  • The most complete joint pipeline for detection, tracking and forecasting

Liang et al. PnPNet: End-to-End Perception and Prediction with Tracking in the Loop. CVPR 2020

SLIDE 40

Moving Forward

Achieve trajectory forecasting as tracking

SLIDE 41

Conventional Perception and Prediction Pipeline

  • Traditional pipeline:
  • Detection -> data association -> trajectory forecasting
  • Is this pipeline the best?
  • What are other options?

Weng et al. Unsupervised Sequence Forecasting of 100,000 Points for Unsupervised Trajectory Forecasting. arXiv 2020

SLIDE 42

Trajectory Forecasting as Tracking

  • Traditional pipeline:
  • Detection -> MOT -> trajectory forecasting
  • Our new pipeline switches the order:
  • Sensor data forecasting -> detection -> MOT

Weng et al. Unsupervised Sequence Forecasting of 100,000 Points for Unsupervised Trajectory Forecasting. arXiv 2020

SLIDE 43

Take Home Message

  • It is important to develop appropriate evaluation metrics for 3D MOT to measure progress
  • The representation of objects in 3D MOT should take other objects into account

Open Problems:

  • Many large-scale datasets, but sensor suites and annotations are not unified
  • 3D detection performance is improving but doesn't take sensor physics into account
  • The context of the multi-level optimization problem (sensors, forecasting, control) is not taken into account
  • The representation doesn't take the context of other objects, the scene, and the past into account
  • Need 3D MOT evaluation datasets
  • Sensor optimization and redundancy should also be taken into account
  • Detection and tracking should be coupled more tightly
  • Dynamics models should be customized to object type

SLIDE 44

3D Multi-Object Tracking for Autonomous Driving

Xinshuo Weng, Kris Kitani

June 15, 2020