Hand Pose Estimation Matthew Krenik Advisor: Fabrizio Pece Agenda - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Matthew Krenik Advisor: Fabrizio Pece

Hand Pose Estimation

slide-2
SLIDE 2

§ What is Hand Pose Estimation? § Why does it matter? § How does it work? § What has been done?

2

Agenda

slide-3
SLIDE 3

§ Estimate the full set of degrees of freedom (DOF) of a hand from depth images § This is a tough problem, especially in real time! § Not to be confused with “hand shape estimation”

3

What is Hand Pose Estimation?

slide-4
SLIDE 4

4

slide-5
SLIDE 5

§ More than just gestures § Ideal for continuous input applications § Links your hand dexterity to a computer model § Will it redefine how we interact with computers?

5

Why Does it Matter?

slide-6
SLIDE 6

6

Gaming

slide-7
SLIDE 7

7

Design / Engineering

slide-8
SLIDE 8

8

Robot Hand Control – Surgery? Industry?

slide-9
SLIDE 9

9

Communication – Sign Language

slide-10
SLIDE 10

§ It's going to take some time to explain § Starting from the ground up!

§ Decision trees § Ensemble techniques § Random forests § Body Pose estimation § Hand Pose Estimation

§ Assumption is that everyone has a very basic idea of what machine learning is and does

10

How Does it Work?

slide-11
SLIDE 11

§ Goal:

§ Given training data T with entries (𝒚, 𝒛) § Find a model that estimates 𝒛 for unseen 𝒚 § This is called prediction

§ Quality Measurement:

§ Minimize the probability of model prediction errors on future data

§ What are some models?

§ Linear Regression § Support Vector Machines § Decision Trees!

11

Machine Learning

slide-12
SLIDE 12

§ Very intuitive § Each node asks a question about a feature of the data § A data point propagates through the tree depending on the answer to each question § When it reaches a leaf at the end, the decision tree makes a classification

12

Decision Trees
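The traversal described on this slide can be sketched in a few lines. This is only an illustration: the `Node` class, the feature indices, the thresholds, and the labels in `toy_tree` are all made up for the example and do not come from the slides.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Node:
    # Internal nodes ask "is features[feature] < threshold?".
    # Leaf nodes carry a class label and ask nothing.
    feature: int = 0
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    label: Optional[str] = None


def classify(node: Node, features: Sequence[float]) -> str:
    """Walk from the root to a leaf, answering one question per node."""
    while node.label is None:
        if features[node.feature] < node.threshold:
            node = node.left
        else:
            node = node.right
    return node.label


# Hypothetical two-question tree, invented for illustration only.
toy_tree = Node(
    feature=0, threshold=0.5,
    left=Node(label="background"),
    right=Node(feature=1, threshold=2.0,
               left=Node(label="finger"),
               right=Node(label="palm")),
)
```

For instance, `classify(toy_tree, [0.9, 1.0])` answers "no" to the first question and "yes" to the second, ending at the "finger" leaf.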

slide-13
SLIDE 13

§ In what order do we ask the questions (test features)?

§ Each possible tree has an amount of entropy § Test out all possible questions for a node, and choose the one that reduces the entropy the most (largest information gain)

§ How do nodes make decisions based on the features?

§ Same way! § Choose a decision boundary that gives the largest information gain

13

How to grow a tree from data?
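As a rough sketch of the selection rule on this slide, the code below scores candidate thresholds on a single numeric feature by information gain (entropy before the split minus the size-weighted entropy after it) and keeps the best one. The function names and the choice of midpoints as candidate thresholds are assumptions made for the example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels, threshold):
    """Entropy reduction from splitting on `value < threshold`."""
    left = [y for x, y in zip(values, labels) if x < threshold]
    right = [y for x, y in zip(values, labels) if x >= threshold]
    n = len(labels)
    after = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - after


def best_threshold(values, labels):
    """Try midpoints between sorted distinct values; keep the largest gain.

    Assumes at least two distinct feature values.
    """
    xs = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    return max(candidates, key=lambda t: information_gain(values, labels, t))
```

A full tree grower would repeat this search over every feature at every node and recurse on the two resulting subsets.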

slide-14
SLIDE 14

14

How to grow a tree from data?

slide-15
SLIDE 15

15

Decision Trees: A Pretty Good Model!

slide-16
SLIDE 16

§ Two competing methodologies:

§ Traditional: Build one really good model § Ensemble: Build many models and average the results

§ Build a ton of “pretty good” models § Combine them into one “pretty awesome” prediction! § Important for individual models to not be correlated, otherwise there is a strong tendency to overfit

§ So we add randomness!

16

Ensemble Learning

slide-17
SLIDE 17

§ Bootstrap Aggregation (Bagging)

§ Take a random subsample from the training set T, with replacement § Train each model on a different subsample § Classification is the majority vote; Regression is the average

§ Random Forests: Multiple, randomized decision trees

1. Bagging
2. Randomized Node Optimization: choose a random set of questions

§ Number of questions affects the correlation of the trees

3. Decision boundary of the decision trees: conic, linear, etc.
4. Depth of the component decision trees

§ More depth means there will be more overfitting

17

Ensemble Techniques
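The bagging step can be sketched as follows. To keep the example short, a toy nearest-class-mean learner on one feature stands in for a decision tree; that substitution, the function names, and the fixed seed are assumptions for illustration. Randomized node optimization, which a random forest adds on top of bagging, is not shown.

```python
import random
from collections import Counter


def bootstrap_sample(data, rng):
    """Draw len(data) items from data, with replacement."""
    return [rng.choice(data) for _ in data]


def train_nearest_mean(sample):
    """Toy stand-in for a tree: predict the label whose mean is closest."""
    by_label = {}
    for x, y in sample:
        by_label.setdefault(y, []).append(x)
    means = {y: sum(xs) / len(xs) for y, xs in by_label.items()}
    return lambda x: min(means, key=lambda y: abs(x - means[y]))


def train_bagged(data, n_models, seed=0):
    """Train each model on its own bootstrap subsample of the training set."""
    rng = random.Random(seed)
    return [train_nearest_mean(bootstrap_sample(data, rng))
            for _ in range(n_models)]


def bagged_predict(models, x):
    """Classification: majority vote over the individual predictions."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]
```

For regression, `bagged_predict` would return the average of the individual predictions instead of the majority vote.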

slide-18
SLIDE 18

18

Example: Different Trees

slide-19
SLIDE 19

19

Example: Different Trees

slide-20
SLIDE 20

20

Example: Different Trees

slide-21
SLIDE 21

21

Example: Random Decision Forest

slide-22
SLIDE 22

22

Example: Multi-class Decision Trees

slide-23
SLIDE 23

23

Example: Comparison to SVM Model

slide-24
SLIDE 24

24

A quick look at body pose estimation

§ Body Pose Estimation Pipeline § Technology found in consumer devices, like the Kinect § Very similar to hand pose estimation

slide-25
SLIDE 25

25

Hand Pose Estimation Pipeline

slide-26
SLIDE 26

§ The hand is much smaller than the body, but still has 22 DOF § Self-occlusion is very common and severe § The hand can be rotated in any direction (the body is always upright) § Real depth data can be difficult to label

26

What makes Hand Pose tough?

slide-27
SLIDE 27

§ Restrict the viewing area of the hand § One Advantage: Hands are fairly invariant among humans § Train with synthetic data, rendered from 3D models

27

Some ideas..

slide-28
SLIDE 28

§ Use 3D hand models to generate data § Train the Random Decision Forests using this data

28

Train based on Synthetic Data

slide-29
SLIDE 29

29

Hand Pose Estimation Pipeline

slide-30
SLIDE 30

30

Pixel Classification

One Tree Two Trees Three Trees

slide-31
SLIDE 31

§ Algorithm used to determine where the joints are § Each pixel is given a weighted Gaussian kernel § The weight is the class probability times the depth § Gradient ascent from many starting points finds the local maxima § The highest local maximum determines the joint § Threshold the scores to filter out non-visible joints

31

Mean shift local mode finding
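A minimal 2D sketch of this procedure is given below, assuming the per-pixel weights (class probability times depth, per the slide) have already been computed and are passed in. The function names, the bandwidth parameter, and the choice to start one ascent from every pixel are assumptions for the example, not details taken from the slides.

```python
import math


def _kernel(p, q, bandwidth):
    """Gaussian kernel value between 2D points p and q."""
    d2 = (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
    return math.exp(-d2 / (2 * bandwidth ** 2))


def density(p, points, weights, bandwidth):
    """Weighted kernel density at p; used as the score of a mode."""
    return sum(w * _kernel(p, q, bandwidth) for q, w in zip(points, weights))


def mean_shift(start, points, weights, bandwidth, iters=50, tol=1e-6):
    """Move uphill by repeatedly jumping to the kernel-weighted mean."""
    p = start
    for _ in range(iters):
        ks = [w * _kernel(p, q, bandwidth) for q, w in zip(points, weights)]
        total = sum(ks)
        if total == 0.0:
            break
        nxt = (sum(k * q[0] for k, q in zip(ks, points)) / total,
               sum(k * q[1] for k, q in zip(ks, points)) / total)
        moved = math.hypot(nxt[0] - p[0], nxt[1] - p[1])
        p = nxt
        if moved < tol:
            break
    return p


def find_joint(points, weights, bandwidth, min_score):
    """Return the highest-scoring mode, or None if it falls below threshold."""
    modes = [mean_shift(q, points, weights, bandwidth) for q in points]
    best = max(modes, key=lambda m: density(m, points, weights, bandwidth))
    if density(best, points, weights, bandwidth) < min_score:
        return None  # joint treated as not visible
    return best
```

Returning `None` when the best score is below `min_score` corresponds to the final thresholding step that filters out non-visible joints.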

slide-32
SLIDE 32

32

Joint Determination

slide-33
SLIDE 33

Strengths § Very fast § Robust to fast movements and noise § No initialization needed § Can run on a GPU for interface applications or games Issues § Training must be done offline § Number of images ~1-10M, takes 25-250 GB of data § Number of operations is huge even with simple algorithm

33

Hand Pose Estimation Algorithm

slide-34
SLIDE 34

§ Difficult to generate every possible hand pose § Dataset size is huge! § Hard to capture the variation in the data set § More variation → deeper trees → more RAM/memory § Solution: Divide into sub problems and solve with separate RDFs § Lower variation → lower complexity → less RAM/memory

34

Limitations of Single Layer RDF

slide-35
SLIDE 35

35

slide-36
SLIDE 36

36

Multi-layered RDFs for Hand Pose

slide-37
SLIDE 37

§ Local Expert Network

§ Hand Shape Classification gives each pixel a label § Train local expert forests for each pixel label § Expert forest depends on pixel label; each pixel is classified

§ Global Expert Network

§ Hand Shape Classification gives each pixel a label § The hand shape is determined by pixel voting § Train global expert forests for each hand shape label § Expert forest depends on hand shape label; each pixel is classified

37

Two Structures of Multi-layer RDFs
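The difference between the two structures comes down to how a pixel is routed to an expert. The sketch below assumes a hypothetical `shape_of` function standing in for the hand shape classification forest and a dictionary `experts` of stand-in pixel classifiers, one per shape label; real forests would return class distributions rather than hard labels.

```python
from collections import Counter


def local_expert_network(pixels, shape_of, experts):
    """Each pixel is sent to the expert chosen by its own shape label."""
    return [experts[shape_of(p)](p) for p in pixels]


def global_expert_network(pixels, shape_of, experts):
    """Pixels vote on one hand shape; that shape's expert labels every pixel."""
    votes = Counter(shape_of(p) for p in pixels)
    hand_shape = votes.most_common(1)[0][0]
    return [experts[hand_shape](p) for p in pixels]


# Toy stand-ins, invented for illustration only.
def toy_shape_of(pixel):
    return "fist" if pixel < 3 else "flat"


toy_experts = {
    "fist": lambda p: f"fist-part-{p % 2}",
    "flat": lambda p: f"flat-part-{p % 2}",
}
```

With pixels `[0, 1, 2, 5]`, the local network labels the last pixel with the "flat" expert, while the global network overrules it because the majority voted "fist".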

slide-38
SLIDE 38

38

Local Expert Network

slide-39
SLIDE 39

39

Global Expert Network

slide-40
SLIDE 40

§ Given the same data as before (hand shape not given)

  1. Cluster the data
  2. Train Hand Shape Classifier based on all clusters
  3. Train each Pixel Classifier based on a specific cluster

40

Training a Multi-layer RDF
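The three training steps can be laid out as a small pipeline. This is a structural sketch only: `cluster_fn`, `train_shape_fn`, and `train_pixel_fn` are hypothetical callables that the caller supplies, since the slides do not specify the clustering method or the forest training routine.

```python
def train_multilayer(images, cluster_fn, train_shape_fn, train_pixel_fn):
    """Return (shape_classifier, experts) for a two-layer model.

    cluster_fn(images)              -> one cluster id per image
    train_shape_fn(images, ids)     -> hand shape classifier
    train_pixel_fn(cluster_images)  -> pixel classifier for one cluster
    """
    # Step 1: cluster the data, since hand shape labels are not given.
    cluster_ids = cluster_fn(images)

    # Step 2: train the hand shape classifier on all clusters.
    shape_classifier = train_shape_fn(images, cluster_ids)

    # Step 3: train one pixel classifier per cluster, on that cluster only.
    experts = {}
    for cid in set(cluster_ids):
        members = [img for img, c in zip(images, cluster_ids) if c == cid]
        experts[cid] = train_pixel_fn(members)

    return shape_classifier, experts
```

Because each expert only sees the images of one cluster, it faces less variation than a single forest trained on everything.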

slide-41
SLIDE 41

§ Global Expert Networks average class distributions → More robust to noise § Local Expert Networks use info from each pixel → Better at generalizing to unseen data

41

Which is better? GEN or LEN

slide-42
SLIDE 42

42

Test: American Sign Language

slide-43
SLIDE 43

§ Huge improvement over single-layer RDFs

43

Results

slide-44
SLIDE 44

§ Remaining errors are concentrated on very similar poses

44

Results

slide-45
SLIDE 45

§ What is Hand Pose Estimation? Determine the joint positions to fix all DOFs of the hand § Why does it matter? Continuous Input Applications § How does it work? Randomized Decision Forests § What has been done? Add multiple layers for increased performance.

45

Summary

slide-47
SLIDE 47

47

Questions?

slide-48
SLIDE 48

§ Hand shape is just shape information: “fist”, “flat”, etc. § Hand pose is the specific joint angles for every DOF § Given the hand pose, an SVM can determine the hand shape very robustly

48

Appendix: Getting Hand Shape from Hand Pose