slide-1
SLIDE 1

Elmar Nöth, Tobias Bocklet, Arnd Gebhard

A System for Speech and 3D Facial Image Acquisition, Modeling and Analysis

Wednesday, 30 May 2012

slide-2
SLIDE 2

Outline

  • Motivation: Long-term goal of the project
  • Patient groups:
  • Parkinson’s disease (PD) patients
  • Stroke patients and patients with facial paresis
  • Speech technology
  • Facial analysis technology
  • Results
  • Summary
slide-3
SLIDE 3

Necessity of Evaluation

  • Diagnosis
  • How intelligible is the patient? (holistic impression)
  • How strongly does the patient nasalize? (distinct aspect)
  • Therapy control
  • Has the situation of the patient improved during therapy?
  • Comparison of therapy methods
  • Which therapy method leads to the best results for a group of patients?
  • Screening
  • Is the quality of a child’s speech appropriate for its age?
  • Computer-assisted therapy
  • Did the patient perform the exercise correctly?

Motivation

slide-5
SLIDE 5

Long-term Goal of the Project

  • Provide a telemedical rehabilitation unit for clinical/home use
  • Support speech analysis and analysis of facial gestures and … (gait, cognitive abilities → open, flexible platform)

  • Patient groups:
  • Parkinson’s disease (PD) patients
  • Stroke patients and patients with facial paresis
  • Instruct the patient what to do
  • Evaluate the exercises
  • Compare with previous sessions
  • Summarize exercises for therapist

Motivation

slide-6
SLIDE 6

Outline

  • Long-term goal of the project
  • Patient groups:
  • Parkinson’s disease (PD) patients
  • Stroke patients and patients with facial paresis
  • Speech technology
  • Facial analysis technology
  • Results
  • Summary
slide-7
SLIDE 7

Parkinson’s Disease

  • Degenerative disorder of the central nervous system
  • Death of dopamine-containing cells in the substantia nigra
  • Cause of cell death is unknown
  • Second most common neurodegenerative disorder (after Alzheimer's disease)
  • Prevalence ≈ 0.3% (whole population)
  • More common in the elderly: 1% of people over 60 years, 4% of people over 80 years
  • Incidence of PD ≈ 8–18 per 100,000 people per year
  • Onset in most cases after 50 years of age, mean onset ≈ 60 years

Patient Groups

slide-8
SLIDE 8

Speech-related Symptoms of PD

  • Hypophonia (soft speech)
  • Monotonic speech: speech quality tends to be soft, hoarse, and monotonous
  • Festinating speech: excessively rapid, soft, poorly intelligible speech

  • Drooling: most likely caused by a weak, infrequent swallow
  • Dysphagia (impaired ability to swallow)
  • Dysarthria

Patient Groups

slide-9
SLIDE 9

Dysarthria

  • A speech disorder affecting the coordination of muscles in the vocal tract, face, larynx, and respiratory system (dysarthrophonia)
  • Mostly results from a neurological injury, such as a stroke or other kind of brain injury

Patient Groups

slide-11
SLIDE 11

Outline

  • Long-term goal of the project
  • Patient groups:
  • Parkinson’s disease (PD) patients
  • Stroke patients and patients with facial paresis
  • Speech technology
  • Facial analysis technology
  • Results
  • Summary
slide-12
SLIDE 12

Speech Technology

  • Automatic speech processing methods
  • Word and phoneme recognition
  • Acoustic speaker modeling
  • Prosodic analysis
  • Model of excitation signal
  • Evaluation measures
slide-13
SLIDE 13

[Figure: recognition pipeline with the stages features, classification, words with highest probability, word chain]

Word and Phoneme Recognition

  • Off-the-shelf technology
  • Semi-continuous HMMs
  • Easier to adapt with small amounts of data
  • Comparable results with continuous models
  • Features: 11 Mel cepstrum coefficients + energy + first derivatives
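
The feature layout named in the last bullet can be illustrated with a short sketch. It assumes that the first derivative is taken of all 12 static values (11 cepstral coefficients plus energy), giving 24 values per frame, and that a central difference over neighboring frames is an acceptable stand-in for the derivative; the slide does not state how the derivative is computed, and the function name is hypothetical.

```python
import numpy as np

def add_first_derivatives(static: np.ndarray) -> np.ndarray:
    """Append first-order time derivatives to a (frames, 12) feature matrix."""
    # np.gradient uses central differences inside the sequence and
    # one-sided differences at the first and last frame (assumed scheme).
    delta = np.gradient(static, axis=0)
    return np.hstack([static, delta])

# Toy input: 5 frames of 11 cepstral coefficients + 1 energy value.
rng = np.random.default_rng(0)
static = rng.normal(size=(5, 12))
features = add_first_derivatives(static)
print(features.shape)  # (5, 24)
```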

Speech Technology

slide-14
SLIDE 14

Acoustic Speaker Modeling

  • Idea:
  • Acoustic space of speakers can be modeled
  • Space represents the multidimensional characteristics of the voice of a speaker

  • Degree of pathology varies in acoustic space
  • Find characteristics of degree of speech disorder
  • Approach:
  • Acoustics modeled by Gaussian Mixture Models (GMMs)
  • Train Universal Background Model (UBM) with normal speakers
  • Train a GMM for each pathological speaker and transform it into a vector
  • Perform a classification/regression (depends on the task)
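
One common way to obtain a speaker GMM from a UBM is mean-only MAP adaptation; the slides do not say which adaptation scheme was used, so the following is a minimal sketch under that assumption. Diagonal covariances, the relevance factor of 16, and the function name are all illustrative choices, not values from the talk.

```python
import numpy as np

def map_adapt_means(ubm_means, ubm_vars, ubm_weights, frames, relevance=16.0):
    """Mean-only MAP adaptation of a diagonal-covariance UBM to one speaker."""
    # Log-likelihood of every frame under every Gaussian: shape (T, N).
    diff = frames[:, None, :] - ubm_means[None, :, :]
    log_pdf = -0.5 * (np.sum(diff**2 / ubm_vars, axis=2)
                      + np.sum(np.log(2 * np.pi * ubm_vars), axis=1))
    log_post = np.log(ubm_weights) + log_pdf
    log_post -= log_post.max(axis=1, keepdims=True)   # numerical stability
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)           # responsibilities

    counts = post.sum(axis=0)                         # soft frame counts per Gaussian
    first_order = post.T @ frames                     # responsibility-weighted sums
    data_means = first_order / np.maximum(counts, 1e-10)[:, None]
    alpha = (counts / (counts + relevance))[:, None]  # adaptation weight per Gaussian
    # Gaussians that saw much speaker data move toward it; the rest stay at the UBM.
    return alpha * data_means + (1.0 - alpha) * ubm_means

# Toy UBM with two Gaussians in two feature dimensions (made-up numbers).
ubm_means = np.array([[0.0, 0.0], [5.0, 5.0]])
ubm_vars = np.ones((2, 2))
ubm_weights = np.array([0.5, 0.5])
rng = np.random.default_rng(0)
speaker_frames = rng.normal(loc=[1.0, 0.0], scale=0.5, size=(200, 2))
adapted = map_adapt_means(ubm_means, ubm_vars, ubm_weights, speaker_frames)
print(adapted)
```

In this toy setup the first Gaussian is pulled toward the speaker's frames, while the second one, which explains almost none of the data, stays close to its UBM mean.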

Speech Technology

slide-15
SLIDE 15

Acoustic Speaker Modeling

[Figure: feature dimension 1 vs. feature dimension 2, showing a Gaussian density of the UBM with the features of healthy speakers and a Gaussian density of the speaker model with the features of a pathological speaker]

  • Variations of speakers with different degrees of pathology
  • Can be modeled by adaptation from UBM to GMM

Speech Technology

slide-16
SLIDE 16

Gaussian densities (i = 1, …, N) of the speaker model, defined by mean values (m_i) and covariance matrices (K_i)

[Figure: six Gaussian densities, labeled (m1, K1) to (m6, K6), over feature dimension 1 and feature dimension 2]

  • Concatenation of elements of densities:

$m_s = (m_1, m_2, m_3, m_4, m_5, m_6)$, $K_s = (K_1, K_2, K_3, K_4, K_5, K_6)$
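
The concatenation step above can be sketched as follows. Keeping only the diagonal of each covariance matrix is an assumption made here to keep the vector short; the slide only says that elements of the densities are concatenated. The function name is hypothetical.

```python
import numpy as np

def build_supervectors(means, covs):
    """Concatenate per-density means and covariance diagonals into two vectors."""
    m_s = np.concatenate([np.ravel(m) for m in means])
    # Assumption: only the variances (diagonal entries) are kept.
    K_s = np.concatenate([np.diag(K) for K in covs])
    return m_s, K_s

# Six densities in a two-dimensional feature space, as in the figure.
means = [np.array([i, i + 0.5]) for i in range(6)]
covs = [np.eye(2) * (i + 1) for i in range(6)]
m_s, K_s = build_supervectors(means, covs)
print(m_s.shape, K_s.shape)  # (12,) (12,)
```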

Acoustic Speaker Modeling

Speech Technology

slide-17
SLIDE 17

Acoustic Speaker Modeling

[Figure: points correspond to supervectors (SVs); one cluster of speakers with pathology type 1, one cluster of speakers with pathology type 2]

  • Discriminate between different types of pathology
  • Create SVs of speakers
  • Train some classifier on labeled SVs
  • Create SV of test speaker
  • Classify SV of test speaker
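
The steps above can be sketched with any classifier. The slide leaves the choice open ("some classifier"), so the example below uses a nearest-centroid rule purely because it fits in a few lines; it is not the classifier used in the project, and all data are synthetic.

```python
import numpy as np

class NearestCentroid:
    """Assign each supervector to the class whose mean supervector is closest."""

    def fit(self, X, y):
        y = np.asarray(y)
        self.labels_ = np.unique(y)
        self.centroids_ = np.stack([X[y == c].mean(axis=0) for c in self.labels_])
        return self

    def predict(self, X):
        # Euclidean distance from every test SV to every class centroid.
        d = np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=2)
        return self.labels_[d.argmin(axis=1)]

# Synthetic labeled SVs for two pathology types (made-up data).
rng = np.random.default_rng(1)
X_train = np.vstack([rng.normal(0.0, 1.0, size=(10, 12)),
                     rng.normal(3.0, 1.0, size=(10, 12))])
y_train = np.array([1] * 10 + [2] * 10)
clf = NearestCentroid().fit(X_train, y_train)

test_sv = np.full((1, 12), 2.9)   # SV of a hypothetical test speaker
print(clf.predict(test_sv))
```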

Speech Technology

slide-18
SLIDE 18

Acoustic Speaker Modeling

[Figure: degree of pathology plotted over the supervector space]

  • Estimate degree of pathology:
  • Train a regression (linear/SVR)
  • Create SV for a test speaker
  • Estimate degree of pathology of the test speaker
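
A minimal sketch of the regression variant, assuming a linear model fitted in closed form. A small ridge penalty is added here because supervectors usually have more dimensions than there are training speakers; the penalty, the function names, and the toy data are assumptions, and an SVR would be a drop-in alternative.

```python
import numpy as np

def fit_ridge(X, y, lam=1.0):
    """Closed-form ridge regression with an unpenalized intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    w = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ (y - y_mean))
    b = y_mean - x_mean @ w
    return w, b

def predict_degree(X, w, b):
    """Estimated degree of pathology for each row (supervector) of X."""
    return X @ w + b

# Made-up training set: 20 speakers, 12-dimensional SVs, expert scores as targets.
rng = np.random.default_rng(2)
X_train = rng.normal(size=(20, 12))
true_w = rng.normal(size=12)
y_train = X_train @ true_w + 0.1 * rng.normal(size=20)

w, b = fit_ridge(X_train, y_train, lam=0.1)
test_sv = rng.normal(size=(1, 12))
print(predict_degree(test_sv, w, b))
```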

Speech Technology

slide-19
SLIDE 19

Prosodic Analysis

  • Prosody: rhythm, intonation, stress, and related attributes
  • Computation of prosodic features on word level, across several words, across syllable nuclei, or across voiced segments
  • Computation across several words requires ASR
  • Computation across syllable nuclei requires syllable detection
  • Local features:
  • Pauses before/after segments, signal energy, segment duration, and F0
  • Calculation of mean, max., min., and std. dev.
  • Global features: jitter, shimmer, voiced/unvoiced characteristics

→ ≈ 100–200 features per test utterance

Speech Technology
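
As an illustration of the prosodic features on this slide, the sketch below computes the four functionals (mean, max., min., std. dev.) over a sequence of local values, plus local jitter and local shimmer in their common definition: mean absolute difference of consecutive glottal periods (or peak amplitudes) divided by the mean period (or amplitude). The exact feature definitions used in the project are not given on the slide, and the input values are made up.

```python
import numpy as np

def functionals(values):
    """Mean, max, min and standard deviation of a sequence of local feature values."""
    v = np.asarray(values, dtype=float)
    return {"mean": v.mean(), "max": v.max(), "min": v.min(), "std": v.std()}

def local_jitter(periods):
    """Cycle-to-cycle variation of the glottal period, relative to the mean period."""
    p = np.asarray(periods, dtype=float)
    return np.mean(np.abs(np.diff(p))) / np.mean(p)

def local_shimmer(amplitudes):
    """Cycle-to-cycle variation of the peak amplitude, relative to the mean amplitude."""
    a = np.asarray(amplitudes, dtype=float)
    return np.mean(np.abs(np.diff(a))) / np.mean(a)

# Hypothetical measurements for one utterance.
f0_per_word_hz = [118.0, 124.0, 110.0, 131.0]
periods_s = [0.0100, 0.0102, 0.0099, 0.0101]
amplitudes = [0.80, 0.78, 0.82, 0.79]

print(functionals(f0_per_word_hz))
print(local_jitter(periods_s), local_shimmer(amplitudes))
```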

slide-20
SLIDE 20

Two-Mass Model of the Vocal Folds

Speech Technology

slide-22
SLIDE 22

Evaluation

  • Word accuracy (WA) and word correctness (WC)
  • Calculated features
  • Features of acoustic speaker models
  • Features of prosodic analysis
  • Features of 2-mass model
  • Correlation (Pearson & Spearman) of calculated features or WA, WC with human listener ratings

  • Classification based on calculated features
  • Interpretation of relevant features after feature selection
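
Word correctness and word accuracy are usually defined from a minimum-edit alignment of the recognized word chain against the reference text: with N reference words, S substitutions, D deletions and I insertions, WC = (N − S − D) / N and WA = (N − S − D − I) / N. The sketch below implements this standard definition; the function names and the example sentence pair are illustrative only.

```python
def align_counts(reference, hypothesis):
    """Return (substitutions, deletions, insertions) of a minimum-edit alignment."""
    n, m = len(reference), len(hypothesis)
    # cost[i][j] = (total errors, S, D, I) for reference[:i] vs. hypothesis[:j]
    cost = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = (0, 0, 0, 0)
    for i in range(1, n + 1):
        cost[i][0] = (i, 0, i, 0)          # only deletions
    for j in range(1, m + 1):
        cost[0][j] = (j, 0, 0, j)          # only insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            e, s, d, ins = cost[i - 1][j - 1]
            if reference[i - 1] == hypothesis[j - 1]:
                diagonal = (e, s, d, ins)              # correct word
            else:
                diagonal = (e + 1, s + 1, d, ins)      # substitution
            e, s, d, ins = cost[i - 1][j]
            deletion = (e + 1, s, d + 1, ins)
            e, s, d, ins = cost[i][j - 1]
            insertion = (e + 1, s, d, ins + 1)
            cost[i][j] = min(diagonal, deletion, insertion)
    _, s, d, ins = cost[n][m]
    return s, d, ins

def word_accuracy_and_correctness(reference, hypothesis):
    """Return (WA, WC) as fractions; WA can become negative with many insertions."""
    s, d, ins = align_counts(reference, hypothesis)
    n = len(reference)
    return (n - s - d - ins) / n, (n - s - d) / n

ref = "the north wind and the sun".split()
hyp = "the north wind and a the son".split()
wa, wc = word_accuracy_and_correctness(ref, hyp)
print(wa, wc)
```

Here the alignment contains one insertion ("a") and one substitution ("sun" recognized as "son"), so WC is 5/6 and WA is 4/6.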

Speech Technology

slide-23
SLIDE 23

Outline

  • Long-term goal of the project
  • Patient groups:
  • Parkinson’s disease (PD) patients
  • Stroke patients and patients with facial paresis
  • Speech technology
  • Facial analysis technology
  • Results
  • Summary
slide-24
SLIDE 24
  • PD: Increasing inability to express emotions with facial gestures (important for communication)
  • Dysarthric speech often accompanied by other physical impairments
  • Facial paresis
  • Motor handicaps
  • → Analysis of facial gestures
  • Reduced mobility requires therapist to come to patient
  • High costs
  • Waste of therapist’s time

→ Telemedical therapy

Analysis of Facial Gestures

Facial Analysis Technology

slide-25
SLIDE 25

Anger vs. Joy

Facial Analysis Technology

slide-26
SLIDE 26

Showing Emotions

Reduced Ability to Vary Facial Expressions with PD

slide-27
SLIDE 27

[Figure panels: unstressed look, lip pursing, closing of eyes, showing the teeth]

Ability to Analyze Sequence of Movements

Dynamic Facial Expressions for Facial Paresis

slide-28
SLIDE 28

Grading of Facial Paresis

Facial Analysis Technology

  • Different Grading Systems are used
  • Most prominent: grading system by House & Brackmann

[J. House and D. Brackmann: Facial nerve grading system. Otolaryngology–Head and Neck Surgery, 1985]

  • 6 grades:
  • House I → healthy person
  • House VI → completely paralyzed half of the patient's face
  • Grading is based on (subjective) observations by an expert
  • Problem: objective tracking of the healing process
  • Solution: automatic system for diagnosis support
slide-29
SLIDE 29

3D Camera: Principle

Dynamic Analysis of Facial Gestures

slide-30
SLIDE 30

Time-of-Flight (ToF) 3D Camera

  • Up to 50 Hz
  • More than 25k 3D points (176 × 144 pixels)
  • Eye-safe infrared light / no exposure
  • Precision for facial images
  • 40 cm: ± 1 mm
  • 80 cm: ± 5 mm
  • 120 cm: ± 15 mm
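
The figures on this slide can be checked with a few lines of arithmetic. The sketch below derives the number of 3D points per frame and per second from the stated resolution and maximum frame rate, and adds a hypothetical helper that returns a conservative precision bound for a given distance. The helper assumes that precision only gets worse with distance, which the three listed values suggest but the slide does not state.

```python
WIDTH, HEIGHT = 176, 144          # sensor resolution from the slide
MAX_FRAME_RATE_HZ = 50            # "up to 50 Hz"

points_per_frame = WIDTH * HEIGHT
points_per_second = points_per_frame * MAX_FRAME_RATE_HZ
print(points_per_frame, points_per_second)  # 25344 1267200

# Stated precision (± mm) at three camera-to-face distances (cm).
PRECISION_MM = {40: 1, 80: 5, 120: 15}

def precision_bound_mm(distance_cm):
    """Precision of the nearest listed distance at or beyond distance_cm."""
    for listed_cm in sorted(PRECISION_MM):
        if distance_cm <= listed_cm:
            return PRECISION_MM[listed_cm]
    raise ValueError("no precision stated beyond 120 cm")

print(precision_bound_mm(60))  # 5
```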

Dynamic Analysis of Facial Gestures

slide-31
SLIDE 31

Dynamic Analysis of Facial Gestures

Principles of Kinect

slide-33
SLIDE 33

Prototype

[Figure labels: control image for the patient, illumination, stereo microphones, ToF camera, webcam]

Dynamic Analysis of Facial Gestures

slide-34
SLIDE 34

Telemedical System

Framework

slide-35
SLIDE 35

Measuring the Precision

slide-37
SLIDE 37

Dynamic Parameters: Interface for Therapist

slide-42
SLIDE 42

Outline

  • Long-term goal of the project
  • Patient groups:
  • Parkinson’s disease (PD) patients
  • Stroke patients and patients with facial paresis
  • Speech technology
  • Facial analysis technology
  • Results
  • Summary
slide-43
SLIDE 43

Hoehn and Yahr Scale

  • Stage 0: No signs of disease
  • Stage 1: Unilateral symptoms only
  • Stage 1.5: Unilateral and axial involvement
  • Stage 2: Bilateral symptoms; no impairment of balance
  • Stage 2.5: Mild bilateral disease with recovery on pull test
  • Stage 3: Balance impairment; mild to moderate disease; physically independent
  • Stage 4: Severe disability, still able to walk or stand unassisted
  • Stage 5: Needing a wheelchair or bedridden unless assisted

Parkinson’s Disease

slide-44
SLIDE 44

Czech Speech Data

Speech Analysis for Parkinson’s Disease

slide-45
SLIDE 45

Classification PD vs. Control Group

Speech Analysis for Parkinson’s Disease

  • 46 Czech speakers
  • 23 with PD (Hoehn & Yahr 1-2)
  • 23 as control group
  • Age matched
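
For a two-class task like this one, classification rate and the area under the ROC curve (AUC) can be computed directly from per-speaker scores. The sketch below uses the rank interpretation of the AUC: the probability that a randomly chosen PD speaker receives a higher score than a randomly chosen control speaker, with ties counted as one half. The scores are synthetic; only the group sizes (23 and 23) come from the slide.

```python
import numpy as np

def accuracy(labels, predictions):
    """Fraction of speakers whose predicted class matches the true class."""
    return float(np.mean(np.asarray(labels) == np.asarray(predictions)))

def roc_auc(labels, scores):
    """AUC as the fraction of (PD, control) pairs ranked correctly."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (len(pos) * len(neg)))

# Synthetic classifier scores for 23 PD speakers and 23 controls.
rng = np.random.default_rng(3)
labels = np.array([1] * 23 + [0] * 23)
scores = np.concatenate([rng.normal(1.0, 1.0, 23), rng.normal(-1.0, 1.0, 23)])
predictions = (scores > 0).astype(int)   # threshold at 0 is an arbitrary choice

print(accuracy(labels, predictions), roc_auc(labels, scores))
```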
slide-46
SLIDE 46

ROC-Curve for T4 and Prosody

Speech Analysis for Parkinson’s Disease

  • Screening of 100,000 people aged ≥ 60 → ≈ 1000 people with PD
  • 1% more people found developing PD ≈ 10 people
  • 1% more FA (false alarms) among people w/o PD ≈ 1000 people

→ Need for cheap, robust screening (e.g. automatic telephone system) + more detailed screening + detailed exam

[Figure: ROC curve, axes % PD vs. % FA]
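
The screening numbers on this slide follow from simple arithmetic, reproduced below. The 1% prevalence among people over 60 is taken from the Parkinson's Disease slide; the variable names are illustrative.

```python
population = 100_000          # screened people aged 60 or older
prevalence_percent = 1        # PD prevalence in this age group
step_percent = 1              # one percentage point on either ROC axis

with_pd = round(population * prevalence_percent / 100)
without_pd = population - with_pd

# One point more detection rate finds this many additional PD patients ...
extra_detected = round(with_pd * step_percent / 100)
# ... while one point more false-alarm rate wrongly flags this many healthy people.
extra_false_alarms = round(without_pd * step_percent / 100)

print(with_pd, extra_detected, extra_false_alarms)  # 1000 10 990
```

The roughly hundredfold imbalance between the two effects is what motivates a cheap first screening stage followed by more detailed examinations.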

slide-47
SLIDE 47

Intelligibility

  • “The North Wind and the Sun”:
  • 107 words (71 distinct)
  • Contains all German phonemes
  • Commonly used by speech therapists and phoneticians

  • 28 patients with dysarthria
  • recorded during post-stroke rehabilitation
  • 39 to 76 years old
  • Speech Technology: word recognition, prosody

Speech Analysis for Dysarthric Speech

slide-48
SLIDE 48

1 Rater vs. Avg. of Other 3 Raters

Speech Analysis for Dysarthric Speech

slide-49
SLIDE 49

Human raters vs. Word Recognition

Correlation r = −0.84

Speech Analysis for Dysarthric Speech
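
A correlation of this kind can be computed as sketched below, using plain NumPy for both the Pearson and the Spearman (rank) coefficient. The per-patient numbers are made up for illustration. A negative sign would be expected if higher rater scores mean worse intelligibility while higher recognition rates mean better intelligibility; that reading of the rating scale is an assumption.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    return float(xc @ yc / np.sqrt((xc @ xc) * (yc @ yc)))

def ranks(v):
    """1-based ranks; tied values share the mean of their rank positions."""
    v = np.asarray(v, dtype=float)
    order = np.argsort(v, kind="mergesort")
    r = np.empty(len(v))
    r[order] = np.arange(1, len(v) + 1)
    for value in np.unique(v):
        tied = v == value
        r[tied] = r[tied].mean()
    return r

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical data: recognition rate (%) and mean rater score per patient.
recognition_rate = [72.0, 65.0, 48.0, 30.0, 55.0, 20.0]
rater_score = [1.5, 2.0, 3.0, 4.5, 3.5, 5.0]
print(pearson(recognition_rate, rater_score), spearman(recognition_rate, rater_score))
```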

slide-50
SLIDE 50

Automatic Grading of Paresis (2001)

Speech Analysis for Dysarthric Speech

4 grades of facial paresis were defined:

  • G1: healthy person (corresp. to House I)
  • G2: weak paresis (corresp. to House II + III)
  • G3: strong paresis (corresp. to House IV + V)
  • G4: paralysis (corresp. to House VI)
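
The mapping from the six House grades to the four coarser grades listed above can be written down as a lookup table. The dictionary and function names are illustrative.

```python
# House grade (Roman numeral) -> coarse grade used in the 2001 study.
HOUSE_TO_GRADE = {
    "I": "G1",
    "II": "G2", "III": "G2",
    "IV": "G3", "V": "G3",
    "VI": "G4",
}

GRADE_LABELS = {
    "G1": "healthy person",
    "G2": "weak paresis",
    "G3": "strong paresis",
    "G4": "paralysis",
}

def coarse_grade(house_grade):
    """Return (grade, label) for a House grade given as a Roman numeral string."""
    grade = HOUSE_TO_GRADE[house_grade.upper()]
    return grade, GRADE_LABELS[grade]

print(coarse_grade("III"))  # ('G2', 'weak paresis')
```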

Result:

slide-51
SLIDE 51

New Study of Patients with Paresis (2011)

Speech Analysis for Dysarthric Speech

  • In 2011 a new study of patients with facial paresis was started
  • Cooperation with the ENT clinic (HNO-Klinik) in Erlangen
  • Goal: more than 10 patients of every House class to be recorded and analyzed
  • Current state: 15 patients of different classes recorded
slide-52
SLIDE 52

Gait Analysis – Sensor Platform

slide-53
SLIDE 53
  • Acceleration sensors
  • Gyroscopes
  • Wireless transmission of the sensor data

Gait Analysis – Sensor Platform

slide-54
SLIDE 54

Gait Analysis: 10 m Walking (4 Repetitions)

slide-55
SLIDE 55

Summary

  • Provide a telemedical rehabilitation unit for clinical/home use
  • Patient groups:
  • Parkinson’s disease (PD) patients
  • Stroke patients and patients with facial paresis
  • Speech technology
  • Facial analysis technology (still images and sequences)
  • Results
  • Up to 91% classification PD vs. control group & 0.97 AUC
  • Correlation of −0.84 between automatic classification and human raters for dysarthric speech using WC & prosody

slide-56
SLIDE 56

Thank you for your attention

Supported by Deutsche Forschungsgemeinschaft (DFG) and Wilhelm-Sander-Stiftung

noeth@informatik.uni-erlangen.de