SLIDE 1
Elmar Nöth, Tobias Bocklet, Arnd Gebhard
A System for Speech and 3D Facial Image Acquisition, Modeling and Analysis
Wednesday, 30 May 2012
SLIDE 2 Outline
- Motivation: Long-term goal of the project
- Patient groups:
- Parkinson’s disease (PD) patients
- Stroke patients and patients with facial paresis
- Speech technology
- Facial analysis technology
- Results
- Summary
SLIDE 3 Necessity of Evaluation
- Diagnosis
- How intelligible is the patient?
(holistic impression)
- How strongly does the patient nasalize?
(distinct aspect)
- Therapy control
- Has the situation of the patient improved during therapy?
- Comparison of therapy methods
- Which therapy method leads to the best results for a group of
patients?
- Screening
- Is the quality of a child’s speech appropriate for its age?
- Computer-assisted therapy
- Did the patient perform the exercise correctly?
Motivation
SLIDE 5 Long-term Goal of the Project
- Provide a telemedical rehabilitation unit for clinical/home use
- Support speech analysis and analysis of facial gestures and …
(gait, cognitive abilities): open, flexible platform
- Patient groups:
- Parkinson’s disease (PD) patients
- Stroke patients and patients with facial paresis
- Instruct the patient what to do
- Evaluate the exercises
- Compare with previous sessions
- Summarize exercises for therapist
Motivation
SLIDE 6 Outline
- Long-term goal of the project
- Patient groups:
- Parkinson’s disease (PD) patients
- Stroke patients and patients with facial paresis
- Speech technology
- Facial analysis technology
- Results
- Summary
SLIDE 7 Parkinson’s Disease
- Degenerative disorder of the central nervous system
- Death of dopamine-containing cells in the substantia nigra
- Cause of cell-death is unknown
- Second most common neurodegenerative disorder
(after Alzheimer's disease)
- Prevalence ≈ 0.3% (whole population)
- More common in the elderly:
1% of > 60 years, 4% of > 80 years
- Incidence of PD ≈ 8-18 per 100,000 people per year
- Onset in most cases > 50 years, mean onset ≈ 60 years
Patient Groups
SLIDE 8 Speech-related Symptoms of PD
- Hypophonia (soft speech)
- Monotonic speech: Speech quality tends to be soft, hoarse, and
monotonous
- Festinating speech: excessively rapid, soft, poorly-intelligible
speech
- Drooling: most likely caused by a weak, infrequent swallow
- Dysphagia (impaired ability to swallow)
- Dysarthria
Patient Groups
SLIDE 9 Dysarthria
- A speech disorder affecting the coordination of muscles in the
vocal tract, face, larynx, and respiratory system (dysarthrophonia)
- Mostly results from a neurological injury,
such as a stroke or other kind of brain injury
Patient Groups
SLIDE 11 Outline
- Long-term goal of the project
- Patient groups:
- Parkinson’s disease (PD) patients
- Stroke patients and patients with facial paresis
- Speech technology
- Facial analysis technology
- Results
- Summary
SLIDE 12 Speech Technology
- Automatic speech processing methods
- Word and phoneme recognition
- Acoustic speaker modeling
- Prosodic analysis
- Model of excitation signal
- Evaluation measures
SLIDE 13 Word and Phoneme Recognition
[Figure: recognition pipeline from features via classification (words with highest probability) to the word chain]
- Off-the-shelf technology
- Semi-continuous HMMs
- Easier to adapt with small amounts of data
- Comparable results with continuous models
- 11 Mel cepstrum coefficients + energy + first derivatives
Speech Technology
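A minimal sketch of how a feature vector of 11 Mel cepstrum coefficients plus energy can be extended with first derivatives, giving 24 values per frame. The slides do not say how the derivative is computed; the regression window of +/- 2 frames and the function names `delta` and `add_deltas` are assumptions for illustration.

```python
def delta(frames, window=2):
    """First-order regression deltas over +/- `window` frames (assumed method).

    Frames beyond the edges are replaced by the first/last frame.
    """
    n = len(frames)
    dim = len(frames[0])
    denom = 2 * sum(k * k for k in range(1, window + 1))
    out = []
    for t in range(n):
        row = []
        for d in range(dim):
            num = 0.0
            for k in range(1, window + 1):
                nxt = frames[min(t + k, n - 1)][d]
                prv = frames[max(t - k, 0)][d]
                num += k * (nxt - prv)
            row.append(num / denom)
        out.append(row)
    return out


def add_deltas(frames, window=2):
    """Append the delta values to each static frame."""
    return [s + d for s, d in zip(frames, delta(frames, window))]


# Toy input: 5 frames with 12 static values (11 cepstral coefficients + energy),
# each value rising by 1 per frame.
static = [[float(t)] * 12 for t in range(5)]
features = add_deltas(static)
print(len(features[0]))  # dimension of one frame after adding derivatives
```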
SLIDE 14 Acoustic Speaker Modeling
- Idea:
- Acoustic space of speakers can be modeled
- Space represents the multidimensional characteristics of voice of a
speaker
- Degree of pathology varies in acoustic space
- Find characteristics of degree of speech disorder
- Approach:
- Acoustics modeled by Gaussian Mixture Models (GMMs)
- Train Universal Background Model (UBM) with normal speakers
- Train a GMM for each pathological speaker and transform it into a vector
- Perform a classification/regression (depends on the task)
Speech Technology
SLIDE 15 Acoustic Speaker Modeling
[Figure: Gaussian densities of the UBM and of the speaker model over feature dimension 1 and feature dimension 2, with the features of healthy speakers and of a pathological speaker]
- Variations of speakers with different degrees of pathology
- Can be modeled by adaptation from UBM to GMM
Speech Technology
SLIDE 16 Acoustic Speaker Modeling
- Gaussian densities (i = 1, ..., N) of the speaker model are defined by mean values (m_i) and covariance matrices (K_i)
- Concatenation into supervectors (the figure shows N = 6):
$m_s = (m_1, m_2, m_3, m_4, m_5, m_6)^T$, $K_s = (K_1, K_2, K_3, K_4, K_5, K_6)^T$
[Figure: six Gaussian densities labeled (m_1, K_1) to (m_6, K_6) over feature dimension 1 and feature dimension 2]
Speech Technology
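The adaptation from UBM to speaker GMM and the concatenation of the means can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it uses relevance MAP adaptation of the means only, diagonal covariances, a relevance factor of 16, and a hand-written toy UBM; the slides name none of these details. The covariance supervector would be built by the same concatenation.

```python
import math


def gauss_diag(x, mean, var):
    """Density of a diagonal-covariance Gaussian at point x."""
    log_p = 0.0
    for xi, mi, vi in zip(x, mean, var):
        log_p += -0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
    return math.exp(log_p)


def map_adapt_means(ubm, frames, relevance=16.0):
    """Adapt only the UBM means to one speaker's frames (assumed: relevance MAP)."""
    n_comp = len(ubm["weights"])
    dim = len(ubm["means"][0])
    counts = [0.0] * n_comp                       # soft counts per density
    sums = [[0.0] * dim for _ in range(n_comp)]   # first-order statistics
    for x in frames:
        lik = [w * gauss_diag(x, m, v)
               for w, m, v in zip(ubm["weights"], ubm["means"], ubm["vars"])]
        total = sum(lik)
        for i in range(n_comp):
            post = lik[i] / total
            counts[i] += post
            for d in range(dim):
                sums[i][d] += post * x[d]
    adapted = []
    for i in range(n_comp):
        alpha = counts[i] / (counts[i] + relevance)
        adapted.append([
            alpha * (sums[i][d] / counts[i]) + (1 - alpha) * ubm["means"][i][d]
            for d in range(dim)
        ])
    return adapted


def supervector(means):
    """Concatenate the mean vectors m_1 ... m_N into one vector m_s."""
    return [value for mean in means for value in mean]


# Toy UBM with N = 2 densities in 2 feature dimensions (hypothetical values).
ubm = {
    "weights": [0.5, 0.5],
    "means": [[0.0, 0.0], [4.0, 4.0]],
    "vars": [[1.0, 1.0], [1.0, 1.0]],
}
speaker_frames = [[1.0, 1.0]] * 20
sv = supervector(map_adapt_means(ubm, speaker_frames))
print(sv)
```

The first density sits close to the speaker's frames, so its mean moves towards them; the second density receives almost no frames and stays near its UBM value.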
SLIDE 17 Acoustic Speaker Modeling
[Figure: supervector space; points correspond to supervectors (SVs) of speakers with pathology type 1 and speakers with pathology type 2]
- Discriminate between different types of pathology
- Create SVs of speakers
- Train some classifier on labeled SVs
- Create SV of test speaker
- Classify SV of test speaker
Speech Technology
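The four steps above can be sketched with a stand-in classifier. The slide only says "some classifier"; the nearest-centroid rule, the function names, and the two-dimensional toy supervectors below are assumptions chosen to keep the example short.

```python
import math


def centroid(vectors):
    """Component-wise mean of a list of equally long vectors."""
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(len(vectors[0]))]


def train_centroids(labeled_svs):
    """labeled_svs: list of (supervector, label). Returns label -> centroid."""
    by_label = {}
    for sv, label in labeled_svs:
        by_label.setdefault(label, []).append(sv)
    return {label: centroid(svs) for label, svs in by_label.items()}


def classify(model, sv):
    """Return the label whose centroid is closest to the test supervector."""
    return min(model, key=lambda label: math.dist(model[label], sv))


# Hypothetical labeled supervectors (real SVs have far more dimensions).
labeled = [
    ([0.0, 0.1], "type 1"), ([0.2, 0.0], "type 1"),
    ([1.0, 0.9], "type 2"), ([0.9, 1.1], "type 2"),
]
model = train_centroids(labeled)
predicted = classify(model, [0.1, 0.2])
print(predicted)
```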
SLIDE 18 Acoustic Speaker Modeling
[Figure: supervector space plotted against degree of pathology]
- Estimate degree of pathology
- Train a regression (linear/SVR)
- Create SV for a test speaker
- Estimate the degree of pathology of the test speaker
Speech Technology
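A minimal sketch of the linear variant of this regression, assuming a least-squares fit with a very small L2 penalty for numerical stability. The supervectors and pathology scores are made-up toy values, and `fit_ridge`, `predict`, and `solve` are hypothetical helper names; the SVR variant named on the slide is not shown.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        rest = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - rest) / m[r][r]
    return x


def fit_ridge(svs, scores, lam=1e-6):
    """Fit weights (plus bias) mapping supervectors to a degree of pathology."""
    rows = [sv + [1.0] for sv in svs]  # append a constant bias column
    dim = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) + (lam if i == j else 0.0)
            for j in range(dim)] for i in range(dim)]
    aty = [sum(r[i] * y for r, y in zip(rows, scores)) for i in range(dim)]
    return solve(ata, aty)


def predict(weights, sv):
    """Estimate the degree of pathology for one supervector."""
    return sum(w * x for w, x in zip(weights, sv + [1.0]))


# Toy training data: scores follow 2 * x0 + x1 + 1 exactly.
train_svs = [[0.0, 1.0], [1.0, 0.0], [2.0, 2.0], [3.0, 1.0]]
train_scores = [2.0, 3.0, 7.0, 8.0]
weights = fit_ridge(train_svs, train_scores)
estimate = predict(weights, [1.0, 1.0])
print(round(estimate, 3))
```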
SLIDE 19 Prosodic Analysis
- Prosody: rhythm, intonation, stress, and related attributes
- Computation of prosodic features on word level, across several words or
across syllable nuclei or across voiced segments
- Computation across several words requires ASR
- Computation across syllable nuclei requires syllable detection
- Local features:
- Pauses before/after segments, signal energy,
segment duration, and F0
- Calculation of mean, max., min., and std. dev.
- Global features: jitter, shimmer, voiced/unvoiced characteristics
≈100-200 features per test utterance
Speech Technology
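A sketch of two of the feature types listed above: the four statistics computed for a local feature, and jitter and shimmer as global features. The slides do not state which jitter and shimmer variants are used; the "local" definitions below (mean absolute difference of consecutive values divided by the mean) are a common choice and an assumption here, as are the toy values.

```python
import statistics


def local_stats(values):
    """Mean, max., min., and std. dev. of one local feature (e.g. F0)."""
    return {
        "mean": statistics.mean(values),
        "max": max(values),
        "min": min(values),
        "std": statistics.pstdev(values),
    }


def relative_perturbation(values):
    """Mean absolute difference of consecutive values, relative to the mean."""
    diffs = [abs(b - a) for a, b in zip(values[:-1], values[1:])]
    return statistics.mean(diffs) / statistics.mean(values)


def jitter(periods):
    """Cycle-to-cycle variation of the pitch period (assumed: local jitter)."""
    return relative_perturbation(periods)


def shimmer(amplitudes):
    """Cycle-to-cycle variation of the peak amplitude (assumed: local shimmer)."""
    return relative_perturbation(amplitudes)


# Hypothetical measurements from one voiced segment.
f0_values = [100.0, 110.0, 120.0]            # Hz
periods = [0.010, 0.011, 0.010, 0.011]       # seconds
amplitudes = [1.0, 0.9, 1.0, 0.9]            # arbitrary units

f0_stats = local_stats(f0_values)
jit = jitter(periods)
shim = shimmer(amplitudes)
print(f0_stats, jit, shim)
```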
SLIDE 20
Two-Mass Model of the Vocal Folds
Speech Technology
SLIDE 21
Two-Mass Model of the Vocal Folds
Speech Technology
SLIDE 22 Evaluation
- Word accuracy (WA) and word correctness (WC)
- Calculated features
- Features of acoustic speaker models
- Features of prosodic analysis
- Features of 2-mass model
- Correlation (Pearson & Spearman) of the calculated features or WA/WC with human listener ratings
- Classification based on calculated features
- Interpretation of relevant features after feature selection
Speech Technology
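Word accuracy and word correctness can be computed from a minimum edit-distance alignment between the reference text and the recognized word chain. The sketch below assumes the usual definitions, WC = (N - S - D) / N and WA = (N - S - D - I) / N, with N reference words, S substitutions, D deletions, and I insertions; the slides do not spell the formulas out, and the recognizer output shown is invented.

```python
def align_counts(ref, hyp):
    """Return (substitutions, deletions, insertions) of a minimal alignment."""
    # cell = (total errors, S, D, I) for ref[:i] against hyp[:j]
    cost = [[None] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    cost[0][0] = (0, 0, 0, 0)
    for i in range(1, len(ref) + 1):
        cost[i][0] = (i, 0, i, 0)      # only deletions
    for j in range(1, len(hyp) + 1):
        cost[0][j] = (j, 0, 0, j)      # only insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            e, s, d, ins = cost[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                best = (e, s, d, ins)              # match
            else:
                best = (e + 1, s + 1, d, ins)      # substitution
            e, s, d, ins = cost[i - 1][j]
            best = min(best, (e + 1, s, d + 1, ins))   # deletion
            e, s, d, ins = cost[i][j - 1]
            best = min(best, (e + 1, s, d, ins + 1))   # insertion
            cost[i][j] = best
    return cost[-1][-1][1:]


def wa_wc(ref, hyp):
    """Word accuracy and word correctness in percent."""
    s, d, ins = align_counts(ref, hyp)
    n = len(ref)
    wc = 100.0 * (n - s - d) / n
    wa = 100.0 * (n - s - d - ins) / n
    return wa, wc


reference = "the north wind and the sun".split()
recognized = "the north wind and a the son".split()   # hypothetical ASR output
wa, wc = wa_wc(reference, recognized)
print(round(wa, 1), round(wc, 1))
```

Because insertions are subtracted, WA can become negative for very poor recognition, while WC stays between 0 and 100.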
SLIDE 23 Outline
- Long-term goal of the project
- Patient groups:
- Parkinson’s disease (PD) patients
- Stroke patients and patients with facial paresis
- Speech technology
- Facial analysis technology
- Results
- Summary
SLIDE 24 Analysis of Facial Gestures
- PD: Increasing inability to express emotions with facial gestures
(important for communication)
- Dysarthric speech is often accompanied by other physical impairments
- Facial paresis
- Motor handicaps
→ Analysis of facial gestures
- Reduced mobility requires the therapist to come to the patient
- High costs
- Waste of therapist’s time
→ Telemedical therapy
Facial Analysis Technology
SLIDE 25
Anger vs. Joy
Facial Analysis Technology
SLIDE 26
Showing Emotions
Reduced Ability to Vary Facial Expressions with PD
SLIDE 27
[Figure panels: unstressed look, lip pursing, closing of eyes, showing the teeth]
Ability to Analyze Sequence of Movements
Dynamic Facial Expressions for Facial Paresis
SLIDE 28 Grading of Facial Paresis
Facial Analysis Technology
- Different grading systems are used
- Most prominent: grading system by House & Brackmann
[J. House and D. Brackmann: Facial nerve grading system, Otolaryngology-Head and Neck Surgery, 1985]
- Grade I → healthy person
- Grade VI → completely paralyzed half of the patient's face
- Grading is based on (subjective) observations by an expert
- Problem: objective tracking of the healing process
- Solution: automatic system for diagnosis support
SLIDE 29
3D Camera: Principle
Dynamic Analysis of Facial Gestures
SLIDE 30 Time-of-Flight (ToF) 3D Camera
- Up to 50 Hz
- More than 25k 3D points
(176*144 pixels)
- Eye-safe infrared light / no exposure
- Precision for facial images
- 40 cm: +/- 1 mm
5mm
Dynamic Analysis of Facial Gestures
SLIDE 31
Dynamic Analysis of Facial Gestures
Principles of Kinect
SLIDE 32
Dynamic Analysis of Facial Gestures
Principles of Kinect
SLIDE 33
Prototype
[Figure labels: control image for the patient, illumination, stereo microphones, ToF camera, webcam]
Dynamic Analysis of Facial Gestures
SLIDE 34
Telemedical System
Framework
SLIDE 35
Measuring the Precision
SLIDE 36
Measuring the Precision
SLIDE 37
Dynamic Parameters: Interface for Therapist
SLIDE 38
Dynamic Parameters: Interface for Therapist
SLIDE 39
Dynamic Parameters: Interface for Therapist
SLIDE 40
Dynamic Parameters: Interface for Therapist
SLIDE 41
Dynamic Parameters: Interface for Therapist
SLIDE 42 Outline
- Long-term goal of the project
- Patient groups:
- Parkinson’s disease (PD) patients
- Stroke patients and patients with facial paresis
- Speech technology
- Facial analysis technology
- Results
- Summary
SLIDE 43 Hoehn and Yahr Scale
Stage 0: No signs of disease
Stage 1: Unilateral symptoms only
Stage 1.5: Unilateral and axial involvement
Stage 2: Bilateral symptoms; no impairment of balance
Stage 2.5: Mild bilateral disease with recovery on pull test
Stage 3: Balance impairment; mild to moderate disease; physically independent
Stage 4: Severe disability; still able to walk or stand unassisted
Stage 5: Needing a wheelchair or bedridden unless assisted
Parkinson’s Disease
SLIDE 44
Czech Speech Data
Speech Analysis for Parkinson’s Disease
SLIDE 45 Classification PD vs. Control Group
Speech Analysis for Parkinson’s Disease
- 46 Czech speakers
- 23 with PD (Hoehn & Yahr 1-2)
- 23 as control group
- Age matched
SLIDE 46
ROC-Curve for T4 and Prosody
Speech Analysis for Parkinson’s Disease
[Figure: ROC curve; axes: % PD, % FA]
- Screening of 100,000 people aged >= 60 ≈ 1,000 people with PD
- 1% more people found developing PD ≈ 10 people
- 1% more false alarms (FA) for people without PD ≈ 1,000 people
- Need for cheap, robust screening (e.g. automatic telephone system) + more detailed screening + detailed exam
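The screening arithmetic on this slide can be checked in a few lines. The population size and the 1% figures come from the slide; the variable names are illustrative. The exact number of extra false alarms is 990, which the slide rounds to about 1,000.

```python
population = 100_000      # people aged >= 60 who are screened
prevalence = 0.01         # about 1% of this age group has PD

with_pd = round(population * prevalence)
without_pd = population - with_pd

# Moving along the ROC curve by one percentage point in each direction:
extra_detected = round(with_pd * 0.01)          # 1% more PD cases found
extra_false_alarms = round(without_pd * 0.01)   # 1% more false alarms

print(with_pd, extra_detected, extra_false_alarms)
```

One percentage point of false alarms affects roughly a hundred times more people than one percentage point of detection, which is why a cheap first screening stage has to be followed by more detailed ones.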
SLIDE 47 Intelligibility
- “The North Wind and the Sun”:
- 107 words (71 distinct)
- Contains all German phonemes
- Commonly used by speech therapists and
phoneticians
- 28 patients with dysarthria
- recorded during post-stroke rehabilitation
- 39 to 76 years old
- Speech Technology: word recognition, prosody
Speech Analysis for Dysarthric Speech
SLIDE 48
1 Rater vs. Avg. of Other 3 Raters
Speech Analysis for Dysarthric Speech
SLIDE 49
Human raters vs. Word Recognition
Correlation r = −0.84
Speech Analysis for Dysarthric Speech
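A sketch of how such a correlation between recognizer output and human ratings can be computed, for both the Pearson and the Spearman coefficient. The numbers are invented toy data, not the study's data, and assume a rating scale on which higher scores mean lower intelligibility, which would make the correlation with word recognition results negative.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical values for five patients.
word_correctness = [80.0, 65.0, 50.0, 30.0, 20.0]   # percent, from the recognizer
rater_scores = [1.0, 2.0, 3.0, 4.0, 5.0]            # assumed: higher = less intelligible

r_pearson = pearson(word_correctness, rater_scores)
r_spearman = spearman(word_correctness, rater_scores)
print(round(r_pearson, 3), round(r_spearman, 3))
```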
SLIDE 50 Automatic Grading of Paresis (2001)
Speech Analysis for Dysarthric Speech
4 grades of facial paresis were defined:
- G1: healthy person (corresp. to House I)
- G2: weak paresis (corresp. to House II + III)
- G3: strong paresis (corresp. to House IV + V)
- G4: paralysis (corresp. to House VI)
Result:
SLIDE 51 New Study of Patients with Paresis (2011)
Speech Analysis for Dysarthric Speech
- In 2011 a new study of patients with facial paresis was started
- Cooperation with the ENT clinic (HNO-Klinik) Erlangen
- Goal: record and analyze more than 10 patients of every House class
- Current state: 15 patients of different classes recorded
SLIDE 52
Gait Analysis – Sensor Platform
SLIDE 53
- Acceleration sensors
- Gyroscopes
- Wireless transmission of the sensor data
Gait Analysis – Sensor Platform
SLIDE 54
Gait Analysis: 10 m Walking (4 Repetitions)
SLIDE 55 Summary
- Provide a telemedical rehabilitation unit for clinical/home use
- Patient groups:
- Parkinson’s disease (PD) patients
- Stroke patients and patients with facial paresis
- Speech technology
- Facial analysis technology (still images and sequences)
- Results
- Up to 91% correct classification of PD vs. control group, with an AUC of 0.97
- Correlation of −0.84 between automatic evaluation and human raters for dysarthric speech using WC & prosody
SLIDE 56
Thank you for your attention
Supported by:
- Deutsche Forschungsgemeinschaft (DFG)
- Wilhelm-Sander-Stiftung
Contact: noeth@informatik.uni-erlangen.de