Inclusive Design: Deep Learning on Audio in Azure (PowerPoint Presentation)

SLIDE 1

Inclusive Design

Deep Learning on Audio in Azure

Swetha Machanavajhala, Software Engineer II
Xiaoyong Zhu, Senior Data Scientist

@swethaMVNV

SLIDE 2

Swetha Machanavajhala

A carbon monoxide detector beeped for a WEEK at Swetha's house. Because she is deaf, she did not know until a neighbor told her.

True life-threatening incident

SLIDE 3

SLIDE 4

SLIDE 5

DISABILITY ≠ PERSONAL HEALTH CONDITION

DISABILITY = MISMATCHED HUMAN INTERACTIONS

SLIDE 6

Inclusive Design

SLIDE 7

SLIDE 8

SLIDE 9

SLIDE 10

SLIDE 11

SLIDE 12

SLIDE 13

SLIDE 14

Visualizing Sounds: React in a Second

SLIDE 15

SLIDE 16

SLIDE 17

SLIDE 18

Capturing loudness in real time
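A minimal sketch of real-time loudness capture, assuming float samples in [-1, 1] processed in fixed 50 ms chunks and a level expressed in dB relative to an RMS of 1.0. The deck does not say how Hearing AI measures loudness; the chunk size, the dB reference, and the helper names `rms_dbfs` and `stream_loudness` are hypothetical.

```python
import numpy as np

# Hypothetical helper names; chunk size and dB reference are illustrative assumptions.


def rms_dbfs(frame: np.ndarray, eps: float = 1e-10) -> float:
    """RMS level of one frame in dB relative to an RMS of 1.0 (full scale)."""
    # eps floors the RMS so that pure silence gives -200 dB instead of -inf.
    rms = np.sqrt(np.mean(frame.astype(np.float64) ** 2))
    return float(20.0 * np.log10(max(rms, eps)))


def stream_loudness(signal: np.ndarray, sr: int, frame_ms: int = 50):
    """Yield (time_in_seconds, level_db) for consecutive non-overlapping frames.

    Iterating over an array stands in for reading chunks from a live microphone.
    """
    hop = int(sr * frame_ms / 1000)
    for start in range(0, len(signal) - hop + 1, hop):
        yield start / sr, rms_dbfs(signal[start:start + hop])


# Example: 1 s of silence followed by 1 s of a 440 Hz tone at amplitude 0.5.
sr = 16000
t = np.arange(sr) / sr
signal = np.concatenate([np.zeros(sr), 0.5 * np.sin(2 * np.pi * 440 * t)])
levels = list(stream_loudness(signal, sr))
```

Each 50 ms level can drive a visual indicator directly; an alert could fire whenever the level crosses a threshold such as -30 dB (an illustrative value, not one from the deck).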

SLIDE 19

SLIDE 20

Currently…

SLIDE 21

Currently…

SLIDE 22

Hearing AI can transcribe phone calls

SLIDE 23

Hearing AI can do more…

SLIDE 24

SLIDE 25

Xiaoyong Zhu

Deep Learning for Audio in Azure

SLIDE 26

Landscape

Sound-based predictive maintenance

https://www.3dsig.com/

SLIDE 27

Landscape

SDK and product that turn machine sounds into actions

https://www.otosense.com/

SLIDE 28

Landscape

Enables OEMs to embed contextual awareness into devices.

https://www.audioanalytic.com/

SLIDE 29

SLIDE 30

Dataset

SLIDE 31

SLIDE 32

Convert 1-dimensional array to 2-dimensional matrix
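One common way to turn a 1-D audio array into a 2-D matrix is to slice it into overlapping frames and take the FFT of each frame, giving a time-by-frequency spectrogram. The sketch below assumes a Hann window, `n_fft=1024` and `hop_length=256`; these values and the helper names `frame_signal` and `magnitude_spectrogram` are hypothetical, not taken from the deck.

```python
import numpy as np

# Hypothetical helper names; window, FFT size and hop are illustrative assumptions.


def frame_signal(x: np.ndarray, frame_length: int, hop_length: int) -> np.ndarray:
    """Slice a 1-D signal into overlapping frames -> 2-D array (n_frames, frame_length)."""
    if len(x) < frame_length:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (len(x) - frame_length) // hop_length
    # Row i holds samples x[i * hop_length : i * hop_length + frame_length].
    idx = np.arange(frame_length)[None, :] + hop_length * np.arange(n_frames)[:, None]
    return x[idx]


def magnitude_spectrogram(x: np.ndarray, n_fft: int = 1024, hop_length: int = 256) -> np.ndarray:
    """2-D time x frequency matrix: |FFT| of each Hann-windowed frame."""
    frames = frame_signal(x, n_fft, hop_length) * np.hanning(n_fft)
    return np.abs(np.fft.rfft(frames, axis=1))  # shape (n_frames, n_fft // 2 + 1)


# Example: 1 s of a 1 kHz tone sampled at 16 kHz.
sr = 16000
tone = np.sin(2 * np.pi * 1000 * np.arange(sr) / sr)
S = magnitude_spectrogram(tone)
```

With these settings each frequency bin is 16000 / 1024 = 15.625 Hz wide, so the 1 kHz tone shows up as a bright horizontal line at bin 64.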

SLIDE 33

CNN on audio

Convert 1-dimensional data to 2-dimensional data
Input for CNNs: log-scaled mel-spectrogram
Mel-spectrogram approximates what humans hear (the mel scale mimics human pitch perception)
Consider time by mel-spectrogram bands (X by Y axis): 150 bands
Each pixel = amplitude at one time/frequency combination
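The log-scaled mel-spectrogram described above can be sketched in NumPy as: windowed FFT per frame, then 150 triangular filters spaced evenly on the mel scale, then a dB (log) transform. Only the 150 bands come from the slide; the HTK-style mel formula, the 22.05 kHz sample rate, `n_fft=2048`, `hop_length=512` and the helper names are assumptions for illustration.

```python
import numpy as np

# Hypothetical helper names; mel formula, sample rate, FFT size and hop are assumptions.


def hz_to_mel(f):
    # HTK-style mel formula; one common convention, assumed here.
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(sr: int, n_fft: int, n_mels: int) -> np.ndarray:
    """Triangular filters equally spaced on the mel scale: shape (n_mels, n_fft // 2 + 1)."""
    fft_freqs = np.linspace(0.0, sr / 2, n_fft // 2 + 1)
    # n_mels + 2 edge frequencies: filter i rises from edge i to i+1, falls to i+2.
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2))
    lower, center, upper = edges[:-2, None], edges[1:-1, None], edges[2:, None]
    rising = (fft_freqs - lower) / (center - lower)
    falling = (upper - fft_freqs) / (upper - center)
    return np.maximum(0.0, np.minimum(rising, falling))


def log_mel_spectrogram(x, sr, n_fft=2048, hop_length=512, n_mels=150):
    """Return a (n_frames, n_mels) image: rows = time, columns = mel bands, values in dB."""
    n_frames = 1 + (len(x) - n_fft) // hop_length
    idx = np.arange(n_fft)[None, :] + hop_length * np.arange(n_frames)[:, None]
    power = np.abs(np.fft.rfft(x[idx] * np.hanning(n_fft), axis=1)) ** 2
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T
    return 10.0 * np.log10(np.maximum(mel, 1e-10))  # log scaling; floor avoids log(0)


sr = 22050
tone = np.sin(2 * np.pi * 1000 * np.arange(sr) / sr)
image = log_mel_spectrogram(tone, sr)  # one "pixel" per (time frame, mel band)
```

Transposing `image` gives the slide's layout (time on X, mel bands on Y). In practice a library such as librosa (`librosa.feature.melspectrogram` followed by `librosa.power_to_db`) does the same job with more options.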

SLIDE 34

CNN on audio

SLIDE 35

CNN on audio

SLIDE 36

Selecting the right number of mel bands is important
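The deck does not say why the band count matters. One concrete, checkable reason (our assumption, not the authors' stated reasoning) is that mel bands must stay wider than the FFT bin spacing: if a triangular filter contains no FFT bin centre, it always outputs zero and its image row carries no information. The sketch counts such empty bands; the HTK mel formula and the parameter values are illustrative, and the helper names are hypothetical.

```python
import numpy as np

# Hypothetical helper names; HTK mel formula and parameter values are assumptions.


def to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def count_empty_mel_bands(sr: int, n_fft: int, n_mels: int) -> int:
    """Count triangular mel filters that contain no FFT bin centre (always output 0)."""
    edges = to_hz(np.linspace(to_mel(0.0), to_mel(sr / 2), n_mels + 2))
    bins = np.linspace(0.0, sr / 2, n_fft // 2 + 1)
    # Filter i is non-zero only strictly between edges[i] and edges[i + 2].
    inside = (bins[None, :] > edges[:-2, None]) & (bins[None, :] < edges[2:, None])
    return int(np.sum(~inside.any(axis=1)))


# Same 150 bands, two FFT sizes at 22.05 kHz (values chosen for illustration):
fine = count_empty_mel_bands(22050, 2048, 150)    # 0: every band sees at least one bin
coarse = count_empty_mel_bands(22050, 256, 150)   # > 0: the lowest bands see no bin at all
```

Conversely, too few bands average away frequency detail, so the band count and the FFT size have to be chosen together.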

SLIDE 37

Network Architecture

11 convolutional layers
Faster convergence & better accuracy: batch-normalization layers
Improving accuracy: self-attention mechanism, longer segments (2 secs), narrow mel-band
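Two of the listed components can be sketched in plain NumPy: training-mode batch normalization, and scaled dot-product self-attention over the time frames of one 2-second segment. This is not the authors' network; the frame rate (22.05 kHz audio, `n_fft=2048`, `hop=512`), the 64-dimensional features, the random untrained weights and the helper names are all hypothetical.

```python
import numpy as np

# Hypothetical helper names; sizes and random weights are illustrative assumptions.
rng = np.random.default_rng(0)


def batch_norm(x, gamma, beta, eps=1e-5):
    """Training-mode batch normalization: normalize each feature over axis 0 (the batch)."""
    mean = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(h, w_q, w_k, w_v):
    """Scaled dot-product self-attention over time frames.

    h: (T, d) per-frame features, e.g. CNN outputs for one segment.
    Returns (T, d_v) context vectors and the (T, T) attention weights.
    """
    q, k, v = h @ w_q, h @ w_k, h @ w_v
    weights = softmax(q @ k.T / np.sqrt(k.shape[-1]), axis=-1)
    return weights @ v, weights


# Frames in a 2-second segment under the assumed settings.
sr, n_fft, hop, d = 22050, 2048, 512, 64
T = 1 + (2 * sr - n_fft) // hop  # 83
h = rng.normal(size=(T, d))      # stand-in for CNN feature maps
w_q, w_k, w_v = (rng.normal(scale=d ** -0.5, size=(d, d)) for _ in range(3))
context, attn = self_attention(h, w_q, w_k, w_v)
segment_embedding = context.mean(axis=0)  # pooled vector a classifier could consume
```

Each row of `attn` sums to 1, so every frame's output is a weighted mix of all frames in the segment; a longer segment simply gives the attention more frames to draw on.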

SLIDE 38

Network Architecture

SLIDE 39

Network Architecture

SLIDE 40

Intelligent Sound Prediction - Architecture

SLIDE 41

Demo: Hearing AI can recognize sounds

SLIDE 42

SLIDE 43

Performance

SLIDE 44

SLIDE 45

Artificial Intelligence proves sounds need not be heard!

Sound, movement, speech, phone calls, localization, specific sounds

SLIDE 46