Inclusive Design
Deep Learning on Audio in Azure
Swetha Machanavajhala, Software Engineer II
Xiaoyong Zhu, Senior Data Scientist
@swethaMVNV
True life-threatening incident
A carbon monoxide detector was beeping for a WEEK at Swetha’s house. Since she is deaf, she was unaware until a neighbor informed her.
Swetha Machanavajhala
DISABILITY ≠ PERSONAL HEALTH CONDITION
DISABILITY = MISMATCHED HUMAN INTERACTIONS
Inclusive Design
Visualizing Sounds: React in a second
Capturing loudness real-time
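The slides do not say how loudness is measured. As a minimal sketch, assuming audio arrives as floating-point samples in [-1, 1], one common approach is to compute the RMS level of each short chunk in decibels relative to full scale. The names `rms_dbfs` and `loudness_stream` are hypothetical and the tone is synthetic.

```python
import math

def rms_dbfs(samples, floor_db=-120.0):
    """RMS level of a chunk in dB relative to full scale (samples in [-1, 1])."""
    if not samples:
        return floor_db
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square <= 0.0:
        return floor_db  # digital silence: avoid log10(0)
    return max(floor_db, 10.0 * math.log10(mean_square))

def loudness_stream(samples, chunk_size):
    """Yield one loudness value per consecutive chunk, as a live meter would."""
    for start in range(0, len(samples), chunk_size):
        yield rms_dbfs(samples[start:start + chunk_size])

# Synthetic input (an assumption): a quiet 440 Hz burst followed by a loud one.
sample_rate = 16000
quiet = [0.01 * math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(1600)]
loud = [0.5 * math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(1600)]
levels = list(loudness_stream(quiet + loud, chunk_size=1600))
```

With 100 ms chunks, `levels` holds one value per chunk, and the loud burst reads roughly 34 dB higher than the quiet one.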
Currently…
Hearing AI can transcribe phone calls
Hearing AI can do more…
Deep Learning for Audio in Azure (Xiaoyong Zhu)
Landscape: Sound-based predictive maintenance. https://www.3dsig.com/
Landscape: SDK and product to turn machine sounds into actions. https://www.otosense.com/
Landscape: Audio Analytic enables OEMs to embed contextual awareness into devices. https://www.audioanalytic.com/
Dataset
Convert 1-dimensional array to 2-dimensional matrix
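A first step in turning a 1-dimensional waveform into a 2-dimensional matrix is usually framing: slicing the signal into short, overlapping windows so that each row covers one moment in time. The sketch below illustrates only that step. The function name `frame_signal` and the frame and hop sizes are assumptions, not values from the slides.

```python
def frame_signal(samples, frame_length, hop_length):
    """Split a 1-D sequence into overlapping frames (rows of a 2-D matrix)."""
    if frame_length <= 0 or hop_length <= 0:
        raise ValueError("frame_length and hop_length must be positive")
    frames = []
    # Stop at the last position where a full frame still fits.
    for start in range(0, len(samples) - frame_length + 1, hop_length):
        frames.append(list(samples[start:start + frame_length]))
    return frames

# Toy signal (an assumption): ten consecutive integers stand in for audio samples.
signal = list(range(10))
matrix = frame_signal(signal, frame_length=4, hop_length=2)
# matrix has 4 rows of 4 samples; consecutive rows overlap by 2 samples.
```

Each row can then be transformed to the frequency domain, which is what produces the time-by-frequency image a CNN consumes.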
CNN on audio
Convert 1-dimensional data to 2-dimensional data
Input for CNNs: log-scaled mel-spectrogram
Mel-spectrogram ≈ what humans hear (the mel scale approximates how people perceive pitch)
Consider time by mel-spectrogram bands (X by Y axis)
150 bands
Each pixel = amplitude at one time/frequency combination
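The log-scaled mel-spectrogram described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the presenters' pipeline: it uses a naive DFT, a Hann window, the common 2595·log10(1 + f/700) mel formula, and small hypothetical parameters (`n_fft=256`, `hop=128`, `n_bands=20`) so that it runs quickly. The slides use 150 bands, and a real system would more likely rely on an FFT-based audio library.

```python
import cmath
import math

def hz_to_mel(hz):
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def power_spectrum(frame):
    """Naive DFT power spectrum of one Hann-windowed frame (bins 0..N/2)."""
    n = len(frame)
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / n))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        spectrum.append(abs(acc) ** 2)
    return spectrum

def mel_filterbank(n_bands, n_fft, sample_rate):
    """Triangular filters whose centers are equally spaced on the mel scale."""
    top_mel = hz_to_mel(sample_rate / 2)
    hz_points = [mel_to_hz(top_mel * i / (n_bands + 1)) for i in range(n_bands + 2)]
    bin_hz = sample_rate / n_fft
    bank = []
    for b in range(n_bands):
        lo, mid, hi = hz_points[b], hz_points[b + 1], hz_points[b + 2]
        row = []
        for k in range(n_fft // 2 + 1):
            f = k * bin_hz
            rising = (f - lo) / (mid - lo)
            falling = (hi - f) / (hi - mid)
            row.append(max(0.0, min(rising, falling)))
        bank.append(row)
    return bank

def log_mel_spectrogram(samples, sample_rate, n_fft=256, hop=128, n_bands=20):
    """Return a 2-D list indexed [time frame][mel band], in dB."""
    bank = mel_filterbank(n_bands, n_fft, sample_rate)
    rows = []
    for start in range(0, len(samples) - n_fft + 1, hop):
        power = power_spectrum(samples[start:start + n_fft])
        energies = [sum(w * p for w, p in zip(filt, power)) for filt in bank]
        rows.append([10.0 * math.log10(e + 1e-10) for e in energies])  # log scaling
    return rows

# Synthetic input (an assumption): a 1 kHz tone sampled at 8 kHz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * n / sample_rate) for n in range(1024)]
spec = log_mel_spectrogram(tone, sample_rate)
peak_band = max(range(len(spec[0])), key=lambda b: spec[0][b])
```

Here `spec` is the 2-dimensional "image": one row per time frame and one column per mel band, with `peak_band` marking the band that contains the tone.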
Selecting the right number of bands is important
Network Architecture
11 convolutional layers
Faster convergence & better accuracy: batch-normalization layers
Improving accuracy: self-attention mechanism, longer segments (2 secs), narrow mel-band
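Two of the building blocks named above, batch normalization and self-attention, can be illustrated in a few lines. This is a conceptual sketch, not the presenters' network: it omits the convolutional layers and uses no learned parameters (batch norm without scale and shift, attention with identity query/key/value projections). The toy feature values are made up.

```python
import math

def batch_norm(batch, eps=1e-5):
    """Normalize each feature column of a batch to zero mean, unit variance."""
    n = len(batch)
    n_features = len(batch[0])
    means = [sum(row[j] for row in batch) / n for j in range(n_features)]
    variances = [sum((row[j] - means[j]) ** 2 for row in batch) / n
                 for j in range(n_features)]
    return [[(row[j] - means[j]) / math.sqrt(variances[j] + eps)
             for j in range(n_features)] for row in batch]

def softmax(scores):
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(frames):
    """Scaled dot-product self-attention over a sequence of feature vectors."""
    dim = len(frames[0])
    output = []
    for query in frames:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in frames]
        weights = softmax(scores)
        # Each output frame is a weighted average of all frames.
        output.append([sum(w * value[j] for w, value in zip(weights, frames))
                       for j in range(dim)])
    return output

# Toy features (an assumption): four time frames with two features each.
features = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
normalized = batch_norm(features)
attended = self_attention(normalized)
```

Batch normalization puts both features on the same scale despite their tenfold difference, and attention lets every time frame draw on every other frame, which is one reason it can help with longer segments.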
Intelligent Sound Prediction - Architecture
Demo: Hearing AI can recognize sounds
Performance
Sound Movement Localization Speech Specific sounds Phone calls
Artificial Intelligence proves sounds need not be heard!