Deep Learning for Speech Recognition

This blog post gives a brief overview of recent Deep Learning for Speech Recognition publications, sampled from the Speech Recognition category on http://deeplearning.university. See also the previous post on Deep Learning for Natural Language Processing (NLP).

Best regards,

Amund Tveit

Acoustic Modeling

  1. Increasing Deep Neural Network Acoustic Model Size for Large Vocabulary Continuous Speech Recognition
  2. Deep learning vector quantization for acoustic information retrieval
  3. A Deep Learning Pipeline for Image Understanding and Acoustic Modeling
  4. Improving deep neural network acoustic models using generalized maxout networks
  5. Boundary Contraction Training for Acoustic Models based on Discrete Deep Neural Networks
  6. Improving Acoustic Model for Vietnamese Large Vocabulary Continuous Speech Recognition System Using Deep Bottleneck Features
  7. Phonotactic language recognition based on DNN-HMM acoustic model

Phoneme and Phone Recognition and Segmentation

  1. Contrastive auto-encoder for phoneme recognition
  2. Research on deep neural network’s hidden layers in phoneme recognition
  3. Joint Phoneme Segmentation Inference and Classification using CRFs
  4. Labeling unsegmented sequence data with DNN-HMM and its application for speech recognition
  5. Cross-language speech attribute detection and phone recognition for Tibetan using deep learning
  6. A fusion approach to spoken language identification based on combining multiple phone recognizers and speech attribute detectors
  7. Patch-Based Models of Spectrogram Edges for Phone Classification
  8. Fine context, low-rank, softplus deep neural networks for mobile speech recognition

Emotion Recognition

  1. Deep Learning for Emotional Speech Recognition
  2. Spoken emotion recognition using deep learning
  3. Improving generation performance of speech emotion recognition by denoising autoencoders
  4. Speech Emotion Recognition Using CNN

Optimization and Stochastic Gradient Descent

  1. A comparison of two optimization techniques for sequence discriminative training of deep neural networks
  2. Investigation of stochastic Hessian-Free optimization in Deep neural networks for speech recognition
  3. 1-Bit Stochastic Gradient Descent and its Application to Data-Parallel Distributed Training of Speech DNNs
  4. On Parallelizability of Stochastic Gradient Descent for Speech DNNs

Generative Models

  1. Deep Generative and Discriminative Models for Speech Recognition
  2. Voice Conversion Using Deep Neural Networks with Layer-Wise Generative Training

Feature Extraction and Detection

  1. Should deep neural nets have ears? The role of auditory features in deep learning approaches
  2. Spatial Diffuseness Features for DNN-Based Speech Recognition in Noisy and Reverberant Environments
  3. Multiple time-span feature fusion for deep neural network modeling

Scalability and Performance Analysis and Tuning

  1. An Investigation of Implementation and Performance Analysis of DNN Based Speech Synthesis System
  2. Acceleration Strategies for Speech Recognition Based on Deep Neural Networks
  3. Parallel Deep Neural Network Training for LVCSR Tasks using Blue Gene/Q
  4. First-Pass Large Vocabulary Continuous Speech Recognition using Bi-Directional Recurrent DNNs
  5. Deep convolutional neural networks for large-scale speech tasks

Multilingual and Multimodal Speech Recognition

Noise Handling

  1. Noise-Robust Speech Recognition Using Deep Neural Network
  2. Neural Network Based Pitch Tracking In Very Noisy Speech

Uncategorized (for now)

  1. RASR/NN: The RWTH neural network toolkit for speech recognition
  2. A historical perspective of speech recognition
  3. Improving deep neural networks for LVCSR using dropout and shrinking structure
  4. Using Deep Belief Networks for Vector-Based Speaker Recognition
  5. Statistical Parametric Speech Synthesis using Weighted Multi-distribution Deep Belief Network
  6. Deep Learning of Orthographic Representations in Baboons
  7. Speaker adaptation of deep neural network based on discriminant codes
  8. Deep learning of split temporal context for automatic speech recognition
  10. A Deep Learning Approach to Data-driven Parameterizations for Statistical Parametric Speech Synthesis
  11. Binaural Classification for Reverberant Speech Segregation Using Deep Neural Networks
  12. Computational modeling and validation of the motor contribution to speech perception
  13. Deep Neural Networks For Spoken Dialog Systems
  14. Ensemble Learning Approaches in Speech Recognition
  15. Raw Speech Signal-based Continuous Speech Recognition using Convolutional Neural Networks
  16. Feature Mapping of Multiple Beamformed Sources for Robust Overlapping Speech Recognition Using a Microphone Array
  17. Dysarthric Speech Recognition Using a Convolutive Bottleneck Network
  18. Mapping between ultrasound and vowel speech using DNN framework
  19. Performance evaluation of deep bottleneck features for spoken language identification
  20. Building an ensemble of CD-DNN-HMM acoustic model using random forests of phonetic decision trees
  21. Speaker adaptation of hybrid NN/HMM model for speech recognition based on singular value decomposition
  22. Decision tree based state tying for speech recognition using DNN derived embeddings
  23. Deep belief network based CRF for spoken language understanding
  24. Non-negative Factor Analysis of Gaussian Mixture Model Weight Adaptation for Language and Dialect Recognition


About Amund Tveit (@atveit - amund@memkite.com)

Amund Tveit works at Memkite on developing large-scale Deep Learning and Search (Convolutional Neural Network) with Swift and Metal for iOS (see deeplearning.education for a Memkite app video demo). He also maintains the deeplearning.university bibliography (github.com/memkite/DeepLearningBibliography).

Amund previously co-founded Atbrox, a cloud computing/big data service company (partner with Amazon Web Services), which also made some “sweat equity” startup investments in US and Nordic startups. His presentations about Hadoop/MapReduce Algorithms and Search were among the top 3% of all SlideShare presentations in 2013, and his blog posts have been frequently quoted by Big Data industry leaders and featured on the front pages of YCombinator News and Reddit Programming.

He previously worked for Google, where he was tech lead for Google News for iPhone (mentioned as “Google News Now Looks Beautiful On Your iPhone” on Mashable.com), led a team measuring and improving Google Services in the Scandinavian countries (Maps and Search), and worked as a software engineer on infrastructure projects. Other work experience includes telecom (IBM Canada) and insurance/finance (Storebrand).

Amund has a PhD in Computer Science. His publications have been cited more than 500 times. He also holds 4 US patents in the areas of search and advertisement technology, and a pending US patent in the area of brain-controlled search with consumer-level EEG devices.

Amund enjoys coding, in particular in Python, C++ and Swift (iOS).
