HypernasalityNet: Deep recurrent neural network for automatic hypernasality detection

Int J Med Inform. 2019 Sep:129:1-12. doi: 10.1016/j.ijmedinf.2019.05.023. Epub 2019 May 23.

Abstract

Background: Cleft palate patients are unable to produce adequate velopharyngeal closure, which results in hypernasal speech. In the clinic, hypernasal speech is assessed subjectively by speech-language pathologists. Automatic hypernasal speech detection can support diagnosis by speech-language pathologists and clinicians.

Objectives: This study aims to develop a Long Short-Term Memory (LSTM) based Deep Recurrent Neural Network (DRNN) system that detects hypernasal speech in cleft palate patients, thereby supporting clinical operation planning and speech therapy. In addition, the feature mining and classification abilities of the LSTM-DRNN system are explored.

Methods: The speech data comprise 14,544 Mandarin vowel recordings collected from 144 children (72 with hypernasality and 72 controls) aged 5-12 years. This work proposes an LSTM-based DRNN system for automatic hypernasal speech detection, since the LSTM-DRNN can learn the short-time dependencies of hypernasal speech. Vocal-tract-based features are fed into the LSTM-DRNN to achieve deep feature mining. To verify the feature mining ability of the LSTM-DRNN, features projected by the LSTM-DRNN are fed into shallow classifiers in place of the final two fully connected layers and softmax layer; as a comparison, the features without the LSTM-DRNN projection are fed directly into the shallow classifiers. Hypernasality-sensitive vowels (/a/, /i/, and /u/) are analyzed for the first time.
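The pipeline described above can be sketched as follows: a sequence of per-frame vocal-tract features passes through an LSTM layer, whose final hidden state feeds two fully connected layers and a softmax over the two classes (control vs. hypernasal). This is a minimal illustrative sketch, not the authors' implementation; all layer sizes, weight initializations, and the 13-dimensional feature vector are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

class LSTMCell:
    """Single LSTM cell; one stacked weight matrix for all four gates."""
    def __init__(self, n_in, n_hidden):
        self.n_hidden = n_hidden
        self.W = rng.standard_normal((4 * n_hidden, n_in + n_hidden)) * 0.1
        self.b = np.zeros(4 * n_hidden)

    def step(self, x, h, c):
        z = self.W @ np.concatenate([x, h]) + self.b
        i, f, g, o = np.split(z, 4)
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        c = f * c + i * np.tanh(g)   # cell state carries short-time context
        h = o * np.tanh(c)           # hidden state passed to the next frame
        return h, c

def classify(frames, cell, W1, b1, W2, b2):
    """Run all frames through the LSTM, then two dense layers and softmax."""
    h = np.zeros(cell.n_hidden)
    c = np.zeros(cell.n_hidden)
    for x in frames:
        h, c = cell.step(x, h, c)
    hidden = np.tanh(W1 @ h + b1)
    return softmax(W2 @ hidden + b2)  # [P(control), P(hypernasal)]

# Toy usage: 20 frames of hypothetical 13-dim vocal-tract features.
n_in, n_hid, n_fc = 13, 32, 16
cell = LSTMCell(n_in, n_hid)
W1 = rng.standard_normal((n_fc, n_hid)) * 0.1
b1 = np.zeros(n_fc)
W2 = rng.standard_normal((2, n_fc)) * 0.1
b2 = np.zeros(2)
frames = rng.standard_normal((20, n_in))
probs = classify(frames, cell, W1, b1, W2, b2)
```

The shallow-classifier comparison in the abstract would replace the two dense layers and softmax with, e.g., an SVM applied either to the LSTM-projected hidden state `h` or to the raw frame features.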

Results: The LSTM-DRNN based hypernasal speech detection method achieves higher detection accuracy than the shallow classifiers, since the LSTM-DRNN mines features along the time axis and through network depth simultaneously. The proposed system reaches a highest accuracy of 93.35%. The analysis of hypernasality-sensitive vowels indicates that /i/ and /u/ are the vowels most sensitive to hypernasal speech.

Conclusions: The results show that the LSTM-DRNN has robust feature mining and classification abilities. This is the first work to apply the LSTM-DRNN technique to automatic hypernasality detection in cleft palate speech. The experimental results demonstrate the potential of deep learning for pathological speech detection.

Keywords: Cleft palate speech; Deep recurrent neural network; Feature mining; Hypernasal speech; Long short-term memory.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adolescent
  • Child
  • Child, Preschool
  • Cleft Palate / complications
  • Female
  • Humans
  • Male
  • Neural Networks, Computer*
  • Nose Diseases / diagnosis*
  • Nose Diseases / etiology
  • Speech