TY - GEN
T1 - Word Recognition from Continuous Articulatory Movement Time-Series Data using Symbolic Representations
AU - Wang, Jun
AU - Balasubramanian, Arvind
AU - De La Vega, Luis Mojica
AU - Green, Jordan R.
AU - Samal, Ashok
AU - Prabhakaran, Balakrishnan
N1 - Funding Information:
This work was funded in part by the Excellence in Education Fund, University of Texas at Dallas, the Barkley Trust, University of Nebraska-Lincoln, and a grant awarded by the National Institutes of Health (R01 DC009890/DC/NIDCD NIH HHS/United States). We would like to thank Dr. Tom Carrell, Dr. Lori Synhorst, Dr. Mili Kuruvilla, Cynthia Didion, Rebecca Hoesing, Kate Lippincott, Kayanne Hamling, Kelly Veys, Toni Hoffer, and Taylor Boney for their contributions to subject recruitment, data collection, data management, and data processing.
Publisher Copyright:
© 2013 Association for Computational Linguistics.
PY - 2013
Y1 - 2013
N2 - Although still in the experimental stage, articulation-based silent speech interfaces may have significant potential for facilitating oral communication in persons with voice and speech problems. An articulation-based silent speech interface converts articulatory movement information into audible words. The complexity of the speech production mechanism (e.g., coarticulation) makes this conversion a formidable problem. In this paper, we report a novel, real-time algorithm for recognizing words from continuous articulatory movements. The approach differs from prior work in that (1) it operates at the word level rather than the phoneme level; (2) online segmentation and recognition are performed simultaneously; and (3) a symbolic representation (SAX) is used for data reduction of the original articulatory movement time-series. A data set of 5,900 isolated word samples of tongue and lip movements was collected using an electromagnetic articulograph from eleven English speakers. The average speaker-dependent recognition accuracy was up to 80.00%, with an average latency of 302 milliseconds for each word prediction. The results demonstrate the effectiveness of our approach and its potential for building a real-time articulation-based silent speech interface for clinical applications. Across-speaker variation in recognition accuracy is also discussed.
AB - Although still in the experimental stage, articulation-based silent speech interfaces may have significant potential for facilitating oral communication in persons with voice and speech problems. An articulation-based silent speech interface converts articulatory movement information into audible words. The complexity of the speech production mechanism (e.g., coarticulation) makes this conversion a formidable problem. In this paper, we report a novel, real-time algorithm for recognizing words from continuous articulatory movements. The approach differs from prior work in that (1) it operates at the word level rather than the phoneme level; (2) online segmentation and recognition are performed simultaneously; and (3) a symbolic representation (SAX) is used for data reduction of the original articulatory movement time-series. A data set of 5,900 isolated word samples of tongue and lip movements was collected using an electromagnetic articulograph from eleven English speakers. The average speaker-dependent recognition accuracy was up to 80.00%, with an average latency of 302 milliseconds for each word prediction. The results demonstrate the effectiveness of our approach and its potential for building a real-time articulation-based silent speech interface for clinical applications. Across-speaker variation in recognition accuracy is also discussed.
KW - Laryngectomy
KW - SAX
KW - Silent speech recognition
KW - Support vector machine
KW - Time-series
UR - http://www.scopus.com/inward/record.url?scp=85009500921&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85009500921&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85009500921
T3 - SLPAT 2013 - 4th Workshop on Speech and Language Processing for Assistive Technologies, SLPAT 2013, Workshop Proceedings
SP - 119
EP - 127
BT - SLPAT 2013 - 4th Workshop on Speech and Language Processing for Assistive Technologies, SLPAT 2013, Workshop Proceedings
PB - Association for Computational Linguistics (ACL)
T2 - 4th Workshop on Speech and Language Processing for Assistive Technologies, SLPAT 2013
Y2 - 21 August 2013 through 22 August 2013
ER -