TY - JOUR
T1 - An optimal set of flesh points on tongue and lips for speech-movement classification
AU - Wang, Jun
AU - Samal, Ashok
AU - Rong, Panying
AU - Green, Jordan R.
N1 - Publisher Copyright:
© 2016 American Speech-Language-Hearing Association.
PY - 2016/2
Y1 - 2016/2
N2 - Purpose: The authors sought to determine an optimal set of flesh points on the tongue and lips for classifying speech movements. Method: The authors used electromagnetic articulographs (Carstens AG500 and NDI Wave) to record tongue and lip movements from 13 healthy talkers who articulated 8 vowels, 11 consonants, a phonetically balanced set of words, and a set of short phrases. A machine-learning classifier (support vector machine) was used to classify the speech stimuli on the basis of articulatory movements, and classification accuracies of the flesh-point combinations were compared to determine an optimal set of sensors. Results: When data from the 4 sensors (T1: the vicinity between the tongue tip and tongue blade; T4: the tongue-body back; UL: the upper lip; and LL: the lower lip) were combined, phoneme and word classifications were most accurate and were comparable with those from the full set (which also includes T2: the tongue-body front; and T3: the tongue-body sensor between T2 and T4). Conclusion: The authors identified a 4-sensor set (T1, T4, UL, and LL) that yielded a classification accuracy (91%–95%) equivalent to that obtained using all 6 sensors. These findings provide an empirical basis for selecting sensors and their locations for scientific and emerging clinical applications that incorporate articulatory movements.
AB - Purpose: The authors sought to determine an optimal set of flesh points on the tongue and lips for classifying speech movements. Method: The authors used electromagnetic articulographs (Carstens AG500 and NDI Wave) to record tongue and lip movements from 13 healthy talkers who articulated 8 vowels, 11 consonants, a phonetically balanced set of words, and a set of short phrases. A machine-learning classifier (support vector machine) was used to classify the speech stimuli on the basis of articulatory movements, and classification accuracies of the flesh-point combinations were compared to determine an optimal set of sensors. Results: When data from the 4 sensors (T1: the vicinity between the tongue tip and tongue blade; T4: the tongue-body back; UL: the upper lip; and LL: the lower lip) were combined, phoneme and word classifications were most accurate and were comparable with those from the full set (which also includes T2: the tongue-body front; and T3: the tongue-body sensor between T2 and T4). Conclusion: The authors identified a 4-sensor set (T1, T4, UL, and LL) that yielded a classification accuracy (91%–95%) equivalent to that obtained using all 6 sensors. These findings provide an empirical basis for selecting sensors and their locations for scientific and emerging clinical applications that incorporate articulatory movements.
UR - http://www.scopus.com/inward/record.url?scp=84959197729&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84959197729&partnerID=8YFLogxK
U2 - 10.1044/2015_JSLHR-S-14-0112
DO - 10.1044/2015_JSLHR-S-14-0112
M3 - Article
C2 - 26564030
AN - SCOPUS:84959197729
SN - 1092-4388
VL - 59
SP - 15
EP - 26
JO - Journal of Speech, Language, and Hearing Research
JF - Journal of Speech, Language, and Hearing Research
IS - 1
ER -