Speech recognition using deep neural networks trained with non-uniform frame-level cost functions

Becerra de la Rosa, Aldonso; De la Rosa Vargas, José Ismael; González Ramírez, Efrén; Pedroza Ramírez, Ángel David; Martínez, Juan Manuel; Escalante, Nivia

dc.contributor	31249	es_ES
dc.contributor.other	https://orcid.org/0000-0002-7337-8974
dc.contributor.other	https://orcid.org/0000-0002-8060-6170
dc.coverage.spatial	Global	es_ES
dc.creator	Becerra de la Rosa, Aldonso
dc.creator	De la Rosa Vargas, José Ismael
dc.creator	González Ramírez, Efrén
dc.creator	Pedroza Ramírez, Ángel David
dc.creator	Martínez, Juan Manuel
dc.creator	Escalante, Nivia
dc.date.accessioned	2020-05-06T20:42:07Z
dc.date.available	2020-05-06T20:42:07Z
dc.date.issued	2017-11
dc.identifier	info:eu-repo/semantics/publishedVersion	es_ES
dc.identifier.issn	2573-0770	es_ES
dc.identifier.uri	http://ricaxcan.uaz.edu.mx/jspui/handle/20.500.11845/1894
dc.identifier.uri	https://doi.org/10.48779/9ds7-t936
dc.description.abstract	This paper presents two new variations of the frame-level cost function for training a deep neural network, with the goal of achieving better word error rates in speech recognition. The function minimized during training is a central concern in machine learning, and its improvement is an area of continual development. In the first proposed method, the conventional cross-entropy function is mapped to a non-uniform loss function based on its corresponding extropy (a complementary dual function), which emphasizes frames whose assignment to specific senones (tied-triphone states in a hidden Markov model) is ambiguous. The second proposal fuses the proposed mapped cross-entropy with the boosted cross-entropy function, which emphasizes frames with low target posterior probability. Both approaches were evaluated on a custom mid-vocabulary, speaker-independent voice corpus. This dataset is used to recognize digit strings and personal name lists in Spanish from the north-central part of Mexico in a connected-words phone dialing task. The two proposed approaches yield relative word error rate improvements of 12.3% and 10.7%, respectively, over the conventional, well-established cross-entropy objective function.	es_ES
dc.language.iso	eng	es_ES
dc.publisher	IEEE	es_ES
dc.relation.uri	generalPublic	es_ES
dc.rights	Attribution-NonCommercial-NoDerivs 3.0 United States	*
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/3.0/us/	*
dc.source	Proc. of the IEEE International Autumn Meeting on Power, Electronics and Computing (ROPEC2017), at Ixtapa, Mexico, pp. 1-6, 2017.	es_ES
dc.subject.classification	ENGINEERING AND TECHNOLOGY [7]	es_ES
dc.subject.other	Speech recognition	es_ES
dc.subject.other	Deep neural network	es_ES
dc.subject.other	Deep Learning	es_ES
dc.title	Speech recognition using deep neural networks trained with non-uniform frame-level cost functions	es_ES
dc.type	info:eu-repo/semantics/conferencePaper	es_ES