DSpace Repository

Speech recognition using deep neural networks trained with non-uniform frame-level cost functions

dc.contributor 31249 es_ES
dc.contributor.other https://orcid.org/0000-0002-7337-8974
dc.contributor.other https://orcid.org/0000-0002-8060-6170
dc.coverage.spatial Global es_ES
dc.creator Becerra de la Rosa, Aldonso
dc.creator De la Rosa Vargas, José Ismael
dc.creator González Ramírez, Efrén
dc.creator Pedroza Ramírez, Ángel David
dc.creator Martínez, Juan Manuel
dc.creator Escalante, Nivia
dc.date.accessioned 2020-05-06T20:42:07Z
dc.date.available 2020-05-06T20:42:07Z
dc.date.issued 2017-11
dc.identifier info:eu-repo/semantics/publishedVersion es_ES
dc.identifier.issn 2573-0770 es_ES
dc.identifier.uri http://ricaxcan.uaz.edu.mx/jspui/handle/20.500.11845/1894
dc.identifier.uri https://doi.org/10.48779/9ds7-t936
dc.description.abstract This paper presents two new variations of the frame-level cost function for training a deep neural network, with the aim of achieving lower word error rates in speech recognition. The minimization function of a neural network is a central concern in machine learning, and its improvement is a process of constant evolution. In the first proposed method, the conventional cross-entropy function is mapped to a non-uniform loss function based on its corresponding extropy (a complementary dual function), which emphasizes frames whose assignment to specific senones (tied-triphone states in a hidden Markov model) is ambiguous. The second proposal fuses the proposed mapped cross-entropy with the boosted cross-entropy function, which emphasizes frames with low target posterior probability. Both approaches were evaluated on a personalized mid-vocabulary, speaker-independent voice corpus. This dataset is used to recognize digit strings and personal name lists in Spanish from the north-central part of Mexico on a connected-words phone dialing task. Relative word error rate improvements of 12.3% and 10.7% are obtained with the two proposed approaches, respectively, over the conventional, well-established cross-entropy objective function. es_ES
dc.language.iso eng es_ES
dc.publisher IEEE es_ES
dc.relation.uri generalPublic es_ES
dc.rights Attribution-NonCommercial-NoDerivs 3.0 United States *
dc.rights.uri http://creativecommons.org/licenses/by-nc-nd/3.0/us/ *
dc.source Proc. of the IEEE International Autumn Meeting on Power, Electronics and Computing (ROPEC2017), at Ixtapa, Mexico, pp. 1-6, 2017. es_ES
dc.subject.classification ENGINEERING AND TECHNOLOGY [7] es_ES
dc.subject.other Speech recognition es_ES
dc.subject.other Deep neural network es_ES
dc.subject.other Deep Learning es_ES
dc.title Speech recognition using deep neural networks trained with non-uniform frame-level cost functions es_ES
dc.type info:eu-repo/semantics/conferencePaper es_ES
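
The abstract contrasts the conventional cross-entropy with a boosted variant that emphasizes frames with low target posterior probability. The following is a minimal sketch of that idea, not the authors' implementation. The weighting `(1 - p(target))` is an assumed form of boosted cross-entropy, and the function names and logit values are hypothetical. The extropy-based mapping is not sketched because the record does not give its formula.

```python
import math

def softmax(logits):
    """Turn raw network outputs for one frame into senone posteriors."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(posteriors, target):
    """Conventional frame-level cross-entropy: -log p(target)."""
    return -math.log(posteriors[target])

def boosted_cross_entropy(posteriors, target):
    """ASSUMED form: cross-entropy scaled by (1 - p(target)).

    Frames with a low target posterior keep most of their loss,
    while confidently classified frames are down-weighted.
    """
    p = posteriors[target]
    return (1.0 - p) * cross_entropy(posteriors, target)

def mean_loss(frames, targets, loss_fn):
    """Average a frame-level loss over the frames of one utterance."""
    losses = [loss_fn(softmax(f), t) for f, t in zip(frames, targets)]
    return sum(losses) / len(losses)

# Hypothetical logits for two frames over three senones:
# the first frame is confident, the second is ambiguous.
frames = [[4.0, 0.5, 0.1], [1.0, 0.9, 0.8]]
targets = [0, 0]

print("cross-entropy:", mean_loss(frames, targets, cross_entropy))
print("boosted (assumed form):", mean_loss(frames, targets, boosted_cross_entropy))
```

Under this assumed weighting, the ambiguous frame retains a larger share of its cross-entropy than the confident one, which is the non-uniform behaviour the abstract describes.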


Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivs 3.0 United States