Voice and Speech Recognition Application in Emotion Detection: A Utility for Future Trends

Tushar Anand, Sarthak Panwar, Shubham Kumar Sharma, Rohit Rastogi, Mayank Gupta
Copyright: © 2024 | Pages: 27
DOI: 10.4018/979-8-3693-1082-3.ch013

Abstract

Emotion detection from voice signals is a difficult challenge, yet it is needed for human-computer interaction (HCI). In the literature on speech emotion recognition, various well-known speech analysis and classification methods have been used to extract emotions from signals. Deep learning strategies have recently been proposed as a workable alternative to conventional methods, and several recent studies have employed these methods to identify speech-based emotions. The review examines the databases used, the emotions collected, and the contributions to speech emotion recognition. The research team created a speech emotion recognition project that recognizes human speech emotions, developed in Python 3.6. The RAVDESS dataset was used since it contains eight distinct emotions expressed by all speakers. The author team used the RAVDESS dataset, the Python programming language, and PyCharm as an IDE.
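As an illustration of how RAVDESS emotion labels can be recovered in Python, the sketch below parses the dataset's filename convention, in which seven hyphen-separated fields encode metadata and the third field is the emotion code. The function name and dictionary are ours; the filename scheme follows the published RAVDESS documentation.

```python
# RAVDESS emotion codes, as documented for the dataset.
EMOTIONS = {
    "01": "neutral", "02": "calm", "03": "happy", "04": "sad",
    "05": "angry", "06": "fearful", "07": "disgust", "08": "surprised",
}

def emotion_from_filename(filename: str) -> str:
    """Return the emotion label encoded in a RAVDESS file name."""
    # e.g. "03-01-05-01-02-01-12.wav": the third field "05" means "angry".
    stem = filename.rsplit(".", 1)[0]
    fields = stem.split("-")
    return EMOTIONS[fields[2]]

print(emotion_from_filename("03-01-05-01-02-01-12.wav"))  # angry
```

A parser like this is typically the first step of the pipeline: it turns each audio file's name into a supervised training label before any features are extracted.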

Ethical Committee And Funding

The experiments do not involve any invasive human trials, so no ethical constraints have been violated. Although the subjects performing the study were humans, the study does not violate any health-related measures. The project is not funded by any agency.

Key Terms in this Chapter

Perceptual Linear Prediction Cepstral Coefficients: The perceptual linear predictive (PLP) technique is a method for the analysis of speech. PLP analysis is computationally efficient and yields a low-dimensional representation of speech.

Neural Networks: A neural network is a series of algorithms that endeavors to recognize underlying relationships in a set of data through a process that mimics the way the human brain operates.

Mel Spectrogram: A spectrogram whose frequency axis renders frequencies above a certain threshold (the corner frequency) logarithmically. For example, in a linearly scaled spectrogram, the vertical space between 1,000 Hz and 2,000 Hz is half the vertical space between 2,000 Hz and 4,000 Hz.
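The mel scale underlying the mel spectrogram is commonly approximated by m = 2595 · log10(1 + f/700). A minimal sketch of this conversion (the formula is the standard HTK-style approximation; the function name is ours):

```python
import math

def hz_to_mel(f_hz: float) -> float:
    """Convert a frequency in Hz to mels (HTK-style formula)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

# The 2,000-4,000 Hz octave is twice as wide as the 1,000-2,000 Hz
# octave on a linear Hz axis, but only modestly wider in mels,
# showing the logarithmic compression of higher frequencies.
print(hz_to_mel(2000) - hz_to_mel(1000))
print(hz_to_mel(4000) - hz_to_mel(2000))
```

This compression is what makes the mel spectrogram a popular input representation for speech emotion models: it allots resolution roughly the way human hearing does.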

DBNs: In machine learning, a deep belief network is a generative graphical model, or alternatively a class of deep neural network, composed of multiple layers of latent variables, with connections between the layers but not between units within each layer.

SAVEE: It is an emotion recognition dataset. It consists of recordings from 4 male actors in 7 different emotions, 480 British English utterances in total. The sentences were chosen from the standard TIMIT corpus and phonetically-balanced for each emotion.

Multilayer Perceptron: A multilayer perceptron (MLP) is a feedforward artificial neural network that generates a set of outputs from a set of inputs. An MLP is characterized by several layers of nodes connected as a directed graph between the input and output layers.
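An MLP's forward pass can be sketched in a few lines of NumPy. The layer sizes and random weights below are arbitrary illustrations, not values from the chapter's project:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def mlp_forward(x, weights, biases):
    """Forward pass through an MLP: each hidden layer applies an
    affine map followed by a ReLU; the final layer is linear."""
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = relu(h @ W + b)
    return h @ weights[-1] + biases[-1]

rng = np.random.default_rng(0)
# 2 input features -> 4 hidden units -> 3 outputs (e.g. 3 emotion classes)
Ws = [rng.normal(size=(2, 4)), rng.normal(size=(4, 3))]
bs = [np.zeros(4), np.zeros(3)]
out = mlp_forward(np.array([0.5, -1.0]), Ws, bs)
print(out.shape)  # (3,)
```

In a speech emotion recognizer, the input vector would typically hold acoustic features (e.g. cepstral coefficients) and the output vector one score per emotion class.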

CNNs and RNNs: Recurrent neural networks are designed to process data that comes in sequences, such as a sentence, by carrying a hidden state across time steps; standard convolutional neural networks have no such recurrent state and are far less effective at interpreting temporal information on their own.
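The recurrent state that distinguishes an RNN can be sketched as a single Elman-style update. The weights here are small random illustrations, not the chapter's model:

```python
import numpy as np

def rnn_step(h, x, W_h, W_x, b):
    """One Elman RNN step: the new hidden state mixes the
    previous state with the current input through a tanh."""
    return np.tanh(h @ W_h + x @ W_x + b)

rng = np.random.default_rng(1)
W_h = rng.normal(size=(3, 3)) * 0.1   # state-to-state weights
W_x = rng.normal(size=(2, 3)) * 0.1   # input-to-state weights
b = np.zeros(3)

h = np.zeros(3)                       # initial hidden state
for x in [np.array([1.0, 0.0]), np.array([0.0, 1.0])]:
    h = rnn_step(h, x, W_h, W_x, b)   # state accumulates sequence history
print(h.shape)  # (3,)
```

Because each step feeds the previous hidden state back in, the final state depends on the whole input sequence, which is exactly the temporal sensitivity the glossary entry describes.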
