The Effects of Voice Features in Voice Assistant Systems on Human Perceived Emotion
Kurunthachalam, Nishanth Prabhu
2024-12-21
Abstract
Voice Assistant Systems (VAS) have become ubiquitous as artificial intelligence advances, and are now found in domestic, vehicular, and mobile environments. These systems rely predominantly on auditory interaction, conveying information through synthesized speech. Human communication often conveys emotion through non-verbal cues such as facial expressions, gestures, and modulations in voice tone, volume, and speed, but the synthesized voices used in VAS typically lack these emotional nuances, which limits their capacity to transmit non-verbal information. This disparity makes it difficult to improve the naturalness and emotional engagement of VAS interactions. Despite the proliferation of VAS, little research has examined how specific voice features, including pitch, speed, intonation, and voice gender, influence users' emotional perceptions. The primary objective of this study is therefore to investigate the relationship between these voice features, user perceptions, and user preferences.
The study employed the Kansei engineering methodology, a research approach that relates a product's physical characteristics to the psychological and emotional responses it elicits. It was conducted in two phases. The first phase identified the key emotional descriptors that people use to articulate their responses to synthesized speech. Thirty-eight six semantic pairs, representing diverse emotional states, were incorporated into a questionnaire designed to assess participants' emotional reactions to sample voice outputs. An online survey yielded 608 valid responses, which were then analyzed using factor analysis.
This analysis revealed four major emotional dimensions that users commonly perceive in synthesized voices: happy/sad, kind/rude, confident/nervous, and trusting/deceiving.
The second phase analyzed the impact of specific voice features on emotional perception. Thirty-six voice prototypes were developed by manipulating four features: pitch (low, medium, high), speed (slow, medium, fast), intonation (with and without), and gender (male and female). An online experiment using these prototypes yielded 32 valid responses. Participants evaluated each voice prototype using the semantic pairs identified in the first phase and indicated their preferences. Analysis of Variance (ANOVA) tests revealed significant effects of voice features on listeners' emotional perceptions, and certain combinations of features had notable influences. For instance, lower pitch combined with faster speed was more likely to be perceived as rude, while higher pitch and slower speed were associated with kindness.
These findings confirm that voice features play a critical role in shaping how users perceive and engage with VAS. The results suggest that incorporating emotional expression into synthesized speech could make user experiences with VAS more meaningful and satisfying, with applications spanning personal assistants, customer service bots, and beyond. This study contributes to the growing body of knowledge on human-computer interaction and provides insights for the design and development of more emotionally intelligent voice assistant systems.
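The count of thirty-six prototypes matches a full crossing of the four features listed above (3 pitch levels, 3 speed levels, 2 intonation settings, 2 genders). The sketch below enumerates such a design. It assumes a full factorial crossing, which the abstract does not state explicitly, and the names `FACTORS` and `build_prototypes` are illustrative rather than taken from the thesis.

```python
from itertools import product

# Factor levels as listed in the abstract. The dictionary keys and level
# labels are illustrative names, not identifiers from the thesis.
FACTORS = {
    "pitch": ["low", "medium", "high"],
    "speed": ["slow", "medium", "fast"],
    "intonation": ["with", "without"],
    "gender": ["male", "female"],
}


def build_prototypes(factors):
    """Return one dict per combination of factor levels (full factorial)."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*factors.values())]


prototypes = build_prototypes(FACTORS)
print(len(prototypes))  # 3 * 3 * 2 * 2 = 36
```

Each entry in `prototypes` describes one voice condition, for example low pitch, slow speed, with intonation, male voice, which is the kind of record a rating questionnaire could be keyed on.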
Subjects
Voice Assistant Systems; Emotional Perception; Kansei Engineering; Human-computer Interaction; Artificial Emotional Intelligence; Affective Computing
Types
Thesis