Title#
Human Feedback Driven Dynamic Speech Emotion Recognition
Abstract#
This work explores a new area of dynamic speech emotion recognition. Unlike traditional methods, we assume that each audio track is associated with a sequence of emotions active at different moments in time. The study particularly focuses on the animation of emotional 3D avatars. We propose a multi-stage method that includes the training of a classical speech emotion recognition model, synthetic generation of emotional sequences, and further model improvement based on human feedback. Additionally, we introduce a novel approach to modeling emotional mixtures based on the Dirichlet distribution. The models are evaluated against ground-truth emotions extracted from a dataset of 3D facial animations. We compare our models against the sliding window approach. Our experimental results show the effectiveness of the Dirichlet-based approach in modeling emotional mixtures. Incorporating human feedback further improves model quality while providing a simplified annotation procedure.
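The Dirichlet-based mixture idea from the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the emotion categories and concentration parameters below are assumptions chosen for the example.

```python
import numpy as np
from math import lgamma, log

# Hypothetical emotion categories (assumed for illustration; not from the paper).
EMOTIONS = ["neutral", "happy", "sad", "angry"]

def sample_emotion_mixture(alpha, seed=None):
    """Draw one emotion-mixture vector (non-negative, sums to 1) from Dirichlet(alpha)."""
    rng = np.random.default_rng(seed)
    return rng.dirichlet(alpha)

def dirichlet_log_likelihood(weights, alpha):
    """Log-density of a mixture vector under Dirichlet(alpha)."""
    alpha = np.asarray(alpha, dtype=float)
    weights = np.asarray(weights, dtype=float)
    log_norm = lgamma(alpha.sum()) - sum(lgamma(a) for a in alpha)
    return log_norm + float(((alpha - 1.0) * np.log(weights)).sum())

# Concentration parameters a recognition model might predict for one audio frame
# (hypothetical values): mass concentrated on "happy" makes happy-dominated
# mixtures more likely, while still allowing blends with other emotions.
alpha = [2.0, 5.0, 1.0, 1.0]
mixture = sample_emotion_mixture(alpha, seed=0)
print(dict(zip(EMOTIONS, np.round(mixture, 3))))
```

The point of the parameterization is that a model can output per-frame concentration parameters rather than a single hard label, so each frame carries a full mixture of emotions suitable for blending facial-animation targets.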
Article Page#
PDF Download#