zikele

Life as it is can be joyful in itself

Transsion Multilingual Speech Recognition System for the MLC-SLM 2025 Challenge

2508.14916v1

English Title

Transsion Multilingual Speech Recognition System for MLC-SLM 2025 Challenge

English Abstract

This paper presents the architecture and performance of a novel multilingual automatic speech recognition (ASR) system developed by the Transsion Speech Team for Track 1 of the MLC-SLM 2025 Challenge. The proposed system comprises three key components: 1) a frozen speech encoder based on Whisper-large-v3, which leverages large-scale pretraining for robust acoustic feature extraction; 2) a trainable adaptor module that uses a Linear-ReLU-Linear transformation to align speech and text representations; and 3) a frozen Qwen2.5-7B-Instruct large language model (LLM) with trainable LoRA modules for contextual linguistic decoding. By combining pretrained models with task-specific fine-tuning, the system achieved a word/character error rate (WER/CER) of 9.83% across the 11 languages of the evaluation set and ranked third among global participants.
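
The two trainable pieces named in the abstract, the Linear-ReLU-Linear adaptor and the LoRA update on the frozen LLM, can be sketched as follows. This is a minimal, dependency-free illustration under stated assumptions, not the authors' implementation: the abstract does not give the adaptor's hidden size, whether frames are downsampled in time, the LoRA rank, or the scaling factor, so `LinearReLULinearAdaptor`, `lora_linear`, and all dimensions here are hypothetical. In a real setup the encoder width would be 1280 (Whisper-large-v3) and the output width 3584 (Qwen2.5-7B hidden size).

```python
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # stored as rows: [out_dim][in_dim]


def linear(x: Vector, weight: Matrix, bias: Vector) -> Vector:
    """Affine map y = W x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


class LinearReLULinearAdaptor:
    """Hypothetical adaptor mapping speech-encoder frames into the LLM embedding space.

    Assumption: applied frame by frame with no temporal downsampling; sizes and
    uniform initialisation are illustrative only.
    """

    def __init__(self, enc_dim: int, hidden_dim: int, llm_dim: int, seed: int = 0):
        rng = random.Random(seed)

        def init(out_dim: int, in_dim: int) -> Matrix:
            scale = in_dim ** -0.5
            return [[rng.uniform(-scale, scale) for _ in range(in_dim)] for _ in range(out_dim)]

        self.w1, self.b1 = init(hidden_dim, enc_dim), [0.0] * hidden_dim
        self.w2, self.b2 = init(llm_dim, hidden_dim), [0.0] * llm_dim

    def __call__(self, frames: List[Vector]) -> List[Vector]:
        # Linear -> ReLU -> Linear on every encoder frame independently.
        return [linear(relu(linear(f, self.w1, self.b1)), self.w2, self.b2) for f in frames]


def lora_linear(x: Vector, w_frozen: Matrix, a: Matrix, b: Matrix,
                alpha: float, rank: int) -> Vector:
    """LoRA-adapted projection y = W x + (alpha / r) * B (A x).

    W stays frozen; only the low-rank factors A ([r][in]) and B ([out][r]) are trained.
    """
    base = linear(x, w_frozen, [0.0] * len(w_frozen))
    low = linear(x, a, [0.0] * len(a))
    delta = linear(low, b, [0.0] * len(b))
    scale = alpha / rank
    return [y + scale * d for y, d in zip(base, delta)]


adaptor = LinearReLULinearAdaptor(enc_dim=8, hidden_dim=16, llm_dim=4)
speech_embeddings = adaptor([[0.1 * i for i in range(8)] for _ in range(3)])
```

Because B is usually initialised to zeros, the LoRA-adapted layer starts out identical to the frozen layer, so fine-tuning begins from the pretrained LLM's behaviour; the adaptor, by contrast, is trained from scratch to bridge the two embedding spaces.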
