ECHO: Frequency-aware Hierarchical Encoding for Variable-length Signal

2508.14689v1

English title

ECHO: Frequency-aware Hierarchical Encoding for Variable-length Signal

English abstract

Pre-trained foundation models have demonstrated remarkable success in vision and language, yet their potential for general machine signal modeling (covering acoustic, vibration, and other industrial sensor data) remains under-explored. Existing approaches using sub-band-based encoders have achieved competitive results but are limited by fixed input lengths and the absence of explicit frequency positional encoding. In this work, we propose a novel foundation model that integrates an advanced band-split architecture with relative frequency positional embeddings, enabling precise spectral localization across arbitrary sampling configurations. The model supports inputs of arbitrary length without padding or segmentation, producing a concise embedding that retains both temporal and spectral fidelity. We evaluate our method on SIREN (https://github.com/yucongzh/SIREN), a newly introduced large-scale benchmark for machine signal encoding that unifies multiple datasets, including all DCASE Task 2 challenges (2020-2025) and widely used industrial signal corpora. Experimental results demonstrate consistent state-of-the-art performance in anomaly detection and fault identification, confirming the effectiveness and generalization capability of the proposed model. ECHO is open-sourced at https://github.com/yucongzh/ECHO.
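The abstract gives no implementation details, so the following is only a rough sketch of how a band split combined with a relative frequency position might look. Everything in it is an assumption made for illustration and is not taken from the ECHO repository: the function names, the equal-width band split, the choice to express each band's center as a fraction of the frequency axis (that is, relative to the Nyquist frequency), and the sinusoidal form of the embedding.

```python
import numpy as np


def split_bands(spec, band_width):
    """Split a (freq_bins, frames) spectrogram into equal-width sub-bands.

    The last band is zero-padded along the frequency axis when freq_bins
    is not a multiple of band_width (an illustrative choice). The time
    axis is left untouched, so any number of frames is accepted.
    """
    n_bins, n_frames = spec.shape
    n_bands = -(-n_bins // band_width)  # ceiling division
    padded = np.zeros((n_bands * band_width, n_frames), dtype=spec.dtype)
    padded[:n_bins] = spec
    return padded.reshape(n_bands, band_width, n_frames)


def relative_band_positions(n_bins, band_width):
    """Center of each band as a fraction of the frequency axis, in (0, 1).

    Because the value is a fraction of the full axis, it does not depend
    on the absolute sampling rate (assumed reading of "relative").
    """
    starts = np.arange(0, n_bins, band_width)
    ends = np.minimum(starts + band_width, n_bins)
    return (starts + ends) / (2.0 * n_bins)


def sinusoidal_embedding(positions, dim, scale=1000.0):
    """Standard sinusoidal embedding of scalar positions; dim must be even."""
    k = np.arange(dim // 2)
    inv_freq = 1.0 / (10000.0 ** (2 * k / dim))
    angles = scale * positions[:, None] * inv_freq[None, :]
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=1)


if __name__ == "__main__":
    # Two hypothetical inputs with different resolutions and lengths.
    for n_bins, n_frames in [(257, 120), (513, 37)]:
        spec = np.random.rand(n_bins, n_frames)
        bands = split_bands(spec, band_width=32)
        pos = relative_band_positions(n_bins, band_width=32)
        emb = sinusoidal_embedding(pos, dim=16)
        print(bands.shape, emb.shape)
```

In this sketch, inputs with different numbers of frequency bins or frames pass through the same functions without padding along time; how ECHO itself handles these cases is documented in the linked repository.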
