GraSP: A Unified Graph-Based Framework for Scalable Generation, Quality Tagging, and Management of Synthetic Data for SFT and DPO

2508.15432v1

Title

GraSP: A Unified Graph-Based Framework for Scalable Generation, Quality Tagging, and Management of Synthetic Data for SFT and DPO

Abstract

The advancement of large language models (LLMs) depends critically on the availability of high-quality datasets for Supervised Fine-Tuning (SFT) and for alignment tasks such as Direct Preference Optimization (DPO). In this work, we present a comprehensive synthetic data generation framework that enables scalable, configurable, and high-fidelity generation of synthetic data tailored to these training paradigms. Our approach employs a modular, configuration-based pipeline capable of modeling complex dialogue flows with minimal manual intervention. The framework uses a dual-stage quality tagging mechanism, combining heuristic rules with LLM-based evaluations, to automatically filter and score data extracted from OASST-formatted conversations, ensuring the curation of high-quality dialogue samples. The resulting datasets are structured under a flexible schema that supports both SFT and DPO use cases, enabling seamless integration into diverse training workflows. Together, these innovations offer a robust solution for generating and managing synthetic conversational data at scale, significantly reducing the overhead of data preparation in LLM training pipelines.
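The abstract does not spell out the tagging rules or the output schema, so the following is only a minimal sketch of how a two-stage quality tagger over OASST-style conversation trees might feed both SFT and DPO records. Every name here (`Message`, `heuristic_stage`, `tag_message`, `to_sft`, `to_dpo`), the thresholds, and the record fields are assumptions made for illustration, and `toy_llm_score` is a stand-in for a real LLM judge; none of this is taken from the framework itself.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Message:
    """One node of an OASST-style conversation tree (field names are illustrative)."""
    message_id: str
    parent_id: Optional[str]
    role: str  # "prompter" or "assistant"
    text: str
    tags: Dict[str, float] = field(default_factory=dict)


def heuristic_stage(msg: Message, min_chars: int = 20) -> bool:
    """Stage 1: cheap rule-based checks (hypothetical rules, not the paper's)."""
    text = msg.text.strip()
    if len(text) < min_chars:
        return False
    words = text.split()
    # Reject highly repetitive text: too few distinct words.
    return len(set(words)) / len(words) >= 0.3


def tag_message(msg: Message, llm_score: Callable[[str], float]) -> None:
    """Stage 2 runs only if stage 1 passes, so the costly judge is skipped on obvious rejects."""
    passed = heuristic_stage(msg)
    msg.tags["heuristic_pass"] = 1.0 if passed else 0.0
    msg.tags["llm_score"] = llm_score(msg.text) if passed else 0.0


def replies_by_prompt(messages: List[Message]) -> List[Tuple[Message, List[Message]]]:
    """Group assistant replies under the message they answer."""
    by_id = {m.message_id: m for m in messages}
    groups: Dict[str, List[Message]] = {}
    for m in messages:
        if m.role == "assistant" and m.parent_id in by_id:
            groups.setdefault(m.parent_id, []).append(m)
    return [(by_id[pid], replies) for pid, replies in groups.items()]


def to_sft(messages: List[Message], threshold: float = 0.5) -> List[dict]:
    """Keep the best-scoring reply per prompt if it clears the threshold."""
    records = []
    for prompt, replies in replies_by_prompt(messages):
        best = max(replies, key=lambda m: m.tags["llm_score"])
        if best.tags["llm_score"] >= threshold:
            records.append({"prompt": prompt.text, "response": best.text})
    return records


def to_dpo(messages: List[Message], margin: float = 0.2) -> List[dict]:
    """Pair the best and worst replies per prompt when their scores differ enough."""
    records = []
    for prompt, replies in replies_by_prompt(messages):
        if len(replies) < 2:
            continue
        ranked = sorted(replies, key=lambda m: m.tags["llm_score"])
        rejected, chosen = ranked[0], ranked[-1]
        if chosen.tags["llm_score"] - rejected.tags["llm_score"] >= margin:
            records.append(
                {"prompt": prompt.text, "chosen": chosen.text, "rejected": rejected.text}
            )
    return records


def toy_llm_score(text: str) -> float:
    """Placeholder for an LLM judge: longer answers score higher, capped at 1.0."""
    return min(len(text) / 100, 1.0)


# Tiny example tree: one prompt with a useful reply and a throwaway reply.
p1 = Message("p1", None, "prompter", "How do I reverse a list in Python?")
a1 = Message("a1", "p1", "assistant",
             "Use reversed(xs) for an iterator, or xs[::-1] to get a reversed copy of the list.")
a2 = Message("a2", "p1", "assistant", "idk")
conversation = [p1, a1, a2]

for m in conversation:
    if m.role == "assistant":
        tag_message(m, toy_llm_score)

sft = to_sft(conversation)
dpo = to_dpo(conversation)
```

In this sketch the same tagged tree yields one SFT record (the prompt with its best reply) and one DPO record (the best reply as `chosen`, the heuristic-rejected reply as `rejected`), which is one way a single schema could serve both training paradigms.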
