Bibliographic Details
Main Authors: Zhang, Ming, Wang, Yuhui, Shen, Yujiong, Yang, Tingyi, Jiang, Changhao, Wu, Yilong, Dou, Shihan, Chen, Qinhao, Xi, Zhiheng, Zhang, Zhihao, Dong, Yi, Wang, Zhen, Fei, Zhihui, Wan, Mingyang, Liang, Tao, Ma, Guojun, Zhang, Qi, Gui, Tao, Huang, Xuanjing
Format: Preprint
Published: 2025
Subjects: Computation and Language, Artificial Intelligence, Machine Learning
Online Access: https://arxiv.org/abs/2503.06706
contents Process-driven dialogue systems, which operate under strict predefined process constraints, are essential in customer service and equipment maintenance scenarios. Although Large Language Models (LLMs) have shown remarkable progress in dialogue and reasoning, they still struggle with these strictly constrained dialogue tasks. To address this challenge, we construct the Process Flow Dialogue (PFDial) dataset, which contains 12,705 high-quality Chinese dialogue instructions derived from 440 flowcharts containing 5,055 process nodes. Based on the PlantUML specification, each UML flowchart is converted into atomic dialogue units, i.e., structured five-tuples. Experimental results demonstrate that both a 7B model trained on merely 800 samples and a 0.5B model trained on the full dataset can surpass 90% accuracy. Additionally, an 8B model can surpass GPT-4o by up to 43.88%, with an average improvement of 11.00%. We further evaluate models' performance on challenging backward transitions in process flows and conduct an in-depth analysis of various dataset formats to reveal their impact on model performance in handling decision and sequential branches. The data is released at https://github.com/KongLongGeFDU/PFDial.
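The conversion of a flowchart into atomic five-tuple units, as described in the abstract, can be pictured with a minimal Python sketch. The record does not list the five fields, so the `DialogueUnit` layout, the `flow_to_units` helper, and the toy troubleshooting flow below are all assumptions for illustration, not the authors' format or code.

```python
from typing import NamedTuple


class DialogueUnit(NamedTuple):
    # Hypothetical five-tuple layout; the record does not name the actual fields.
    flow: str        # name or description of the flowchart
    current: str     # node the dialogue is currently at
    user_input: str  # branch condition or user reply ("" for sequential steps)
    next_node: str   # node the process moves to
    reply: str       # system output associated with the next node


def flow_to_units(flow_name, edges, replies):
    """Turn each flowchart edge into one atomic dialogue unit."""
    return [
        DialogueUnit(flow_name, src, cond, dst, replies.get(dst, ""))
        for src, cond, dst in edges
    ]


# Toy flow (made up): one decision branch and one backward transition.
edges = [
    ("start", "", "check_power"),
    ("check_power", "yes", "restart_device"),
    ("check_power", "no", "plug_in"),
    ("plug_in", "", "check_power"),  # backward transition to an earlier node
]
replies = {
    "check_power": "Is the device powered on?",
    "restart_device": "Please restart the device.",
    "plug_in": "Please plug the device in.",
}

units = flow_to_units("device_troubleshooting", edges, replies)
for unit in units:
    print(unit)
```

Each edge yields exactly one unit, so decision branches appear as several units sharing the same current node, and a backward transition is simply a unit whose next node was visited earlier.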
format Preprint
id arxiv_https___arxiv_org_abs_2503_06706
institution arXiv
publishDate 2025
record_format arxiv
title PFDial: A Structured Dialogue Instruction Fine-tuning Method Based on UML Flowcharts
topic Computation and Language
Artificial Intelligence
Machine Learning
url https://arxiv.org/abs/2503.06706