| Main Author: | Skripko, Nikolai |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2509.18420 |
Similar Items
Beyond Instruction Following: Evaluating Inferential Rule Following of Large Language Models
by: Sun, Wangtao, et al.
Published: (2024)
LIFEBench: Evaluating Length Instruction Following in Large Language Models
by: Zhang, Wei, et al.
Published: (2025)
RefuteBench: Evaluating Refuting Instruction-Following for Large Language Models
by: Yan, Jianhao, et al.
Published: (2024)
Deconstructing Instruction-Following: A New Benchmark for Granular Evaluation of Large Language Model Instruction Compliance Abilities
by: Purpura, Alberto, et al.
Published: (2026)
LexInstructEval: Lexical Instruction Following Evaluation for Large Language Models
by: Ren, Huimin, et al.
Published: (2025)
InFoBench: Evaluating Instruction Following Ability in Large Language Models
by: Qin, Yiwei, et al.
Published: (2024)
KITE: A Benchmark for Evaluating Korean Instruction-Following Abilities in Large Language Models
by: Kim, Dongjun, et al.
Published: (2025)
LsrIF: Enhancing Logic-Structured Instruction Following of Large Language Models
by: Ren, Qingyu, et al.
Published: (2026)
CIF-Bench: A Chinese Instruction-Following Benchmark for Evaluating the Generalizability of Large Language Models
by: LI, Yizhi, et al.
Published: (2024)
CarbonCall: Sustainability-Aware Function Calling for Large Language Models on Edge Devices
by: Paramanayakam, Varatheepan, et al.
Published: (2025)
Improving Small-Scale Large Language Models Function Calling for Reasoning Tasks
by: Manduzio, Graziano A., et al.
Published: (2024)
HREF: Human Response-Guided Evaluation of Instruction Following in Language Models
by: Lyu, Xinxi, et al.
Published: (2024)
AGENTIF: Benchmarking Instruction Following of Large Language Models in Agentic Scenarios
by: Qi, Yunjia, et al.
Published: (2025)
The Dark Side of Function Calling: Pathways to Jailbreaking Large Language Models
by: Wu, Zihui, et al.
Published: (2024)
Instruction Following by Principled Boosting Attention of Large Language Models
by: Guardieiro, Vitoria, et al.
Published: (2025)
CodeIF-Bench: Evaluating Instruction-Following Capabilities of Large Language Models in Interactive Code Generation
by: Wang, Peiding, et al.
Published: (2025)
Can It Edit? Evaluating the Ability of Large Language Models to Follow Code Editing Instructions
by: Cassano, Federico, et al.
Published: (2023)
Scaling Reasoning, Losing Control: Evaluating Instruction Following in Large Reasoning Models
by: Fu, Tingchen, et al.
Published: (2025)
Find the Intention of Instruction: Comprehensive Evaluation of Instruction Understanding for Large Language Models
by: Moon, Hyeonseok, et al.
Published: (2024)
Improving Large Language Models Function Calling and Interpretability via Guided-Structured Templates
by: Dang, Hy, et al.
Published: (2025)
MulDimIF: A Multi-Dimensional Constraint Framework for Evaluating and Improving Instruction Following in Large Language Models
by: Ye, Junjie, et al.
Published: (2025)
Constraint Back-translation Improves Complex Instruction Following of Large Language Models
by: Qi, Yunjia, et al.
Published: (2024)
RIFT: Reordered Instruction Following Testbed To Evaluate Instruction Following in Singular Multistep Prompt Structures
by: Jaffe, Andrew, et al.
Published: (2026)
ABC-Eval: Benchmarking Large Language Models on Symbolic Music Understanding and Instruction Following
by: Zhao, Jiahao, et al.
Published: (2025)
Revisiting Compositional Generalization Capability of Large Language Models Considering Instruction Following Ability
by: Sakai, Yusuke, et al.
Published: (2025)
LARFT: Closing the Cognition-Action Gap for Length Instruction Following in Large Language Models
by: Zhang, Wei, et al.
Published: (2026)
Large Language Models as Zero-shot Dialogue State Tracker through Function Calling
by: Li, Zekun, et al.
Published: (2024)
mind_call: A Dataset for Mental Health Function Calling with Large Language Models
by: Shafi, Fozle Rabbi, et al.
Published: (2026)
Conifer: Improving Complex Constrained Instruction-Following Ability of Large Language Models
by: Sun, Haoran, et al.
Published: (2024)
Incentivizing Reasoning for Advanced Instruction-Following of Large Language Models
by: Qin, Yulei, et al.
Published: (2025)
Revisiting the Reliability of Language Models in Instruction-Following
by: Dong, Jianshuo, et al.
Published: (2025)
Evaluating Correctness and Faithfulness of Instruction-Following Models for Question Answering
by: Adlakha, Vaibhav, et al.
Published: (2023)
Aligning Large Language Models to Follow Instructions and Hallucinate Less via Effective Data Filtering
by: Si, Shuzheng, et al.
Published: (2025)
M-IFEval: Multilingual Instruction-Following Evaluation
by: Dussolle, Antoine, et al.
Published: (2025)
Can Language Models Follow Multiple Turns of Entangled Instructions?
by: Han, Chi, et al.
Published: (2025)
Financial Instruction Following Evaluation (FIFE)
by: Matlin, Glenn, et al.
Published: (2025)
Countering Catastrophic Forgetting of Large Language Models for Better Instruction Following via Weight-Space Model Merging
by: Lyu, Mengxian, et al.
Published: (2026)
GenFollower: Enhancing Car-Following Prediction with Large Language Models
by: Chen, Xianda, et al.
Published: (2024)
ReasonIF: Large Reasoning Models Fail to Follow Instructions During Reasoning
by: Kwon, Yongchan, et al.
Published: (2025)
Enhancing LLM Instruction Following: An Evaluation-Driven Multi-Agentic Workflow for Prompt Instructions Optimization
by: Purpura, Alberto, et al.
Published: (2026)