Library Catalog

Bibliographic Details
Main Author:	Skripko, Nikolai
Format:	Preprint
Published:	2025
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2509.18420

Similar Items

Beyond Instruction Following: Evaluating Inferential Rule Following of Large Language Models
by: Sun, Wangtao, et al.
Published: (2024)

LIFEBench: Evaluating Length Instruction Following in Large Language Models
by: Zhang, Wei, et al.
Published: (2025)

RefuteBench: Evaluating Refuting Instruction-Following for Large Language Models
by: Yan, Jianhao, et al.
Published: (2024)

Deconstructing Instruction-Following: A New Benchmark for Granular Evaluation of Large Language Model Instruction Compliance Abilities
by: Purpura, Alberto, et al.
Published: (2026)

LexInstructEval: Lexical Instruction Following Evaluation for Large Language Models
by: Ren, Huimin, et al.
Published: (2025)

InFoBench: Evaluating Instruction Following Ability in Large Language Models
by: Qin, Yiwei, et al.
Published: (2024)

KITE: A Benchmark for Evaluating Korean Instruction-Following Abilities in Large Language Models
by: Kim, Dongjun, et al.
Published: (2025)

LsrIF: Enhancing Logic-Structured Instruction Following of Large Language Models
by: Ren, Qingyu, et al.
Published: (2026)

CIF-Bench: A Chinese Instruction-Following Benchmark for Evaluating the Generalizability of Large Language Models
by: Li, Yizhi, et al.
Published: (2024)

CarbonCall: Sustainability-Aware Function Calling for Large Language Models on Edge Devices
by: Paramanayakam, Varatheepan, et al.
Published: (2025)

Improving Small-Scale Large Language Models Function Calling for Reasoning Tasks
by: Manduzio, Graziano A., et al.
Published: (2024)

HREF: Human Response-Guided Evaluation of Instruction Following in Language Models
by: Lyu, Xinxi, et al.
Published: (2024)

AGENTIF: Benchmarking Instruction Following of Large Language Models in Agentic Scenarios
by: Qi, Yunjia, et al.
Published: (2025)

The Dark Side of Function Calling: Pathways to Jailbreaking Large Language Models
by: Wu, Zihui, et al.
Published: (2024)

Instruction Following by Principled Boosting Attention of Large Language Models
by: Guardieiro, Vitoria, et al.
Published: (2025)

CodeIF-Bench: Evaluating Instruction-Following Capabilities of Large Language Models in Interactive Code Generation
by: Wang, Peiding, et al.
Published: (2025)

Can It Edit? Evaluating the Ability of Large Language Models to Follow Code Editing Instructions
by: Cassano, Federico, et al.
Published: (2023)

Scaling Reasoning, Losing Control: Evaluating Instruction Following in Large Reasoning Models
by: Fu, Tingchen, et al.
Published: (2025)

Find the Intention of Instruction: Comprehensive Evaluation of Instruction Understanding for Large Language Models
by: Moon, Hyeonseok, et al.
Published: (2024)

Improving Large Language Models Function Calling and Interpretability via Guided-Structured Templates
by: Dang, Hy, et al.
Published: (2025)

MulDimIF: A Multi-Dimensional Constraint Framework for Evaluating and Improving Instruction Following in Large Language Models
by: Ye, Junjie, et al.
Published: (2025)

Constraint Back-translation Improves Complex Instruction Following of Large Language Models
by: Qi, Yunjia, et al.
Published: (2024)

RIFT: Reordered Instruction Following Testbed To Evaluate Instruction Following in Singular Multistep Prompt Structures
by: Jaffe, Andrew, et al.
Published: (2026)

ABC-Eval: Benchmarking Large Language Models on Symbolic Music Understanding and Instruction Following
by: Zhao, Jiahao, et al.
Published: (2025)

Revisiting Compositional Generalization Capability of Large Language Models Considering Instruction Following Ability
by: Sakai, Yusuke, et al.
Published: (2025)

LARFT: Closing the Cognition-Action Gap for Length Instruction Following in Large Language Models
by: Zhang, Wei, et al.
Published: (2026)

Large Language Models as Zero-shot Dialogue State Tracker through Function Calling
by: Li, Zekun, et al.
Published: (2024)

mind_call: A Dataset for Mental Health Function Calling with Large Language Models
by: Shafi, Fozle Rabbi, et al.
Published: (2026)

Conifer: Improving Complex Constrained Instruction-Following Ability of Large Language Models
by: Sun, Haoran, et al.
Published: (2024)

Incentivizing Reasoning for Advanced Instruction-Following of Large Language Models
by: Qin, Yulei, et al.
Published: (2025)

Revisiting the Reliability of Language Models in Instruction-Following
by: Dong, Jianshuo, et al.
Published: (2025)

Evaluating Correctness and Faithfulness of Instruction-Following Models for Question Answering
by: Adlakha, Vaibhav, et al.
Published: (2023)

Aligning Large Language Models to Follow Instructions and Hallucinate Less via Effective Data Filtering
by: Si, Shuzheng, et al.
Published: (2025)

M-IFEval: Multilingual Instruction-Following Evaluation
by: Dussolle, Antoine, et al.
Published: (2025)

Can Language Models Follow Multiple Turns of Entangled Instructions?
by: Han, Chi, et al.
Published: (2025)

Financial Instruction Following Evaluation (FIFE)
by: Matlin, Glenn, et al.
Published: (2025)

Countering Catastrophic Forgetting of Large Language Models for Better Instruction Following via Weight-Space Model Merging
by: Lyu, Mengxian, et al.
Published: (2026)

GenFollower: Enhancing Car-Following Prediction with Large Language Models
by: Chen, Xianda, et al.
Published: (2024)

ReasonIF: Large Reasoning Models Fail to Follow Instructions During Reasoning
by: Kwon, Yongchan, et al.
Published: (2025)

Enhancing LLM Instruction Following: An Evaluation-Driven Multi-Agentic Workflow for Prompt Instructions Optimization
by: Purpura, Alberto, et al.
Published: (2026)