:: Library Catalog

Bibliographic Details
Main Authors:	Purpura, Alberto; Wang, Li; Badyal, Sahil; Beaufrand, Eugenio; Faulkner, Adam
Format:	Preprint
Published:	2026
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2601.03359

Similar Items

Deconstructing Instruction-Following: A New Benchmark for Granular Evaluation of Large Language Model Instruction Compliance Abilities
by: Purpura, Alberto, et al.
Published: (2026)

A Multi-Stage Workflow for the Review of Marketing Content with Reasoning Large Language Models
by: Purpura, Alberto, et al.
Published: (2025)

RIFT: Reordered Instruction Following Testbed To Evaluate Instruction Following in Singular Multistep Prompt Structures
by: Jaffe, Andrew, et al.
Published: (2026)

EVOREFUSE: Evolutionary Prompt Optimization for Evaluation and Mitigation of LLM Over-Refusal to Pseudo-Malicious Instructions
by: Wu, Xiaorui, et al.
Published: (2025)

AGENTIF: Benchmarking Instruction Following of Large Language Models in Agentic Scenarios
by: Qi, Yunjia, et al.
Published: (2025)

DIALEVAL: Automated Type-Theoretic Evaluation of LLM Instruction Following
by: Basta, Nardine, et al.
Published: (2026)

Self-Review Framework for Enhancing Instruction Following Capability of LLM
by: Park, Sihyun
Published: (2025)

Enhancing and Assessing Instruction-Following with Fine-Grained Instruction Variants
by: Yang, Jiuding, et al.
Published: (2024)

MUFFIN: Curating Multi-Faceted Instructions for Improving Instruction-Following
by: Lou, Renze, et al.
Published: (2023)

LIFEBench: Evaluating Length Instruction Following in Large Language Models
by: Zhang, Wei, et al.
Published: (2025)

Financial Instruction Following Evaluation (FIFE)
by: Matlin, Glenn, et al.
Published: (2025)

M-IFEval: Multilingual Instruction-Following Evaluation
by: Dussolle, Antoine, et al.
Published: (2025)

Instructional Prompt Optimization for Few-Shot LLM-Based Recommendations on Cold-Start Users
by: Yang, Haowei, et al.
Published: (2025)

PandaLM: An Automatic Evaluation Benchmark for LLM Instruction Tuning Optimization
by: Wang, Yidong, et al.
Published: (2023)

Adaptive Instruction Composition for Automated LLM Red-Teaming
by: Zymet, Jesse, et al.
Published: (2026)

RubricEval: A Rubric-Level Meta-Evaluation Benchmark for LLM Judges in Instruction Following
by: Pan, Tianjun, et al.
Published: (2026)

Boosting Instruction Following at Scale
by: Elder, Ben, et al.
Published: (2025)

The Instruction Gap: LLMs get lost in Following Instruction
by: Tripathi, Vishesh, et al.
Published: (2025)

LLM CHESS: Benchmarking Reasoning and Instruction-Following in LLMs through Chess
by: Kolasani, Sai, et al.
Published: (2025)

MaXIFE: Multilingual and Cross-lingual Instruction Following Evaluation
by: Liu, Yile, et al.
Published: (2025)

Multi-Level Compositional Reasoning for Interactive Instruction Following
by: Bhambri, Suvaansh, et al.
Published: (2023)

Instruction-Following Evaluation in Function Calling for Large Language Models
by: Skripko, Nikolai
Published: (2025)

ReIFE: Re-evaluating Instruction-Following Evaluation
by: Liu, Yixin, et al.
Published: (2024)

Embodied Instruction Following in Unknown Environments
by: Wu, Zhenyu, et al.
Published: (2024)

OctoBench: Benchmarking Scaffold-Aware Instruction Following in Repository-Grounded Agentic Coding
by: Ding, Deming, et al.
Published: (2026)

On the Multi-turn Instruction Following for Conversational Web Agents
by: Deng, Yang, et al.
Published: (2024)

Situated Instruction Following
by: Min, So Yeon, et al.
Published: (2024)

Neuro-Symbolic Verification on Instruction Following of LLMs
by: Su, Yiming, et al.
Published: (2026)

HREF: Human Response-Guided Evaluation of Instruction Following in Language Models
by: Lyu, Xinxi, et al.
Published: (2024)

Beyond Instruction Following: Evaluating Inferential Rule Following of Large Language Models
by: Sun, Wangtao, et al.
Published: (2024)

Procedural Knowledge Improves Agentic LLM Workflows
by: Hsiao, Vincent, et al.
Published: (2025)

LsrIF: Enhancing Logic-Structured Instruction Following of Large Language Models
by: Ren, Qingyu, et al.
Published: (2026)

RECAST: Expanding the Boundaries of LLMs' Complex Instruction Following with Multi-Constraint Data
by: Guo, Zhengkang, et al.
Published: (2025)

Evaluating Correctness and Faithfulness of Instruction-Following Models for Question Answering
by: Adlakha, Vaibhav, et al.
Published: (2023)

Prompt Codebooks: Discrete Compositional Optimization for Language Model Instruction Refinement
by: Nath, Jyotirmoy, et al.
Published: (2026)

Agentic Policy Optimization via Instruction-Policy Co-Evolution
by: Zhou, Han, et al.
Published: (2025)

VerIF: Verification Engineering for Reinforcement Learning in Instruction Following
by: Peng, Hao, et al.
Published: (2025)

Instruction Following by Principled Boosting Attention of Large Language Models
by: Guardieiro, Vitoria, et al.
Published: (2025)

LLM Based Bayesian Optimization for Prompt Search
by: Ballew, Adam, et al.
Published: (2025)

Scaling Reasoning, Losing Control: Evaluating Instruction Following in Large Reasoning Models
by: Fu, Tingchen, et al.
Published: (2025)