:: Library Catalog

Bibliographic Details
Main Authors:	Nie, Fan; Wang, Junlin; Hua, Harper; Bianchi, Federico; Kwon, Yongchan; Qi, Zhenting; Queen, Owen; Zhu, Shang; Zou, James
Format:	Preprint
Published:	2026
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2601.16344

Similar Items

Automated Benchmark Auditing for AI Agents and Large Language Models
by: Wang, Junlin, et al.
Published: (2026)

ReasonIF: Large Reasoning Models Fail to Follow Instructions During Reasoning
by: Kwon, Yongchan, et al.
Published: (2025)

Exploring the use of AI authors and reviewers at Agents4Science
by: Bianchi, Federico, et al.
Published: (2025)

To Err Is Human: Systematic Quantification of Errors in Published AI Papers via LLM Analysis
by: Bianchi, Federico, et al.
Published: (2025)

What LLMs Think When You Don't Tell Them What to Think About?
by: Kwon, Yongchan, et al.
Published: (2026)

Voice "Cloning" is Style Transfer
by: Zhou, Kaitlyn, et al.
Published: (2026)

DataInf: Efficiently Estimating Data Influence in LoRA-tuned LLMs and Diffusion Models
by: Kwon, Yongchan, et al.
Published: (2023)

Rethinking Data Shapley for Data Selection Tasks: Misleads and Merits
by: Wang, Jiachen T., et al.
Published: (2024)

2D-OOB: Attributing Data Contribution Through Joint Valuation Framework
by: Sun, Yifan, et al.
Published: (2024)

ReasonOps: Operator Segmentation for LLM Reasoning Traces
by: Lee, Daniel, et al.
Published: (2026)

Proper Dataset Valuation by Pointwise Mutual Information
by: Zheng, Shuran, et al.
Published: (2024)

CGBench: Benchmarking Language Model Scientific Reasoning for Clinical Genetics Research
by: Queen, Owen, et al.
Published: (2025)

Distributionally Robust Instrumental Variables Estimation
by: Qu, Zhaonan, et al.
Published: (2024)

EvoLM: In Search of Lost Language Model Training Dynamics
by: Qi, Zhenting, et al.
Published: (2025)

Certified Data Removal Under High-dimensional Settings
by: Zou, Haolin, et al.
Published: (2025)

TimeInf: Time Series Data Contribution via Influence Functions
by: Zhang, Yizi, et al.
Published: (2024)

Large Language Models are Vulnerable to Bait-and-Switch Attacks for Generating Harmful Content
by: Bianchi, Federico, et al.
Published: (2024)

Newfluence: Boosting Model interpretability and Understanding in High Dimensions
by: Zou, Haolin, et al.
Published: (2025)

Group Shapley Value and Counterfactual Simulations in a Structural Model
by: Kwon, Yongchan, et al.
Published: (2024)

Understanding Impact of Human Feedback via Influence Functions
by: Min, Taywon, et al.
Published: (2025)

When Does Divide and Conquer Work for Long Context LLM? A Noise Decomposition Framework
by: Xu, Zhen, et al.
Published: (2025)

A Business Education Program for Training Library Technicians.
by: McQueen, Harriett
Published: (1981)

Weak-for-Strong: Training Weak Meta-Agent to Harness Strong Executors
by: Nie, Fan, et al.
Published: (2025)

ContextualLVLM-Agent: A Holistic Framework for Multi-Turn Visually-Grounded Dialogue and Complex Instruction Following
by: Han, Seungmin, et al.
Published: (2025)

Mixture-of-Agents Enhances Large Language Model Capabilities
by: Wang, Junlin, et al.
Published: (2024)

Holistic Agent Leaderboard: The Missing Infrastructure for AI Agent Evaluation
by: Kapoor, Sayash, et al.
Published: (2025)

"Sorry, I Didn't Catch That": How Speech Models Miss What Matters Most
by: Zhou, Kaitlyn, et al.
Published: (2026)

ADO: Automatic Data Optimization for Inputs in LLM Prompts
by: Lin, Sam, et al.
Published: (2025)

Holistic Evaluation and Failure Diagnosis of AI Agents
by: Madvil, Netta, et al.
Published: (2026)

Temperature dependence of energy transport in the $\mathbb{Z}_3$ chiral clock model
by: Yoo, Yongchan, et al.
Published: (2023)

TapeAgents: a Holistic Framework for Agent Development and Optimization
by: Bahdanau, Dzmitry, et al.
Published: (2024)

AutoGenesisAgent: Self-Generating Multi-Agent Systems for Complex Tasks
by: Harper, Jeremy
Published: (2024)

AIOpsLab: A Holistic Framework to Evaluate AI Agents for Enabling Autonomous Clouds
by: Chen, Yinfang, et al.
Published: (2025)

EcomBench: Towards Holistic Evaluation of Foundation Agents in E-commerce
by: Min, Rui, et al.
Published: (2025)

Evaluating A/B Testing Methodologies via Sample Splitting: Theory and Practice
by: Kessler, Ryan, et al.
Published: (2025)

SecureWebArena: A Holistic Security Evaluation Benchmark for LVLM-based Web Agents
by: Ying, Zonghao, et al.
Published: (2025)

Dynamic Cheatsheet: Test-Time Learning with Adaptive Memory
by: Suzgun, Mirac, et al.
Published: (2025)

Follow My Instruction and Spill the Beans: Scalable Data Extraction from Retrieval-Augmented Generation Systems
by: Qi, Zhenting, et al.
Published: (2024)

Toward Emergent Holism: A Mutually Constitutive Account for Systems Science and Holistic Philosophy
by: Qiang Fu, et al.
Published: (2026)

Data Agent: A Holistic Architecture for Orchestrating Data+AI Ecosystems
by: Sun, Zhaoyan, et al.
Published: (2025)