| Main Authors: | Nie, Fan; Wang, Junlin; Hua, Harper; Bianchi, Federico; Kwon, Yongchan; Qi, Zhenting; Queen, Owen; Zhu, Shang; Zou, James |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2601.16344 |
Similar Items
Automated Benchmark Auditing for AI Agents and Large Language Models
by: Wang, Junlin, et al.
Published: (2026)
ReasonIF: Large Reasoning Models Fail to Follow Instructions During Reasoning
by: Kwon, Yongchan, et al.
Published: (2025)
Exploring the use of AI authors and reviewers at Agents4Science
by: Bianchi, Federico, et al.
Published: (2025)
To Err Is Human: Systematic Quantification of Errors in Published AI Papers via LLM Analysis
by: Bianchi, Federico, et al.
Published: (2025)
What LLMs Think When You Don't Tell Them What to Think About?
by: Kwon, Yongchan, et al.
Published: (2026)
Voice "Cloning" is Style Transfer
by: Zhou, Kaitlyn, et al.
Published: (2026)
DataInf: Efficiently Estimating Data Influence in LoRA-tuned LLMs and Diffusion Models
by: Kwon, Yongchan, et al.
Published: (2023)
Rethinking Data Shapley for Data Selection Tasks: Misleads and Merits
by: Wang, Jiachen T., et al.
Published: (2024)
2D-OOB: Attributing Data Contribution Through Joint Valuation Framework
by: Sun, Yifan, et al.
Published: (2024)
ReasonOps: Operator Segmentation for LLM Reasoning Traces
by: Lee, Daniel, et al.
Published: (2026)
Proper Dataset Valuation by Pointwise Mutual Information
by: Zheng, Shuran, et al.
Published: (2024)
CGBench: Benchmarking Language Model Scientific Reasoning for Clinical Genetics Research
by: Queen, Owen, et al.
Published: (2025)
Distributionally Robust Instrumental Variables Estimation
by: Qu, Zhaonan, et al.
Published: (2024)
EvoLM: In Search of Lost Language Model Training Dynamics
by: Qi, Zhenting, et al.
Published: (2025)
Certified Data Removal Under High-dimensional Settings
by: Zou, Haolin, et al.
Published: (2025)
TimeInf: Time Series Data Contribution via Influence Functions
by: Zhang, Yizi, et al.
Published: (2024)
Large Language Models are Vulnerable to Bait-and-Switch Attacks for Generating Harmful Content
by: Bianchi, Federico, et al.
Published: (2024)
Newfluence: Boosting Model interpretability and Understanding in High Dimensions
by: Zou, Haolin, et al.
Published: (2025)
Group Shapley Value and Counterfactual Simulations in a Structural Model
by: Kwon, Yongchan, et al.
Published: (2024)
Understanding Impact of Human Feedback via Influence Functions
by: Min, Taywon, et al.
Published: (2025)
When Does Divide and Conquer Work for Long Context LLM? A Noise Decomposition Framework
by: Xu, Zhen, et al.
Published: (2025)
A Business Education Program for Training Library Technicians.
by: McQueen, Harriett
Published: (1981)
Weak-for-Strong: Training Weak Meta-Agent to Harness Strong Executors
by: Nie, Fan, et al.
Published: (2025)
ContextualLVLM-Agent: A Holistic Framework for Multi-Turn Visually-Grounded Dialogue and Complex Instruction Following
by: Han, Seungmin, et al.
Published: (2025)
Mixture-of-Agents Enhances Large Language Model Capabilities
by: Wang, Junlin, et al.
Published: (2024)
Holistic Agent Leaderboard: The Missing Infrastructure for AI Agent Evaluation
by: Kapoor, Sayash, et al.
Published: (2025)
"Sorry, I Didn't Catch That": How Speech Models Miss What Matters Most
by: Zhou, Kaitlyn, et al.
Published: (2026)
ADO: Automatic Data Optimization for Inputs in LLM Prompts
by: Lin, Sam, et al.
Published: (2025)
Holistic Evaluation and Failure Diagnosis of AI Agents
by: Madvil, Netta, et al.
Published: (2026)
Temperature dependence of energy transport in the $\mathbb{Z}_3$ chiral clock model
by: Yoo, Yongchan, et al.
Published: (2023)
TapeAgents: a Holistic Framework for Agent Development and Optimization
by: Bahdanau, Dzmitry, et al.
Published: (2024)
AutoGenesisAgent: Self-Generating Multi-Agent Systems for Complex Tasks
by: Harper, Jeremy
Published: (2024)
AIOpsLab: A Holistic Framework to Evaluate AI Agents for Enabling Autonomous Clouds
by: Chen, Yinfang, et al.
Published: (2025)
EcomBench: Towards Holistic Evaluation of Foundation Agents in E-commerce
by: Min, Rui, et al.
Published: (2025)
Evaluating A/B Testing Methodologies via Sample Splitting: Theory and Practice
by: Kessler, Ryan, et al.
Published: (2025)
SecureWebArena: A Holistic Security Evaluation Benchmark for LVLM-based Web Agents
by: Ying, Zonghao, et al.
Published: (2025)
Dynamic Cheatsheet: Test-Time Learning with Adaptive Memory
by: Suzgun, Mirac, et al.
Published: (2025)
Follow My Instruction and Spill the Beans: Scalable Data Extraction from Retrieval-Augmented Generation Systems
by: Qi, Zhenting, et al.
Published: (2024)
Toward Emergent Holism: A Mutually Constitutive Account for Systems Science and Holistic Philosophy
by: Qiang Fu, et al.
Published: (2026)
Data Agent: A Holistic Architecture for Orchestrating Data+AI Ecosystems
by: Sun, Zhaoyan, et al.
Published: (2025)