| Main Authors: | Chen, Yihan, Xu, Benfeng, Wang, Quan, Liu, Yi, Mao, Zhendong |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2401.00690 |
Similar Items
From Real to Synthetic: Synthesizing Millions of Diversified and Complicated User Instructions with Attributed Grounding
by: Zhu, Chiwei, et al.
Published: (2025)
An Index-based Approach for Efficient and Effective Web Content Extraction
by: Chen, Yihan, et al.
Published: (2025)
Training LLM-Based Agents with Synthetic Self-Reflected Trajectories and Partial Masking
by: Chen, Yihan, et al.
Published: (2025)
ExpertPrompting: Instructing Large Language Models to be Distinguished Experts
by: Xu, Benfeng, et al.
Published: (2023)
Automated Creativity Evaluation for Large Language Models: A Reference-Based Approach
by: Li, Ruizhe, et al.
Published: (2025)
DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents
by: Du, Mingxuan, et al.
Published: (2025)
Rationales Are Not Silver Bullets: Measuring the Impact of Rationales on Model Performance and Reliability
by: Zhu, Chiwei, et al.
Published: (2025)
WildGraphBench: Benchmarking GraphRAG with Wild-Source Corpora
by: Wang, Pengyu, et al.
Published: (2026)
A-RAG: Scaling Agentic Retrieval-Augmented Generation via Hierarchical Retrieval Interfaces
by: Du, Mingxuan, et al.
Published: (2026)
MCP-AgentBench: Evaluating Real-World Language Agent Performance with MCP-Mediated Tools
by: Guo, Zikang, et al.
Published: (2025)
DeepResearch Bench II: Diagnosing Deep Research Agents via Rubrics from Expert Report
by: Li, Ruizhe, et al.
Published: (2026)
Benchmarking and Improving Compositional Generalization of Multi-aspect Controllable Text Generation
by: Zhong, Tianqi, et al.
Published: (2024)
FS-Researcher: Test-Time Scaling for Long-Horizon Research Tasks with File-System-Based Agents
by: Zhu, Chiwei, et al.
Published: (2026)
Benchmarking Generation and Evaluation Capabilities of Large Language Models for Instruction Controllable Summarization
by: Liu, Yixin, et al.
Published: (2023)
Mitigating Biases in Language Models via Bias Unlearning
by: Liu, Dianqing, et al.
Published: (2025)
Wiki Live Challenge: Challenging Deep Research Agents with Expert-Level Wikipedia Articles
by: Wang, Shaohan, et al.
Published: (2026)
Instruction Mining: Instruction Data Selection for Tuning Large Language Models
by: Cao, Yihan, et al.
Published: (2023)
Leveraging Importance Sampling to Detach Alignment Modules from Large Language Models
by: Liu, Yi, et al.
Published: (2025)
FlipGuard: Defending Preference Alignment against Update Regression with Constrained Optimization
by: Zhu, Mingye, et al.
Published: (2024)
Align Documents to Questions: Question-Oriented Document Rewriting for Retrieval-Augmented Generation
by: Li, Jiaang, et al.
Published: (2026)
TypedThinker: Diversify Large Language Model Reasoning with Typed Thinking
by: Wang, Danqing, et al.
Published: (2024)
Leveraging Robust Optimization for LLM Alignment under Distribution Shifts
by: Zhu, Mingye, et al.
Published: (2025)
AGENTIF: Benchmarking Instruction Following of Large Language Models in Agentic Scenarios
by: Qi, Yunjia, et al.
Published: (2025)
TRIDENT: Enhancing Large Language Model Safety with Tri-Dimensional Diversified Red-Teaming Data Synthesis
by: Wu, Xiaorui, et al.
Published: (2025)
Benchmarking Political Persuasion Risks Across Frontier Large Language Models
by: Chen, Zhongren, et al.
Published: (2026)
OMGEval: An Open Multilingual Generative Evaluation Benchmark for Large Language Models
by: Liu, Yang, et al.
Published: (2024)
SimpleStrat: Diversifying Language Model Generation with Stratification
by: Wong, Justin, et al.
Published: (2024)
Uncertainty-Aware Exploratory Direct Preference Optimization for Multimodal Large Language Models
by: Zhang, Huatian, et al.
Published: (2026)
MedHallTune: An Instruction-Tuning Benchmark for Mitigating Medical Hallucination in Vision-Language Models
by: Yan, Qiao, et al.
Published: (2025)
Enhancing Large Language Models Against Inductive Instructions with Dual-critique Prompting
by: Wang, Rui, et al.
Published: (2023)
Resilience of Large Language Models for Noisy Instructions
by: Wang, Bin, et al.
Published: (2024)
ELDER: Enhancing Lifelong Model Editing with Mixture-of-LoRA
by: Li, Jiaang, et al.
Published: (2024)
PLANNER: Generating Diversified Paragraph via Latent Language Diffusion Model
by: Zhang, Yizhe, et al.
Published: (2023)
EIFBENCH: Extremely Complex Instruction Following Benchmark for Large Language Models
by: Zou, Tao, et al.
Published: (2025)
EconLogicQA: A Question-Answering Benchmark for Evaluating Large Language Models in Economic Sequential Reasoning
by: Quan, Yinzhu, et al.
Published: (2024)
Feature-Adaptive and Data-Scalable In-Context Learning
by: Li, Jiahao, et al.
Published: (2024)
LexGenius: An Expert-Level Benchmark for Large Language Models in Legal General Intelligence
by: Liu, Wenjin, et al.
Published: (2025)
Improved Unbiased Watermark for Large Language Models
by: Chen, Ruibo, et al.
Published: (2025)
KCS: Diversify Multi-hop Question Generation with Knowledge Composition Sampling
by: Wang, Yangfan, et al.
Published: (2025)
The SIFo Benchmark: Investigating the Sequential Instruction Following Ability of Large Language Models
by: Chen, Xinyi, et al.
Published: (2024)