| Main Authors: | Gu, Zhouhong, Zhang, Lin, Chen, Jiangjie, Ye, Haoning, Zhu, Xiaoxuan, Li, Zihan, Ye, Zheyu, Gao, Yan, Hu, Yao, Xiao, Yanghua, Feng, Hongwei |
|---|---|
| Format: | Preprint |
| Published: | 2023 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2307.05113 |
Similar Items
DetectBench: Can Large Language Model Detect and Piece Together Implicit Evidence?
by: Gu, Zhouhong, et al.
Published: (2024)
StrucText-Eval: Evaluating Large Language Model's Reasoning Ability in Structure-Rich Text
by: Gu, Zhouhong, et al.
Published: (2024)
Efficiently Quantifying and Mitigating Ripple Effects in Model Editing
by: Wang, Jianchen, et al.
Published: (2024)
AgentGroupChat: An Interactive Group Chat Simulacra For Better Eliciting Emergent Behavior
by: Gu, Zhouhong, et al.
Published: (2024)
RECKON: Large-scale Reference-based Efficient Knowledge Evaluation for Large Language Model
by: Zhang, Lin, et al.
Published: (2025)
VCEval: Rethinking What is a Good Educational Video and How to Automatically Evaluate It
by: Zhu, Xiaoxuan, et al.
Published: (2024)
Xiezhi: An Ever-Updating Benchmark for Holistic Domain Knowledge Evaluation
by: Gu, Zhouhong, et al.
Published: (2023)
MIRAGE: Exploring How Large Language Models Perform in Complex Social Interactive Environments
by: Cai, Yin, et al.
Published: (2025)
LLM-GAN: Construct Generative Adversarial Network Through Large Language Models For Explainable Fake News Detection
by: Wang, Yifeng, et al.
Published: (2024)
ToReMi: Topic-Aware Data Reweighting for Dynamic Pre-Training Data Selection
by: Zhu, Xiaoxuan, et al.
Published: (2025)
LITE: LLM-Impelled efficient Taxonomy Evaluation
by: Zhang, Lin, et al.
Published: (2025)
AgentGroupChat-V2: Divide-and-Conquer Is What LLM-Based Multi-Agent System Need
by: Gu, Zhouhong, et al.
Published: (2025)
Can Large Language Models Understand Real-World Complex Instructions?
by: He, Qianyu, et al.
Published: (2023)
Enhancing Language Agent Strategic Reasoning through Self-Play in Adversarial Games
by: Zhang, Yikai, et al.
Published: (2025)
GAPO: Learning Preferential Prompt through Generative Adversarial Policy Optimization
by: Gu, Zhouhong, et al.
Published: (2025)
ConcEPT: Concept-Enhanced Pre-Training for Language Models
by: Wang, Xintao, et al.
Published: (2024)
Scaling Behavior of Single LLM-Driven Multi-Agent Systems
by: Li, Jialing, et al.
Published: (2026)
How Easily do Irrelevant Inputs Skew the Responses of Large Language Models?
by: Wu, Siye, et al.
Published: (2024)
ScholarGym: Benchmarking Large Language Model Capabilities in the Information-Gathering Stage of Deep Research
by: Shen, Hao, et al.
Published: (2026)
TimeArena: Shaping Efficient Multitasking Language Agents in a Time-Aware Simulation
by: Zhang, Yikai, et al.
Published: (2024)
Past Meets Present: Creating Historical Analogy with Large Language Models
by: Li, Nianqi, et al.
Published: (2024)
TravelPlanner: A Benchmark for Real-World Planning with Language Agents
by: Xie, Jian, et al.
Published: (2024)
Dr.Academy: A Benchmark for Evaluating Questioning Capability in Education for Large Language Models
by: Chen, Yuyan, et al.
Published: (2024)
EmotionQueen: A Benchmark for Evaluating Empathy of Large Language Models
by: Chen, Yuyan, et al.
Published: (2024)
MCiteBench: A Multimodal Benchmark for Generating Text with Citations
by: Hu, Caiyu, et al.
Published: (2025)
GumbelSoft: Diversified Language Model Watermarking via the GumbelMax-trick
by: Fu, Jiayi, et al.
Published: (2024)
ANALOGYKB: Unlocking Analogical Reasoning of Language Models with A Million-scale Knowledge Base
by: Yuan, Siyu, et al.
Published: (2023)
Towards the Law of Capacity Gap in Distilling Language Models
by: Zhang, Chen, et al.
Published: (2023)
TravelAgent: An AI Assistant for Personalized Travel Planning
by: Chen, Aili, et al.
Published: (2024)
TESSELLATE: Piecing Together the Variable Sky With TESS
by: Roxburgh, Hugh, et al.
Published: (2025)
Embedded Course Reserves: Piecing the Puzzle Together
by: Clumpner, Krista E., et al.
Published: (2011)
Recent Advancement of Emotion Cognition in Large Language Models
by: Chen, Yuyan, et al.
Published: (2024)
GAUSS: Benchmarking Structured Mathematical Skills for Large Language Models
by: Zhang, Yue, et al.
Published: (2025)
AutoScraper: A Progressive Understanding Web Agent for Web Scraper Generation
by: Huang, Wenhao, et al.
Published: (2024)
Piece it Together: Part-Based Concepting with IP-Priors
by: Richardson, Elad, et al.
Published: (2025)
Enhancing Quantitative Reasoning Skills of Large Language Models through Dimension Perception
by: Huang, Yuncheng, et al.
Published: (2023)
Revealing the Barriers of Language Agents in Planning
by: Xie, Jian, et al.
Published: (2024)
Piecing It All Together: Verifying Multi-Hop Multimodal Claims
by: Wang, Haoran, et al.
Published: (2024)
Fitting the Pieces Together: Developing Better Service for End-Users.
by: Bridges, Peggy Bass, et al.
Published: (2000)
Layer as Puzzle Pieces: Compressing Large Language Models through Layer Concatenation
by: Wang, Fei, et al.
Published: (2025)