| Main Authors: | Liang, Jingcong; Ye, Rong; Han, Meng; Lai, Ruofei; Zhang, Xinyu; Huang, Xuanjing; Wei, Zhongyu |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2403.08010 |
Similar Items
AgentSense: Benchmarking Social Intelligence of Language Agents through Interactive Scenarios
by: Mou, Xinyi, et al.
Published: (2024)
Benchmark Self-Evolving: A Multi-Agent Framework for Dynamic LLM Evaluation
by: Wang, Siyuan, et al.
Published: (2024)
From Individual to Society: A Survey on Social Simulation Driven by Large Language Model-based Agents
by: Mou, Xinyi, et al.
Published: (2024)
LifeSim: Long-Horizon User Life Simulator for Personalized Assistant Evaluation
by: Duan, Feiyu, et al.
Published: (2026)
Unveiling the Truth and Facilitating Change: Towards Agent-based Large-scale Social Movement Simulation
by: Mou, Xinyi, et al.
Published: (2024)
Synergistic Multi-Agent Framework with Trajectory Learning for Knowledge-Intensive Tasks
by: Yue, Shengbin, et al.
Published: (2024)
ALaRM: Align Language Models via Hierarchical Rewards Modeling
by: Lai, Yuhang, et al.
Published: (2024)
CURP: Codebook-based Continuous User Representation for Personalized Generation with LLMs
by: Wang, Liang, et al.
Published: (2026)
How Jailbreak Defenses Work and Ensemble? A Mechanistic Investigation
by: Long, Zhuohang, et al.
Published: (2025)
VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models
by: Li, Zejun, et al.
Published: (2024)
Overview of the CAIL 2023 Argument Mining Track
by: Liang, Jingcong, et al.
Published: (2024)
EcoLANG: Efficient and Effective Agent Communication Language Induction for Social Simulation
by: Mou, Xinyi, et al.
Published: (2025)
Debatable Intelligence: Benchmarking LLM Judges via Debate Speech Evaluation
by: Sternlicht, Noy, et al.
Published: (2025)
HAF-RM: A Hybrid Alignment Framework for Reward Model Training
by: Liu, Shujun, et al.
Published: (2024)
Multi-Agent Simulator Drives Language Models for Legal Intensive Interaction
by: Yue, Shengbin, et al.
Published: (2025)
PIORS: Personalized Intelligent Outpatient Reception based on Large Language Model with Multi-Agents Medical Scenario Simulation
by: Bao, Zhijie, et al.
Published: (2024)
EmbSpatial-Bench: Benchmarking Spatial Understanding for Embodied Tasks with Large Vision-Language Models
by: Du, Mengfei, et al.
Published: (2024)
CauESC: A Causal Aware Model for Emotional Support Conversation
by: Chen, Wei, et al.
Published: (2024)
TrustJudge: Inconsistencies of LLM-as-a-Judge and How to Alleviate Them
by: Wang, Yidong, et al.
Published: (2025)
AI-Press: A Multi-Agent News Generating and Feedback Simulation System Powered by Large Language Models
by: Liu, Xiawei, et al.
Published: (2024)
Multi-agent KTO: Reinforcing Strategic Interactions of Large Language Model in Language Game
by: Ye, Rong, et al.
Published: (2025)
WHERE and WHICH: Iterative Debate for Biomedical Synthetic Data Augmentation
by: Zhao, Zhengyi, et al.
Published: (2025)
FinDebate: Multi-Agent Collaborative Intelligence for Financial Analysis
by: Cai, Tianshi, et al.
Published: (2025)
Judge's Verdict: A Comprehensive Analysis of LLM Judge Capability Through Human Agreement
by: Han, Steve, et al.
Published: (2025)
Tree-of-Debate: Multi-Persona Debate Trees Elicit Critical Thinking for Scientific Comparative Analysis
by: Kargupta, Priyanka, et al.
Published: (2025)
Multi-dimensional Data Analysis and Applications Basing on LLM Agents and Knowledge Graph Interactions
by: Wang, Xi, et al.
Published: (2025)
Overview of AI-Debater 2023: The Challenges of Argument Generation Tasks
by: Lin, Jiayu, et al.
Published: (2024)
SoMeLVLM: A Large Vision Language Model for Social Media Processing
by: Zhang, Xinnong, et al.
Published: (2024)
DELAN: Dual-Level Alignment for Vision-and-Language Navigation by Cross-Modal Contrastive Learning
by: Du, Mengfei, et al.
Published: (2024)
Judging the Judges: A Systematic Study of Position Bias in LLM-as-a-Judge
by: Shi, Lin, et al.
Published: (2024)
LLM-DA: Data Augmentation via Large Language Models for Few-Shot Named Entity Recognition
by: Ye, Junjie, et al.
Published: (2024)
RIVAL: Reinforcement Learning with Iterative and Adversarial Optimization for Machine Translation
by: Li, Tianjiao, et al.
Published: (2025)
Justice or Prejudice? Quantifying Biases in LLM-as-a-Judge
by: Ye, Jiayi, et al.
Published: (2024)
Beyond Isolated Behaviors: Hierarchical User Modeling for LLM Personalization
by: Wang, Liang, et al.
Published: (2026)
Stepwise Informativeness Search for Efficient and Effective LLM Reasoning
by: Wang, Siyuan, et al.
Published: (2025)
MADIAVE: Multi-Agent Debate for Implicit Attribute Value Extraction
by: Huang, Wei-Chieh, et al.
Published: (2025)
Multi-Agent-as-Judge: Aligning LLM-Agent-Based Automated Evaluation with Multi-Dimensional Human Evaluation
by: Chen, Jiaju, et al.
Published: (2025)
AdaJudge: Adaptive Multi-Perspective Judging for Reward Modeling
by: Miao, Yongliang, et al.
Published: (2026)
SWE-Debate: Competitive Multi-Agent Debate for Software Issue Resolution
by: Li, Han, et al.
Published: (2025)
Debate Helps Weak Judges Reward Stronger Models
by: Elasky, Ethan, et al.
Published: (2026)