Library Catalog

Bibliographic Details
Main authors:	Fang, Yi; Li, Moxin; Wang, Wenjie; Lin, Hui; Feng, Fuli
Format:	Preprint
Published:	2024
Subjects:	Computation and Language
Online access:	https://arxiv.org/abs/2406.11514

Similar documents

HellaSwag-Pro: A Large-Scale Bilingual Benchmark for Evaluating the Robustness of LLMs in Commonsense Reasoning
by: Li, Xiaoyuan, et al.
Published: (2025)

Beyond Preset Identities: How Agents Form Stances and Boundaries in Generative Societies
by: Zhang, Hanzhong, et al.
Published: (2026)

Think Twice Before Trusting: Self-Detection for Large Language Models through Comprehensive Answer Reflection
by: Li, Moxin, et al.
Published: (2024)

Evaluating Mathematical Reasoning of Large Language Models: A Focus on Error Identification and Correction
by: Li, Xiaoyuan, et al.
Published: (2024)

SkillGraph: Skill-Augmented Reinforcement Learning for Agents via Evolving Skill Graphs
by: Li, Xiaoyuan, et al.
Published: (2026)

MathOPEval: A Fine-grained Evaluation Benchmark for Visual Operations of MLLMs in Mathematical Reasoning
by: Li, Xiaoyuan, et al.
Published: (2025)

Robust Prompt Optimization for Large Language Models Against Distribution Shifts
by: Li, Moxin, et al.
Published: (2023)

Assistant-Guided Mitigation of Teacher Preference Bias in LLM-as-a-Judge
by: Liu, Zhuo, et al.
Published: (2025)

On Predicting the Post-training Potential of Pre-trained LLMs
by: Li, Xiaoyuan, et al.
Published: (2026)

ARES: Automated Rubric Synthesis for Scalable LLM Reinforcement Learning
by: Li, Xiaoyuan, et al.
Published: (2026)

Self-Improvement Towards Pareto Optimality: Mitigating Preference Conflicts in Multi-Objective Alignment
by: Li, Moxin, et al.
Published: (2025)

SAGE: Scalable Automated Robustness Augmentation for LLM Knowledge Evaluation
by: Li, Xiaoyuan, et al.
Published: (2026)

MTR-Bench: A Comprehensive Benchmark for Multi-Turn Reasoning Evaluation
by: Li, Xiaoyuan, et al.
Published: (2025)

Mitigating Biases of Large Language Models in Stance Detection with Counterfactual Augmented Calibration
by: Li, Ang, et al.
Published: (2024)

SHARP: Unlocking Interactive Hallucination via Stance Transfer in Role-Playing LLMs
by: Kong, Chuyi, et al.
Published: (2024)

CrAM: Credibility-Aware Attention Modification in LLMs for Combating Misinformation in RAG
by: Deng, Boyi, et al.
Published: (2024)

TAT-LLM: A Specialized Language Model for Discrete Reasoning over Tabular and Textual Data
by: Zhu, Fengbin, et al.
Published: (2024)

Doc2SoarGraph: Discrete Reasoning over Visually-Rich Table-Text Documents via Semantic-Oriented Hierarchical Graphs
by: Zhu, Fengbin, et al.
Published: (2023)

Dialectic-Med: Mitigating Diagnostic Hallucinations via Counterfactual Adversarial Multi-Agent Debate
by: Lu, Zhixiang, et al.
Published: (2026)

Dual-Phase Accelerated Prompt Optimization
by: Yang, Muchen, et al.
Published: (2024)

SASFT: Sparse Autoencoder-guided Supervised Finetuning to Mitigate Unexpected Code-Switching in LLMs
by: Deng, Boyi, et al.
Published: (2025)

Faithful, Unfaithful or Ambiguous? Multi-Agent Debate with Initial Stance for Summary Evaluation
by: Koupaee, Mahnaz, et al.
Published: (2025)

Removal of Hallucination on Hallucination: Debate-Augmented RAG
by: Hu, Wentao, et al.
Published: (2025)

Controllable LLM Reasoning via Sparse Autoencoder-Based Steering
by: Fang, Yi, et al.
Published: (2026)

Breaking Event Rumor Detection via Stance-Separated Multi-Agent Debate
by: Zhang, Mingqing, et al.
Published: (2024)

ORCHID: A Chinese Debate Corpus for Target-Independent Stance Detection and Argumentative Dialogue Summarization
by: Zhao, Xiutian, et al.
Published: (2024)

Counterfactual Probing for Hallucination Detection and Mitigation in Large Language Models
by: Feng, Yijun
Published: (2025)

Towards Detecting LLMs Hallucination via Markov Chain-based Multi-agent Debate Framework
by: Sun, Xiaoxi, et al.
Published: (2024)

One Adapts to Any: Meta Reward Modeling for Personalized LLM Alignment
by: Cai, Hongru, et al.
Published: (2026)

Acting Flatterers via LLMs Sycophancy: Combating Clickbait with LLMs Opposing-Stance Reasoning
by: Zhang, Chaowei, et al.
Published: (2026)

Strategic Planning and Rationalizing on Trees Make LLMs Better Debaters
by: Wang, Danqing, et al.
Published: (2025)

Navigating Through Paper Flood: Advancing LLM-based Paper Evaluation through Domain-Aware Retrieval and Latent Reasoning
by: Zheng, Wuqiang, et al.
Published: (2025)

TacoMAS: Test-Time Co-Evolution of Topology and Capability in LLM-based Multi-Agent Systems
by: Xu, Chen, et al.
Published: (2026)

Is External Information Useful for Stance Detection with LLMs?
by: Nguyen, Quang Minh, et al.
Published: (2025)

A Debate-Driven Experiment on LLM Hallucinations and Accuracy
by: Li, Ray, et al.
Published: (2024)

A Survey of Generative Search and Recommendation in the Era of Large Language Models
by: Li, Yongqi, et al.
Published: (2024)

Can LLMs Beat Humans in Debating? A Dynamic Multi-agent Framework for Competitive Debate
by: Zhang, Yiqun, et al.
Published: (2024)

MAD-Fact: A Multi-Agent Debate Framework for Long-Form Factuality Evaluation in LLMs
by: Ning, Yucheng, et al.
Published: (2025)

PERM: Psychology-grounded Empathetic Reward Modeling for Large Language Models
by: Wang, Chengbing, et al.
Published: (2026)

Can Large Language Models Derive New Knowledge? A Dynamic Benchmark for Biological Knowledge Discovery
by: Yang, Chaoqun, et al.
Published: (2026)