| Main Author: | Young, Robin |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2603.05293 |
Similar Items
Towards Scalable Oversight via Partitioned Human Supervision
by: Yin, Ren, et al.
Published: (2025)
Scalable Oversight for Superhuman AI via Recursive Self-Critiquing
by: Wen, Xueru, et al.
Published: (2025)
Why Is RLHF Alignment Shallow? A Gradient Analysis
by: Young, Robin
Published: (2026)
Great Models Think Alike and this Undermines AI Oversight
by: Goel, Shashwat, et al.
Published: (2025)
Towards Scalable Oversight with Collaborative Multi-Agent Debate in Error Detection
by: Chen, Yongqiang, et al.
Published: (2025)
Scaling Laws For Scalable Oversight
by: Engels, Joshua, et al.
Published: (2025)
Multi-Agent Debate with Memory Masking
by: Tian, Hongduan, et al.
Published: (2026)
Merging Methods for Multilingual Knowledge Editing for Large Language Models: An Empirical Odyssey
by: Lee, Kunil, et al.
Published: (2026)
LEAF: Knowledge Distillation of Text Embedding Models with Teacher-Aligned Representations
by: Vujanic, Robin, et al.
Published: (2025)
SWE-Debate: Competitive Multi-Agent Debate for Software Issue Resolution
by: Li, Han, et al.
Published: (2025)
HGNet: Scalable Foundation Model for Automated Knowledge Graph Generation from Scientific Literature
by: Joshi, Devvrat, et al.
Published: (2026)
Unlocking Efficient, Scalable, and Continual Knowledge Editing with Basis-Level Representation Fine-Tuning
by: Liu, Tianci, et al.
Published: (2025)
Building a Precise Video Language with Human-AI Oversight
by: Lin, Zhiqiu, et al.
Published: (2026)
Convergence and Divergence of Language Models under Different Random Seeds
by: Fehlauer, Finlay, et al.
Published: (2025)
Robust Data Watermarking in Language Models by Injecting Fictitious Knowledge
by: Cui, Xinyue, et al.
Published: (2025)
DebateBench: A Challenging Long Context Reasoning Benchmark For Large Language Models
by: Tiwari, Utkarsh, et al.
Published: (2025)
UltRAG: a Universal Simple Scalable Recipe for Knowledge Graph RAG
by: Georgiev, Dobrik, et al.
Published: (2026)
DBR: Divergence-Based Regularization for Debiasing Natural Language Understanding Models
by: Li, Zihao, et al.
Published: (2025)
KaSA: Knowledge-Aware Singular-Value Adaptation of Large Language Models
by: Wang, Fan, et al.
Published: (2024)
Stop Overvaluing Multi-Agent Debate -- We Must Rethink Evaluation and Embrace Model Heterogeneity
by: Zhang, Hangfan, et al.
Published: (2025)
Mirage or Method? How Model-Task Alignment Induces Divergent RL Conclusions
by: Wu, Haoze, et al.
Published: (2025)
Debate Helps Weak Judges Reward Stronger Models
by: Elasky, Ethan, et al.
Published: (2026)
Evaluating the Performance of Large Language Models via Debates
by: Moniri, Behrad, et al.
Published: (2024)
From Debate to Equilibrium: Belief-Driven Multi-Agent LLM Reasoning via Bayesian Nash Equilibrium
by: Yi, Xie, et al.
Published: (2025)
Divergent Token Metrics: Measuring degradation to prune away LLM components -- and optimize quantization
by: Deiseroth, Björn, et al.
Published: (2023)
Let Models Speak Ciphers: Multiagent Debate through Embeddings
by: Pham, Chau, et al.
Published: (2023)
Verify with Caution: The Pitfalls of Relying on Imperfect Factuality Metrics
by: Godbole, Ameya, et al.
Published: (2025)
Every Question Has Its Own Value: Reinforcement Learning with Explicit Human Values
by: Yu, Dian, et al.
Published: (2025)
ConvKGYarn: Spinning Configurable and Scalable Conversational Knowledge Graph QA datasets with Large Language Models
by: Pradeep, Ronak, et al.
Published: (2024)
Hidden States Know Where Reasoning Diverges: Credit Assignment via Span-Level Wasserstein Distance
by: Chen, Xinzhu, et al.
Published: (2026)
BARRED: Synthetic Training of Custom Policy Guardrails via Asymmetric Debate
by: Mazza, Arnon, et al.
Published: (2026)
RADAR: Relative Angular Divergence Across Representations
by: Cadet, Xavier, et al.
Published: (2026)
Scalable Ensembling For Mitigating Reward Overoptimisation
by: Ahmed, Ahmed M., et al.
Published: (2024)
When Two LLMs Debate, Both Think They'll Win
by: Prasad, Pradyumna Shyama, et al.
Published: (2025)
Value Drifts: Tracing Value Alignment During LLM Post-Training
by: Bhatia, Mehar, et al.
Published: (2025)
S0 Tuning: Zero-Overhead Adaptation of Hybrid Recurrent-Attention Models
by: Young, Jack
Published: (2026)
ClaimVer: Explainable Claim-Level Verification and Evidence Attribution of Text Through Knowledge Graphs
by: Dammu, Preetam Prabhu Srikar, et al.
Published: (2024)
Automated Data Enrichment using Confidence-Aware Fine-Grained Debate among Open-Source LLMs for Mental Health and Online Safety
by: Mao, Junyu, et al.
Published: (2025)
$C^2$: Scalable Auto-Feedback for LLM-based Chart Generation
by: Koh, Woosung, et al.
Published: (2024)
Value Alignment from Unstructured Text
by: Padhi, Inkit, et al.
Published: (2024)