| Main Authors: | Li, Yang; Nijkamp, Erik; Yavuz, Semih; Joty, Shafiq |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2605.15113 |
Similar Items
Does Context Matter? ContextualJudgeBench for Evaluating LLM-based Judges in Contextual Settings
by: Xu, Austin, et al.
Published: (2025)
Personalised Distillation: Empowering Open-Sourced LLMs with Adaptive Learning for Code Generation
by: Chen, Hailin, et al.
Published: (2023)
Variation in Verification: Understanding Verification Dynamics in Large Language Models
by: Zhou, Yefan, et al.
Published: (2025)
Investigating Factuality in Long-Form Text Generation: The Roles of Self-Known and Self-Unknown
by: Tu, Lifu, et al.
Published: (2024)
SkillOrchestra: Learning to Route Agents via Skill Transfer
by: Wang, Jiayu, et al.
Published: (2026)
JudgeRank: Leveraging Large Language Models for Reasoning-Intensive Reranking
by: Niu, Tong, et al.
Published: (2024)
CEMTM: Contextual Embedding-based Multimodal Topic Modeling
by: Abaskohi, Amirhossein, et al.
Published: (2025)
Evaluating Judges as Evaluators: The JETTS Benchmark of LLM-as-Judges as Test-Time Scaling Evaluators
by: Zhou, Yilun, et al.
Published: (2025)
VIBEPASS: Can Vibe Coders Really Pass the Vibe Check?
by: Bansal, Srijan, et al.
Published: (2026)
The Hidden Signal of Verifier Strictness: Controlling and Improving Step-Wise Verification via Selective Latent Steering
by: Zhou, Yefan, et al.
Published: (2026)
Recurrent Natural Policy Gradient for POMDPs
by: Cayci, Semih, et al.
Published: (2024)
Least-Loaded Expert Parallelism: Load Balancing An Imbalanced Mixture-of-Experts
by: Nguyen, Xuan-Phi, et al.
Published: (2026)
Efficiently Aligned Cross-Lingual Transfer Learning for Conversational Tasks using Prompt-Tuning
by: Tu, Lifu, et al.
Published: (2023)
Variational Self-Supervised Learning
by: Yavuz, Mehmet Can, et al.
Published: (2025)
Discovering the Gems in Early Layers: Accelerating Long-Context LLMs with 1000x Input Token Reduction
by: Shi, Zhenmei, et al.
Published: (2024)
Multivariate Variational Autoencoder
by: Yavuz, Mehmet Can
Published: (2025)
Variational Distillation of Diffusion Policies into Mixture of Experts
by: Zhou, Hongyi, et al.
Published: (2024)
FaithEval: Can Your Language Model Stay Faithful to Context, Even If "The Moon is Made of Marshmallows"
by: Ming, Yifei, et al.
Published: (2024)
Gradually Compacting Large Language Models for Reasoning Like a Boiling Frog
by: Zhao, Yiran, et al.
Published: (2026)
Variational Self-Supervised Contrastive Learning Using Beta Divergence
by: Yavuz, Mehmet Can, et al.
Published: (2023)
Learning with Rare Success but Rich Feedback via Reflection-Enhanced Self-Distillation
by: Zhang, Yuwei, et al.
Published: (2026)
CodeXEmbed: A Generalist Embedding Model Family for Multiligual and Multi-task Code Retrieval
by: Liu, Ye, et al.
Published: (2024)
Modeling Uncertainty and Using Post-fusion as Fallback Improves Retrieval Augmented Generation with LLMs
by: Liu, Ye, et al.
Published: (2023)
Improving LLM Reasoning through Scaling Inference Computation with Collaborative Verification
by: Liang, Zhenwen, et al.
Published: (2024)
Linear Convergence of Entropy-Regularized Natural Policy Gradient with Linear Function Approximation
by: Cayci, Semih, et al.
Published: (2021)
Demystifying Domain-adaptive Post-training for Financial LLMs
by: Ke, Zixuan, et al.
Published: (2025)
Open-RAG: Enhanced Retrieval-Augmented Reasoning with Open-Source Large Language Models
by: Islam, Shayekh Bin, et al.
Published: (2024)
References Improve LLM Alignment in Non-Verifiable Domains
by: Shi, Kejian, et al.
Published: (2026)
Hard2Verify: A Step-Level Verification Benchmark for Open-Ended Frontier Math
by: Pandit, Shrey, et al.
Published: (2025)
On the Shelf Life of Fine-Tuned LLM-Judges: Future-Proofing, Backward-Compatibility, and Question Generalization
by: Singh, Janvijay, et al.
Published: (2025)
Non-Asymptotic Optimization and Generalization Bounds for Stochastic Gauss-Newton in Overparameterized Models
by: Cayci, Semih
Published: (2025)
Foundational Automatic Evaluators: Scaling Multi-Task Generative Evaluator Training for Reasoning-Centric Domains
by: Xu, Austin, et al.
Published: (2025)
RLDG: Robotic Generalist Policy Distillation via Reinforcement Learning
by: Xu, Charles, et al.
Published: (2024)
Internalize the Temperature: On-Policy Self-Distillation as Policy Reheater for Reinforcement Learning
by: Yang, Xuewei, et al.
Published: (2026)
Continual Policy Distillation from Distributed Reinforcement Learning Teachers
by: Li, Yuxuan, et al.
Published: (2026)
Benchmarking Generation and Evaluation Capabilities of Large Language Models for Instruction Controllable Summarization
by: Liu, Yixin, et al.
Published: (2023)
MAS-ZERO: Designing Multi-Agent Systems with Zero Supervision
by: Ke, Zixuan, et al.
Published: (2025)
Best Policy Learning from Trajectory Preference Feedback
by: Agnihotri, Akhil, et al.
Published: (2025)
SSR: Socratic Self-Refine for Large Language Model Reasoning
by: Shi, Haizhou, et al.
Published: (2025)
KL for a KL: On-Policy Distillation with Control Variate Baseline
by: Oh, Minjae, et al.
Published: (2026)