| Main Authors: | Chakraborty, Souradip, Pourreza, Mohammadreza, Sun, Ruoxi, Song, Yiwen, Scherrer, Nino, Huang, Furong, Bedi, Amrit Singh, Beirami, Ahmad, Gu, Jindong, Palangi, Hamid, Pfister, Tomas |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2504.01931 |
Similar Items
RL with Learnable Textual Feedback: A Bilevel Approach
by: Singh, Utsav, et al.
Published: (2026)
PARL: A Unified Framework for Policy Alignment in Reinforcement Learning from Human Feedback
by: Chakraborty, Souradip, et al.
Published: (2023)
HEART: Emotionally-Driven Test-Time Scaling of Language Models
by: Pinto, Gabriela, et al.
Published: (2025)
Immune: Improving Safety Against Jailbreaks in Multi-modal LLMs via Inference-Time Alignment
by: Ghosal, Soumya Suvra, et al.
Published: (2024)
Uncertainty-Aware Answer Selection for Improved Reasoning in Multi-LLM Systems
by: Agrawal, Aakriti, et al.
Published: (2025)
LIAR: Leveraging Inference Time Alignment (Best-of-N) to Jailbreak LLMs in Seconds
by: Beetham, James, et al.
Published: (2024)
Safety Recovery in Reasoning Models Is Only a Few Early Steering Steps Away
by: Ghosal, Soumya Suvra, et al.
Published: (2026)
Why Pass@k Optimization Can Degrade Pass@1: Prompt Interference in LLM Post-training
by: Barakat, Anas, et al.
Published: (2026)
REBEL: Reward Regularization-Based Approach for Robotic Reinforcement Learning from Human Feedback
by: Chakraborty, Souradip, et al.
Published: (2023)
VARP: Reinforcement Learning from Vision-Language Model Feedback with Agent Regularized Preferences
by: Singh, Anukriti, et al.
Published: (2025)
Transfer Q Star: Principled Decoding for LLM Alignment
by: Chakraborty, Souradip, et al.
Published: (2024)
On the Global Optimality of Policy Gradient Methods in General Utility Reinforcement Learning
by: Barakat, Anas, et al.
Published: (2024)
Beyond Text: Utilizing Vocal Cues to Improve Decision Making in LLMs for Robot Navigation Tasks
by: Sun, Xingpeng, et al.
Published: (2024)
Code Comprehension then Auditing for Unsupervised LLM Evaluation
by: Patel, Bhrij, et al.
Published: (2024)
Does Thinking More always Help? Mirage of Test-Time Scaling in Reasoning Models
by: Ghosal, Soumya Suvra, et al.
Published: (2025)
SQL-GEN: Bridging the Dialect Gap for Text-to-SQL Via Synthetic Data And Model Merging
by: Pourreza, Mohammadreza, et al.
Published: (2024)
MIRA: Towards Mitigating Reward Hacking in Inference-Time Alignment of T2I Diffusion Models
by: Zhai, Kevin, et al.
Published: (2025)
MaxMin-RLHF: Alignment with Diverse Human Preferences
by: Chakraborty, Souradip, et al.
Published: (2024)
SAIL: Self-Improving Efficient Online Alignment of Large Language Models
by: Ding, Mucong, et al.
Published: (2024)
BalancedDPO: Adaptive Multi-Metric Alignment
by: Tamboli, Dipesh, et al.
Published: (2025)
PROPS: Progressively Private Self-alignment of Large Language Models
by: Teku, Noel, et al.
Published: (2025)
ScholarPeer: A Context-Aware Multi-Agent Framework for Automated Peer Review
by: Goyal, Palash, et al.
Published: (2026)
Align-Pro: A Principled Approach to Prompt Optimization for LLM Alignment
by: Trivedi, Prashant, et al.
Published: (2025)
FACT or Fiction: Can Truthful Mechanisms Eliminate Federated Free Riding?
by: Bornstein, Marco, et al.
Published: (2024)
DIPPER: Direct Preference Optimization to Accelerate Primitive-Enabled Hierarchical Reinforcement Learning
by: Singh, Utsav, et al.
Published: (2024)
Inducing Group Fairness in Prompt-Based Language Model Decisions
by: Atwood, James, et al.
Published: (2024)
Bounded Rationality for LLMs: Satisficing Alignment at Inference-Time
by: Chehade, Mohamad, et al.
Published: (2025)
Watch and Learn: Learning to Use Computers from Online Videos
by: Song, Chan Hee, et al.
Published: (2025)
Agentic Critical Training
by: Liu, Weize, et al.
Published: (2026)
VQQA: An Agentic Approach for Video Evaluation and Quality Improvement
by: Song, Yiwen, et al.
Published: (2026)
Test-Time Scaling in Diffusion LLMs via Hidden Semi-Autoregressive Experts
by: Lee, Jihoon, et al.
Published: (2025)
LLM-Based Multi-Agent Blackboard System for Information Discovery in Data Science
by: Salemi, Alireza, et al.
Published: (2025)
PLAN-TUNING: Post-Training Language Models to Learn Step-by-Step Planning for Complex Problem Solving
by: Parmar, Mihir, et al.
Published: (2025)
AI Cap-and-Trade: Efficiency Incentives for Accessibility and Sustainability
by: Bornstein, Marco, et al.
Published: (2026)
Towards Realistic Mechanisms That Incentivize Federated Participation and Contribution
by: Bornstein, Marco, et al.
Published: (2023)
Rethinking Adversarial Policies: A Generalized Attack Formulation and Provable Defense in RL
by: Liu, Xiangyu, et al.
Published: (2023)
DTS-SQL: Decomposed Text-to-SQL with Small Large Language Models
by: Pourreza, Mohammadreza, et al.
Published: (2024)
Auction-Based Regulation for Artificial Intelligence
by: Bornstein, Marco, et al.
Published: (2024)
On the Vulnerability of LLM/VLM-Controlled Robotics
by: Wu, Xiyang, et al.
Published: (2024)
Mitigating Object Hallucination in MLLMs via Data-augmented Phrase-level Alignment
by: Sarkar, Pritam, et al.
Published: (2024)