| Main Authors: | Chen, Changyu, Liu, Zichen, Du, Chao, Pang, Tianyu, Liu, Qian, Sinha, Arunesh, Varakantham, Pradeep, Lin, Min |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2406.09760 |
Similar Items
On Learning Informative Trajectory Embeddings for Imitation, Classification and Regression
by: Ge, Zichang, et al.
Published: (2025)
Language Models Can Learn from Verbal Feedback Without Scalar Rewards
by: Luo, Renjie, et al.
Published: (2025)
Towards Neural Network based Cognitive Models of Dynamic Decision-Making by Humans
by: Chen, Changyu, et al.
Published: (2024)
Semantic Loss Guided Data Efficient Supervised Fine Tuning for Safe Responses in LLMs
by: Lu, Yuxiao, et al.
Published: (2024)
Handling Long and Richly Constrained Tasks through Constrained Hierarchical Reinforcement Learning
by: Lu, Yuxiao, et al.
Published: (2023)
Automatic LLM Red Teaming
by: Belaire, Roman, et al.
Published: (2025)
On Minimizing Adversarial Counterfactual Error in Adversarial RL
by: Belaire, Roman, et al.
Published: (2024)
Understanding R1-Zero-Like Training: A Critical Perspective
by: Liu, Zichen, et al.
Published: (2025)
Sample-Efficient Alignment for LLMs
by: Liu, Zichen, et al.
Published: (2024)
Variational Reasoning for Language Models
by: Zhou, Xiangxin, et al.
Published: (2025)
Purifying Large Language Models by Ensembling a Small Language Model
by: Li, Tianlin, et al.
Published: (2024)
Optimizing Anytime Reasoning via Budget Relative Policy Optimization
by: Qi, Penghui, et al.
Published: (2025)
Improved Few-Shot Jailbreaking Can Circumvent Aligned Language Models and Their Defenses
by: Zheng, Xiaosen, et al.
Published: (2024)
Test-Time Backdoor Attacks on Multimodal Large Language Models
by: Lu, Dong, et al.
Published: (2024)
When Attention Sink Emerges in Language Models: An Empirical View
by: Gu, Xiangming, et al.
Published: (2024)
Induced Numerical Instability: Hidden Costs in Multimodal Large Language Models
by: Wong, Wai Tuck, et al.
Published: (2026)
Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs
by: Zhang, Xuan, et al.
Published: (2024)
Improving Your Model Ranking on Chatbot Arena by Vote Rigging
by: Min, Rui, et al.
Published: (2025)
Defeating the Training-Inference Mismatch via FP16
by: Qi, Penghui, et al.
Published: (2025)
A Closer Look at Machine Unlearning for Large Language Models
by: Yuan, Xiaojian, et al.
Published: (2024)
Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates
by: Zheng, Xiaosen, et al.
Published: (2024)
Reinforcing General Reasoning without Verifiers
by: Zhou, Xiangxin, et al.
Published: (2025)
Difficulty-Based Preference Data Selection by DPO Implicit Reward Gap
by: Qi, Xuan, et al.
Published: (2025)
Implicit Reward as the Bridge: A Unified View of SFT and DPO Connections
by: Wang, Bo, et al.
Published: (2025)
Scaling up Masked Diffusion Models on Text
by: Nie, Shen, et al.
Published: (2024)
Improved Techniques for Optimization-Based Jailbreaking on Large Language Models
by: Jia, Xiaojun, et al.
Published: (2024)
Scalable Token-Level Hallucination Detection in Large Language Models
by: Min, Rui, et al.
Published: (2026)
Lifelong Safety Alignment for Language Models
by: Wang, Haoyu, et al.
Published: (2025)
Training Optimal Large Diffusion Language Models
by: Ni, Jinjie, et al.
Published: (2025)
UFT: Unifying Fine-Tuning of SFT and RLHF/DPO/UNA through a Generalized Implicit Reward Function
by: Wang, Zhichao, et al.
Published: (2024)
LightTransfer: Your Long-Context LLM is Secretly a Hybrid Model with Effortless Adaptation
by: Zhang, Xuan, et al.
Published: (2024)
Meta-Unlearning on Diffusion Models: Preventing Relearning Unlearned Concepts
by: Gao, Hongcheng, et al.
Published: (2024)
Why is Your Language Model a Poor Implicit Reward Model?
by: Razin, Noam, et al.
Published: (2025)
Benchmarking Large Multimodal Models against Common Corruptions
by: Zhang, Jiawei, et al.
Published: (2024)
DavIR: Data Selection via Implicit Reward for Large Language Models
by: Zhou, Haotian, et al.
Published: (2023)
Heterogeneous Graph Generation: A Hierarchical Approach using Node Feature Pooling
by: Ghosh, Hritaban, et al.
Published: (2024)
Process Reinforcement through Implicit Rewards
by: Cui, Ganqu, et al.
Published: (2025)
Agent Smith: A Single Image Can Jailbreak One Million Multimodal LLM Agents Exponentially Fast
by: Gu, Xiangming, et al.
Published: (2024)
GIFT: Group-Relative Implicit Fine-Tuning Integrates GRPO with DPO and UNA
by: Wang, Zhichao
Published: (2025)
Self-Generated Critiques Boost Reward Modeling for Language Models
by: Yu, Yue, et al.
Published: (2024)