| Main Authors: | Liu, Xiao, Song, Xixuan, Dong, Yuxiao, Tang, Jie |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2404.00604 |
Similar Items
Understanding Emergent Abilities of Language Models from the Loss Perspective
by: Du, Zhengxiao, et al.
Published: (2024)
SPaR: Self-Play with Tree-Search Refinement to Improve Instruction-Following in Large Language Models
by: Cheng, Jiale, et al.
Published: (2024)
AlignBench: Benchmarking Chinese Alignment of Large Language Models
by: Liu, Xiao, et al.
Published: (2023)
AutoDetect: Towards a Unified Framework for Automated Weakness Detection in Large Language Models
by: Cheng, Jiale, et al.
Published: (2024)
Evolutionary Contrastive Distillation for Language Model Alignment
by: Katz-Samuels, Julian, et al.
Published: (2024)
Parameter-Efficient Fine-Tuning for Foundation Models
by: Zhang, Dan, et al.
Published: (2025)
Recursive Introspection: Teaching Language Model Agents How to Self-Improve
by: Qu, Yuxiao, et al.
Published: (2024)
Emulated Disalignment: Safety Alignment for Large Language Models May Backfire!
by: Zhou, Zhanhui, et al.
Published: (2024)
SELF: Self-Evolution with Language Feedback
by: Lu, Jianqiao, et al.
Published: (2023)
Self-play with Execution Feedback: Improving Instruction-following Capabilities of Large Language Models
by: Dong, Guanting, et al.
Published: (2024)
Self-Play Preference Optimization for Language Model Alignment
by: Wu, Yue, et al.
Published: (2024)
Deliberative Alignment: Reasoning Enables Safer Language Models
by: Guan, Melody Y., et al.
Published: (2024)
SelfCite: Self-Supervised Alignment for Context Attribution in Large Language Models
by: Chuang, Yung-Sung, et al.
Published: (2025)
Self-Refinement of Language Models from External Proxy Metrics Feedback
by: Ramji, Keshav, et al.
Published: (2024)
On the Robustness of Reward Models for Language Model Alignment
by: Hong, Jiwoo, et al.
Published: (2025)
Self-Evolving Critique Abilities in Large Language Models
by: Tang, Zhengyang, et al.
Published: (2025)
Unraveling and Mitigating Safety Alignment Degradation of Vision-Language Models
by: Liu, Qin, et al.
Published: (2024)
Proxy-RLHF: Decoupling Generation and Alignment in Large Language Model with Proxy
by: Zhu, Yu, et al.
Published: (2024)
UltraFeedback: Boosting Language Models with Scaled AI Feedback
by: Cui, Ganqu, et al.
Published: (2023)
SAIL: Self-Improving Efficient Online Alignment of Large Language Models
by: Ding, Mucong, et al.
Published: (2024)
Self-Play with Adversarial Critic: Provable and Scalable Offline Alignment for Language Models
by: Ji, Xiang, et al.
Published: (2024)
Personalized Language Modeling from Personalized Human Feedback
by: Li, Xinyu, et al.
Published: (2024)
RLTHF: Targeted Human Feedback for LLM Alignment
by: Xu, Yifei, et al.
Published: (2025)
Training Language Models with Language Feedback at Scale
by: Scheurer, Jérémy, et al.
Published: (2023)
Weak-to-Strong Search: Align Large Language Models via Searching over Small Language Models
by: Zhou, Zhanhui, et al.
Published: (2024)
SALMON: Self-Alignment with Instructable Reward Models
by: Sun, Zhiqing, et al.
Published: (2023)
Probabilistic Token Alignment for Large Language Model Fusion
by: Zeng, Runjia, et al.
Published: (2025)
ProgCo: Program Helps Self-Correction of Large Language Models
by: Song, Xiaoshuai, et al.
Published: (2025)
Policy Improvement using Language Feedback Models
by: Zhong, Victor, et al.
Published: (2024)
Towards Aligning Language Models with Textual Feedback
by: Lloret, Saüc Abadal, et al.
Published: (2024)
Teaching Your Models to Understand Code via Focal Preference Alignment
by: Wu, Jie, et al.
Published: (2025)
Self-Hinting Language Models Enhance Reinforcement Learning
by: Liao, Baohao, et al.
Published: (2026)
Language Models Can Learn from Verbal Feedback Without Scalar Rewards
by: Luo, Renjie, et al.
Published: (2025)
Multilingual Safety Alignment via Self-Distillation
by: Qin, Ruiyang, et al.
Published: (2026)
Heterogeneous Value Alignment Evaluation for Large Language Models
by: Zhang, Zhaowei, et al.
Published: (2023)
Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment
by: D'Oosterlinck, Karel, et al.
Published: (2024)
Contrastive Reasoning Alignment: Reinforcement Learning from Hidden Representations
by: Luo, Haozheng, et al.
Published: (2026)
Enhancing Visual-Language Modality Alignment in Large Vision Language Models via Self-Improvement
by: Wang, Xiyao, et al.
Published: (2024)
Reasoning Elicitation in Language Models via Counterfactual Feedback
by: Hüyük, Alihan, et al.
Published: (2024)
Feedback-Driven Vision-Language Alignment with Minimal Human Supervision
by: Giannone, Giorgio, et al.
Published: (2025)