| Main Authors: | Tao, Leitian, Li, Yixuan |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2409.08813 |
Similar Items
Challenges and Future Directions of Data-Centric AI Alignment
by: Yeh, Min-Hsuan, et al.
Published: (2024)
CodeLutra: Boosting LLM Code Generation via Preference-Guided Refinement
by: Tao, Leitian, et al.
Published: (2024)
Limited Preference Data? Learning Better Reward Model with Latent Space Synthesis
by: Tao, Leitian, et al.
Published: (2025)
Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free
by: Li, Ziyue, et al.
Published: (2024)
Improving Weak-to-Strong Generalization with Reliability-Aware Alignment
by: Guo, Yue, et al.
Published: (2024)
Alice: Proactive Learning with Teacher's Demonstrations for Weak-to-Strong Generalization
by: Wu, Shujin, et al.
Published: (2025)
ConTrans: Weak-to-Strong Alignment Engineering via Concept Transplantation
by: Dong, Weilong, et al.
Published: (2024)
Xwin-LM: Strong and Scalable Alignment Practice for LLMs
by: Ni, Bolin, et al.
Published: (2024)
Trust Functions: Near-Lossless Weak-to-Strong Generalization by Learning When to Trust the Weak Teacher
by: Uzunoglu, Arda, et al.
Published: (2026)
Your Transformer is Secretly Linear
by: Razzhigaev, Anton, et al.
Published: (2024)
It Takes Two: Your GRPO Is Secretly DPO
by: Wu, Yihong, et al.
Published: (2025)
Strong Teacher Not Needed? On Distillation in LLM Pretraining
by: Lu, Taiming, et al.
Published: (2026)
IPO: Your Language Model is Secretly a Preference Classifier
by: Garg, Shivank, et al.
Published: (2025)
Your Language Model Secretly Contains Personality Subnetworks
by: Ye, Ruimeng, et al.
Published: (2026)
Your Language Model Can Secretly Write Like Humans: Contrastive Paraphrase Attacks on LLM-Generated Text Detectors
by: Fang, Hao, et al.
Published: (2025)
Is Your LLM Really Mastering the Concept? A Multi-Agent Benchmark
by: Xu, Shuhang, et al.
Published: (2025)
Check Your LLM's Secret Dictionary! Five Lines of Code Reveal What Your LLM Learned (Including What It Shouldn't Have)
by: Miyashita, Hisashi
Published: (2026)
LightTransfer: Your Long-Context LLM is Secretly a Hybrid Model with Effortless Adaptation
by: Zhang, Xuan, et al.
Published: (2024)
MACPO: Weak-to-Strong Alignment via Multi-Agent Contrastive Preference Optimization
by: Lyu, Yougang, et al.
Published: (2024)
Selective Weak-to-Strong Generalization
by: Lang, Hao, et al.
Published: (2025)
Incentivizing Strong Reasoning from Weak Supervision
by: Yuan, Yige, et al.
Published: (2025)
Hybrid Reinforcement: When Reward Is Sparse, It's Better to Be Dense
by: Tao, Leitian, et al.
Published: (2025)
Your Absorbing Discrete Diffusion Secretly Models the Conditional Distributions of Clean Data
by: Ou, Jingyang, et al.
Published: (2024)
Well Begun is Half Done: Low-resource Preference Alignment by Weak-to-Strong Decoding
by: Song, Feifan, et al.
Published: (2025)
Your Extreme Multi-label Classifier is Secretly a Hierarchical Text Classifier for Free
by: Bertalis, Nerijus, et al.
Published: (2024)
LargePiG: Your Large Language Model is Secretly a Pointer Generator
by: Sun, Zhongxiang, et al.
Published: (2024)
Weak-to-Strong Reasoning
by: Yang, Yuqing, et al.
Published: (2024)
Debate Helps Weak-to-Strong Generalization
by: Lang, Hao, et al.
Published: (2025)
Weak-to-Strong Jailbreaking on Large Language Models
by: Zhao, Xuandong, et al.
Published: (2024)
RedacBench: Can AI Erase Your Secrets?
by: Jeon, Hyunjun, et al.
Published: (2026)
Adversarial Preference Optimization: Enhancing Your Alignment via RM-LLM Game
by: Cheng, Pengyu, et al.
Published: (2023)
Prompt Candidates, then Distill: A Teacher-Student Framework for LLM-driven Data Annotation
by: Xia, Mingxuan, et al.
Published: (2025)
GAPrune: Gradient-Alignment Pruning for Domain-Aware Embeddings
by: Tang, Yixuan, et al.
Published: (2025)
Broken Tokens? Your Language Model can Secretly Handle Non-Canonical Tokenizations
by: Zheng, Brian Siyuan, et al.
Published: (2025)
Superfiltering: Weak-to-Strong Data Filtering for Fast Instruction-Tuning
by: Li, Ming, et al.
Published: (2024)
Super(ficial)-alignment: Strong Models May Deceive Weak Models in Weak-to-Strong Generalization
by: Yang, Wenkai, et al.
Published: (2024)
Your Absorbing Discrete Diffusion Secretly Models the Bayesian Posterior
by: Doyle, Cooper
Published: (2025)
Weak-to-Strong Preference Optimization: Stealing Reward from Weak Aligned Model
by: Zhu, Wenhong, et al.
Published: (2024)
Find Your Optimal Teacher: Personalized Data Synthesis via Router-Guided Multi-Teacher Distillation
by: Zhang, Hengyuan, et al.
Published: (2025)
The Era of Real-World Human Interaction: RL from User Conversations
by: Jin, Chuanyang, et al.
Published: (2025)