:: Library Catalog

Bibliographic Details
Main Authors:	Tao, Leitian; Li, Yixuan
Format:	Preprint
Published:	2024
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2409.08813

Similar Items

Challenges and Future Directions of Data-Centric AI Alignment
by: Yeh, Min-Hsuan, et al.
Published: (2024)

CodeLutra: Boosting LLM Code Generation via Preference-Guided Refinement
by: Tao, Leitian, et al.
Published: (2024)

Limited Preference Data? Learning Better Reward Model with Latent Space Synthesis
by: Tao, Leitian, et al.
Published: (2025)

Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free
by: Li, Ziyue, et al.
Published: (2024)

Improving Weak-to-Strong Generalization with Reliability-Aware Alignment
by: Guo, Yue, et al.
Published: (2024)

Alice: Proactive Learning with Teacher's Demonstrations for Weak-to-Strong Generalization
by: Wu, Shujin, et al.
Published: (2025)

ConTrans: Weak-to-Strong Alignment Engineering via Concept Transplantation
by: Dong, Weilong, et al.
Published: (2024)

Xwin-LM: Strong and Scalable Alignment Practice for LLMs
by: Ni, Bolin, et al.
Published: (2024)

Trust Functions: Near-Lossless Weak-to-Strong Generalization by Learning When to Trust the Weak Teacher
by: Uzunoglu, Arda, et al.
Published: (2026)

Your Transformer is Secretly Linear
by: Razzhigaev, Anton, et al.
Published: (2024)

It Takes Two: Your GRPO Is Secretly DPO
by: Wu, Yihong, et al.
Published: (2025)

Strong Teacher Not Needed? On Distillation in LLM Pretraining
by: Lu, Taiming, et al.
Published: (2026)

IPO: Your Language Model is Secretly a Preference Classifier
by: Garg, Shivank, et al.
Published: (2025)

Your Language Model Secretly Contains Personality Subnetworks
by: Ye, Ruimeng, et al.
Published: (2026)

Your Language Model Can Secretly Write Like Humans: Contrastive Paraphrase Attacks on LLM-Generated Text Detectors
by: Fang, Hao, et al.
Published: (2025)

Is Your LLM Really Mastering the Concept? A Multi-Agent Benchmark
by: Xu, Shuhang, et al.
Published: (2025)

Check Your LLM's Secret Dictionary! Five Lines of Code Reveal What Your LLM Learned (Including What It Shouldn't Have)
by: Miyashita, Hisashi
Published: (2026)

LightTransfer: Your Long-Context LLM is Secretly a Hybrid Model with Effortless Adaptation
by: Zhang, Xuan, et al.
Published: (2024)

MACPO: Weak-to-Strong Alignment via Multi-Agent Contrastive Preference Optimization
by: Lyu, Yougang, et al.
Published: (2024)

Selective Weak-to-Strong Generalization
by: Lang, Hao, et al.
Published: (2025)

Incentivizing Strong Reasoning from Weak Supervision
by: Yuan, Yige, et al.
Published: (2025)

Hybrid Reinforcement: When Reward Is Sparse, It's Better to Be Dense
by: Tao, Leitian, et al.
Published: (2025)

Your Absorbing Discrete Diffusion Secretly Models the Conditional Distributions of Clean Data
by: Ou, Jingyang, et al.
Published: (2024)

Well Begun is Half Done: Low-resource Preference Alignment by Weak-to-Strong Decoding
by: Song, Feifan, et al.
Published: (2025)

Your Extreme Multi-label Classifier is Secretly a Hierarchical Text Classifier for Free
by: Bertalis, Nerijus, et al.
Published: (2024)

LargePiG: Your Large Language Model is Secretly a Pointer Generator
by: Sun, Zhongxiang, et al.
Published: (2024)

Weak-to-Strong Reasoning
by: Yang, Yuqing, et al.
Published: (2024)

Debate Helps Weak-to-Strong Generalization
by: Lang, Hao, et al.
Published: (2025)

Weak-to-Strong Jailbreaking on Large Language Models
by: Zhao, Xuandong, et al.
Published: (2024)

RedacBench: Can AI Erase Your Secrets?
by: Jeon, Hyunjun, et al.
Published: (2026)

Adversarial Preference Optimization: Enhancing Your Alignment via RM-LLM Game
by: Cheng, Pengyu, et al.
Published: (2023)

Prompt Candidates, then Distill: A Teacher-Student Framework for LLM-driven Data Annotation
by: Xia, Mingxuan, et al.
Published: (2025)

GAPrune: Gradient-Alignment Pruning for Domain-Aware Embeddings
by: Tang, Yixuan, et al.
Published: (2025)

Broken Tokens? Your Language Model can Secretly Handle Non-Canonical Tokenizations
by: Zheng, Brian Siyuan, et al.
Published: (2025)

Superfiltering: Weak-to-Strong Data Filtering for Fast Instruction-Tuning
by: Li, Ming, et al.
Published: (2024)

Super(ficial)-alignment: Strong Models May Deceive Weak Models in Weak-to-Strong Generalization
by: Yang, Wenkai, et al.
Published: (2024)

Your Absorbing Discrete Diffusion Secretly Models the Bayesian Posterior
by: Doyle, Cooper
Published: (2025)

Weak-to-Strong Preference Optimization: Stealing Reward from Weak Aligned Model
by: Zhu, Wenhong, et al.
Published: (2024)

Find Your Optimal Teacher: Personalized Data Synthesis via Router-Guided Multi-Teacher Distillation
by: Zhang, Hengyuan, et al.
Published: (2025)

The Era of Real-World Human Interaction: RL from User Conversations
by: Jin, Chuanyang, et al.
Published: (2025)