| Main Authors: | Guo, Yiju, Cui, Ganqu, Yuan, Lifan, Ding, Ning, Sun, Zexu, Sun, Bowen, Chen, Huimin, Xie, Ruobing, Zhou, Jie, Lin, Yankai, Liu, Zhiyuan, Sun, Maosong |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2402.19085 |
Similar Items
Advancing LLM Reasoning Generalists with Preference Trees
by: Yuan, Lifan, et al.
Published: (2024)
Learning to Focus: Causal Attention Distillation via Gradient-Guided Token Pruning
by: Guo, Yiju, et al.
Published: (2025)
UltraFeedback: Boosting Language Models with Scaled AI Feedback
by: Cui, Ganqu, et al.
Published: (2023)
Less Noise, More Voice: Reinforcement Learning for Reasoning via Instruction Purification
by: Guo, Yiju, et al.
Published: (2026)
Mastering Text, Code and Math Simultaneously via Fusing Highly Specialized Language Models
by: Ding, Ning, et al.
Published: (2024)
From $f(x)$ and $g(x)$ to $f(g(x))$: LLMs Learn New Skills in RL by Composing Old Ones
by: Yuan, Lifan, et al.
Published: (2025)
Representation Learning for Natural Language Processing
by: Liu, Zhiyuan, et al.
Published: (2020)
The Right Time Matters: Data Arrangement Affects Zero-Shot Generalization in Instruction Tuning
by: He, Bingxiang, et al.
Published: (2024)
AIR: A Systematic Analysis of Annotations, Instructions, and Response Pairs in Preference Dataset
by: He, Bingxiang, et al.
Published: (2025)
RLPR: Extrapolating RLVR to General Domains without Verifiers
by: Yu, Tianyu, et al.
Published: (2025)
Free Process Rewards without Process Labels
by: Yuan, Lifan, et al.
Published: (2024)
Meta-Aligner: Bidirectional Preference-Policy Optimization for Multi-Objective LLMs Alignment
by: Xu, Wenzhe, et al.
Published: (2026)
Exploring the Benefit of Activation Sparsity in Pre-training
by: Zhang, Zhengyan, et al.
Published: (2024)
The Overthinker's DIET: Cutting Token Calories with DIfficulty-AwarE Training
by: Chen, Weize, et al.
Published: (2025)
LaSeR: Reinforcement Learning with Last-Token Self-Rewarding
by: Yang, Wenkai, et al.
Published: (2025)
Noise Contrastive Alignment of Language Models with Explicit Rewards
by: Chen, Huayu, et al.
Published: (2024)
RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback
by: Yu, Tianyu, et al.
Published: (2023)
Empowering Private Tutoring by Chaining Large Language Models
by: Chen, Yulin, et al.
Published: (2023)
Variator: Accelerating Pre-trained Models with Plug-and-Play Compression Modules
by: Xiao, Chaojun, et al.
Published: (2023)
Optima: Optimizing Effectiveness and Efficiency for LLM-Based Multi-Agent System
by: Chen, Weize, et al.
Published: (2024)
Beyond Natural Language: LLMs Leveraging Alternative Formats for Enhanced Reasoning and Communication
by: Chen, Weize, et al.
Published: (2024)
M$^3$TN: Multi-gate Mixture-of-Experts based Multi-valued Treatment Network for Uplift Modeling
by: Sun, Zexu, et al.
Published: (2024)
Mitigating Hallucination in Multimodal Large Language Model via Hallucination-targeted Direct Preference Optimization
by: Fu, Yuhan, et al.
Published: (2024)
TTRL: Test-Time Reinforcement Learning
by: Zuo, Yuxin, et al.
Published: (2025)
Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization
by: Zhou, Zhanhui, et al.
Published: (2023)
H-Neurons: On the Existence, Impact, and Origin of Hallucination-Associated Neurons in LLMs
by: Gao, Cheng, et al.
Published: (2025)
Enhancing Legal Case Retrieval via Scaling High-quality Synthetic Query-Candidate Pairs
by: Gao, Cheng, et al.
Published: (2024)
Scalable Efficient Training of Large Language Models with Low-dimensional Projected Attention
by: Lv, Xingtai, et al.
Published: (2024)
Arithmetic Control of LLMs for Diverse User Preferences: Directional Preference Alignment with Multi-Objective Rewards
by: Wang, Haoxiang, et al.
Published: (2024)
Tell Me More! Towards Implicit User Intention Understanding of Language Model Driven Agents
by: Qian, Cheng, et al.
Published: (2024)
Quality-Diversity Optimization as Multi-Objective Optimization
by: Lin, Xi, et al.
Published: (2026)
Preference Orchestrator: Prompt-Aware Multi-Objective Alignment for Large Language Models
by: Liu, Biao, et al.
Published: (2025)
INTERVENOR: Prompting the Coding Ability of Large Language Models with the Interactive Chain of Repair
by: Wang, Hanbin, et al.
Published: (2023)
Self-Play Preference Optimization for Language Model Alignment
by: Wu, Yue, et al.
Published: (2024)
Rational Decision-Making Agent with Internalized Utility Judgment
by: Ye, Yining, et al.
Published: (2023)
InfLLM: Training-Free Long-Context Extrapolation for LLMs with an Efficient Context Memory
by: Xiao, Chaojun, et al.
Published: (2024)
Multi-Objective Preference Optimization: Improving Human Alignment of Generative Models
by: Agnihotri, Akhil, et al.
Published: (2025)
Large Language Models' Complicit Responses to Illicit Instructions across Socio-Legal Contexts
by: Wang, Xing, et al.
Published: (2025)
Process Reinforcement through Implicit Rewards
by: Cui, Ganqu, et al.
Published: (2025)
States Hidden in Hidden States: LLMs Emerge Discrete State Representations Implicitly
by: Chen, Junhao, et al.
Published: (2024)