Library Catalog

Bibliographic Details
Main Authors:	Guo, Yiju; Cui, Ganqu; Yuan, Lifan; Ding, Ning; Sun, Zexu; Sun, Bowen; Chen, Huimin; Xie, Ruobing; Zhou, Jie; Lin, Yankai; Liu, Zhiyuan; Sun, Maosong
Format:	Preprint
Published:	2024
Subjects:	Computation and Language; Artificial Intelligence; Systems and Control
Online Access:	https://arxiv.org/abs/2402.19085

Similar Items

Advancing LLM Reasoning Generalists with Preference Trees
by: Yuan, Lifan, et al.
Published: (2024)

Learning to Focus: Causal Attention Distillation via Gradient-Guided Token Pruning
by: Guo, Yiju, et al.
Published: (2025)

UltraFeedback: Boosting Language Models with Scaled AI Feedback
by: Cui, Ganqu, et al.
Published: (2023)

Less Noise, More Voice: Reinforcement Learning for Reasoning via Instruction Purification
by: Guo, Yiju, et al.
Published: (2026)

Mastering Text, Code and Math Simultaneously via Fusing Highly Specialized Language Models
by: Ding, Ning, et al.
Published: (2024)

From $f(x)$ and $g(x)$ to $f(g(x))$: LLMs Learn New Skills in RL by Composing Old Ones
by: Yuan, Lifan, et al.
Published: (2025)

Representation Learning for Natural Language Processing
by: Liu, Zhiyuan, et al.
Published: (2020)

The Right Time Matters: Data Arrangement Affects Zero-Shot Generalization in Instruction Tuning
by: He, Bingxiang, et al.
Published: (2024)

AIR: A Systematic Analysis of Annotations, Instructions, and Response Pairs in Preference Dataset
by: He, Bingxiang, et al.
Published: (2025)

RLPR: Extrapolating RLVR to General Domains without Verifiers
by: Yu, Tianyu, et al.
Published: (2025)

Free Process Rewards without Process Labels
by: Yuan, Lifan, et al.
Published: (2024)

Meta-Aligner: Bidirectional Preference-Policy Optimization for Multi-Objective LLMs Alignment
by: Xu, Wenzhe, et al.
Published: (2026)

Exploring the Benefit of Activation Sparsity in Pre-training
by: Zhang, Zhengyan, et al.
Published: (2024)

The Overthinker's DIET: Cutting Token Calories with DIfficulty-AwarE Training
by: Chen, Weize, et al.
Published: (2025)

LaSeR: Reinforcement Learning with Last-Token Self-Rewarding
by: Yang, Wenkai, et al.
Published: (2025)

Noise Contrastive Alignment of Language Models with Explicit Rewards
by: Chen, Huayu, et al.
Published: (2024)

RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback
by: Yu, Tianyu, et al.
Published: (2023)

Empowering Private Tutoring by Chaining Large Language Models
by: Chen, Yulin, et al.
Published: (2023)

Variator: Accelerating Pre-trained Models with Plug-and-Play Compression Modules
by: Xiao, Chaojun, et al.
Published: (2023)

Optima: Optimizing Effectiveness and Efficiency for LLM-Based Multi-Agent System
by: Chen, Weize, et al.
Published: (2024)

Beyond Natural Language: LLMs Leveraging Alternative Formats for Enhanced Reasoning and Communication
by: Chen, Weize, et al.
Published: (2024)

M$^3$TN: Multi-gate Mixture-of-Experts based Multi-valued Treatment Network for Uplift Modeling
by: Sun, Zexu, et al.
Published: (2024)

Mitigating Hallucination in Multimodal Large Language Model via Hallucination-targeted Direct Preference Optimization
by: Fu, Yuhan, et al.
Published: (2024)

TTRL: Test-Time Reinforcement Learning
by: Zuo, Yuxin, et al.
Published: (2025)

Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization
by: Zhou, Zhanhui, et al.
Published: (2023)

H-Neurons: On the Existence, Impact, and Origin of Hallucination-Associated Neurons in LLMs
by: Gao, Cheng, et al.
Published: (2025)

Enhancing Legal Case Retrieval via Scaling High-quality Synthetic Query-Candidate Pairs
by: Gao, Cheng, et al.
Published: (2024)

Scalable Efficient Training of Large Language Models with Low-dimensional Projected Attention
by: Lv, Xingtai, et al.
Published: (2024)

Arithmetic Control of LLMs for Diverse User Preferences: Directional Preference Alignment with Multi-Objective Rewards
by: Wang, Haoxiang, et al.
Published: (2024)

Tell Me More! Towards Implicit User Intention Understanding of Language Model Driven Agents
by: Qian, Cheng, et al.
Published: (2024)

Quality-Diversity Optimization as Multi-Objective Optimization
by: Lin, Xi, et al.
Published: (2026)

Preference Orchestrator: Prompt-Aware Multi-Objective Alignment for Large Language Models
by: Liu, Biao, et al.
Published: (2025)

INTERVENOR: Prompting the Coding Ability of Large Language Models with the Interactive Chain of Repair
by: Wang, Hanbin, et al.
Published: (2023)

Self-Play Preference Optimization for Language Model Alignment
by: Wu, Yue, et al.
Published: (2024)

Rational Decision-Making Agent with Internalized Utility Judgment
by: Ye, Yining, et al.
Published: (2023)

InfLLM: Training-Free Long-Context Extrapolation for LLMs with an Efficient Context Memory
by: Xiao, Chaojun, et al.
Published: (2024)

Multi-Objective Preference Optimization: Improving Human Alignment of Generative Models
by: Agnihotri, Akhil, et al.
Published: (2025)

Large Language Models' Complicit Responses to Illicit Instructions across Socio-Legal Contexts
by: Wang, Xing, et al.
Published: (2025)

Process Reinforcement through Implicit Rewards
by: Cui, Ganqu, et al.
Published: (2025)

States Hidden in Hidden States: LLMs Emerge Discrete State Representations Implicitly
by: Chen, Junhao, et al.
Published: (2024)