Library Catalog

Bibliographic Details
Main Authors:	Sharma, Raghav; Mehta, Manan; Raina, Sai Tiger
Format:	Preprint
Published:	2025
Subjects:	Machine Learning; Artificial Intelligence; Computation and Language
Online Access:	https://arxiv.org/abs/2511.03939

Similar Items

Small Language Models for Agentic Systems: A Survey of Architectures, Capabilities, and Deployment Trade offs
by: Sharma, Raghav, et al.
Published: (2025)

Adaptive and Explainable AI Agents for Anomaly Detection in Critical IoT Infrastructure using LLM-Enhanced Contextual Reasoning
by: Sharma, Raghav, et al.
Published: (2025)

MaxMin-RLHF: Alignment with Diverse Human Preferences
by: Chakraborty, Souradip, et al.
Published: (2024)

RLHF in an SFT Way: From Optimal Solution to Reward-Weighted Alignment
by: Du, Yuhao, et al.
Published: (2025)

Proxy-RLHF: Decoupling Generation and Alignment in Large Language Model with Proxy
by: Zhu, Yu, et al.
Published: (2024)

RLHF Workflow: From Reward Modeling to Online RLHF
by: Dong, Hanze, et al.
Published: (2024)

Towards Data-Centric RLHF: Simple Metrics for Preference Dataset Comparison
by: Shen, Judy Hanwen, et al.
Published: (2024)

OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework
by: Hu, Jian, et al.
Published: (2024)

RLHF and IIA: Perverse Incentives
by: Xu, Wanqiao, et al.
Published: (2023)

Reward-Robust RLHF in LLMs
by: Yan, Yuzi, et al.
Published: (2024)

Faster and Lighter LLMs: A Survey on Current Challenges and Way Forward
by: Chavan, Arnav, et al.
Published: (2024)

Large Multimodal Models for Low-Resource Languages: A Survey
by: Lupascu, Marian, et al.
Published: (2025)

Continual SFT Matches Multimodal RLHF with Negative Supervision
by: Zhu, Ke, et al.
Published: (2024)

Reward Model Overoptimisation in Iterated RLHF
by: Wolf, Lorenz, et al.
Published: (2025)

How to Evaluate Reward Models for RLHF
by: Frick, Evan, et al.
Published: (2024)

Dataset Reset Policy Optimization for RLHF
by: Chang, Jonathan D., et al.
Published: (2024)

When Every Token Counts: Optimal Segmentation for Low-Resource Language Models
by: Raj, Bharath, et al.
Published: (2024)

CXMArena: Unified Dataset to benchmark performance in realistic CXM Scenarios
by: Garg, Raghav, et al.
Published: (2025)

Diagnosing LLM Judge Reliability: Conformal Prediction Sets and Transitivity Violations
by: Gupta, Manan, et al.
Published: (2026)

Latent Phase-Shift Rollback: Inference-Time Error Correction via Residual Stream Monitoring and KV-Cache Steering
by: Gupta, Manan, et al.
Published: (2026)

RLHS: Mitigating Misalignment in RLHF with Hindsight Simulation
by: Liang, Kaiqu, et al.
Published: (2025)

Information-Theoretic Reward Decomposition for Generalizable RLHF
by: Mao, Liyuan, et al.
Published: (2025)

General Exploratory Bonus for Optimistic Exploration in RLHF
by: Li, Wendi, et al.
Published: (2025)

Reward Shaping to Mitigate Reward Hacking in RLHF
by: Fu, Jiayi, et al.
Published: (2025)

Active Preference Optimization for Sample Efficient RLHF
by: Das, Nirjhar, et al.
Published: (2024)

Quantile Regression for Distributional Reward Models in RLHF
by: Dorka, Nicolai
Published: (2024)

The Perfect Blend: Redefining RLHF with Mixture of Judges
by: Xu, Tengyu, et al.
Published: (2024)

ODIN: Disentangled Reward Mitigates Hacking in RLHF
by: Chen, Lichang, et al.
Published: (2024)

Understanding the Effects of RLHF on LLM Generalisation and Diversity
by: Kirk, Robert, et al.
Published: (2023)

WPO: Enhancing RLHF with Weighted Preference Optimization
by: Zhou, Wenxuan, et al.
Published: (2024)

MENTIS: What Belief Changes Under Alignment? Measuring Multi-Scale Latent Torsion in Language Models
by: Saha, Partha Pratim, et al.
Published: (2026)

zFLoRA: Zero-Latency Fused Low-Rank Adapters
by: Gowda, Dhananjaya, et al.
Published: (2025)

Adaptive Margin RLHF via Preference over Preferences
by: Chittepu, Yaswanth, et al.
Published: (2025)

DPO Meets PPO: Reinforced Token Optimization for RLHF
by: Zhong, Han, et al.
Published: (2024)

An Adaptive Placement and Parallelism Framework for Accelerating RLHF Training
by: Xiao, Youshao, et al.
Published: (2023)

Iterative Data Smoothing: Mitigating Reward Overfitting and Overoptimization in RLHF
by: Zhu, Banghua, et al.
Published: (2024)

Mitigating Reward Hacking in RLHF via Advantage Sign Robustness
by: Ono, Shinnosuke, et al.
Published: (2026)

It Takes Two: On the Seamlessness between Reward and Policy Model in RLHF
by: Lu, Taiming, et al.
Published: (2024)

Initialization using Update Approximation is a Silver Bullet for Extremely Efficient Low-Rank Fine-Tuning
by: Ponkshe, Kaustubh, et al.
Published: (2024)

Reward Generalization in RLHF: A Topological Perspective
by: Qiu, Tianyi, et al.
Published: (2024)