:: Library Catalog

Bibliographic Details
Main Authors:	Barnhart, Logan; Bafghi, Reza Akbarian; Becker, Stephen; Raissi, Maziar
Format:	Preprint
Published:	2025
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2503.09025

Similar Items

From Centerlines to Hemodynamics: Anisotropic RBF Decoders for Coronary Arteries
by: Bafghi, Reza Akbarian, et al.
Published: (2026)

Test-Driven Agentic Framework for Reliable Robot Controller
by: Tripathi, Shivanshu, et al.
Published: (2026)

MixDiff: Mixing Natural and Synthetic Images for Robust Self-Supervised Representations
by: Bafghi, Reza Akbarian, et al.
Published: (2024)

Parameter Efficient Fine-tuning of Self-supervised ViTs without Catastrophic Forgetting
by: Bafghi, Reza Akbarian, et al.
Published: (2024)

Fine Tuning without Catastrophic Forgetting via Selective Low Rank Adaptation
by: Bafghi, Reza Akbarian, et al.
Published: (2025)

Where Did Your Model Learn That? Label-free Influence for Self-supervised Learning
by: Harilal, Nidhin, et al.
Published: (2024)

Solving the Inverse Alignment Problem for Efficient RLHF
by: Krishna, Shambhavi, et al.
Published: (2024)

Understanding Tool-Augmented Agents for Lean Formalization: A Factorial Analysis
by: Zhang, Ke, et al.
Published: (2026)

ChatGLM-RLHF: Practices of Aligning Large Language Models with Human Feedback
by: Hou, Zhenyu, et al.
Published: (2024)

Why Is RLHF Alignment Shallow? A Gradient Analysis
by: Young, Robin
Published: (2026)

MM-RLHF: The Next Step Forward in Multimodal LLM Alignment
by: Zhang, Yi-Fan, et al.
Published: (2025)

More RLHF, More Trust? On The Impact of Preference Alignment On Trustworthiness
by: Li, Aaron J., et al.
Published: (2024)

MaxMin-RLHF: Alignment with Diverse Human Preferences
by: Chakraborty, Souradip, et al.
Published: (2024)

SenseAI: A Human-in-the-Loop Dataset for RLHF-Aligned Financial Sentiment Reasoning
by: Kabalisa, Berny
Published: (2026)

Deep LPPLS: Forecasting of temporal critical points in natural, engineering and financial systems
by: Nielsen, Joshua, et al.
Published: (2024)

PUNCH: Physics-informed Uncertainty-aware Network for Coronary Hemodynamics
by: Thakur, Sukirt, et al.
Published: (2026)

A Systematic Evaluation of Preference Aggregation in Federated RLHF for Pluralistic Alignment of LLMs
by: Srewa, Mahmoud, et al.
Published: (2025)

Culturally Adaptive Explainable LLM Assessment for Multilingual Information Disorder: A Human-in-the-Loop Approach
by: Jouneghani, Maziar Kianimoghadam
Published: (2026)

RLHF in an SFT Way: From Optimal Solution to Reward-Weighted Alignment
by: Du, Yuhao, et al.
Published: (2025)

Proxy-RLHF: Decoupling Generation and Alignment in Large Language Model with Proxy
by: Zhu, Yu, et al.
Published: (2024)

PKU-SafeRLHF: Towards Multi-Level Safety Alignment for LLMs with Human Preference
by: Ji, Jiaming, et al.
Published: (2024)

Physics-Informed Machine Learning for Smart Additive Manufacturing
by: Sharma, Rahul, et al.
Published: (2024)

Derailing Non-Answers via Logit Suppression at Output Subspace Boundaries in RLHF-Aligned Language Models
by: Dam, Harvey, et al.
Published: (2025)

RLHF Workflow: From Reward Modeling to Online RLHF
by: Dong, Hanze, et al.
Published: (2024)

RLHF: A comprehensive Survey for Cultural, Multimodal and Low Latency Alignment Methods
by: Sharma, Raghav, et al.
Published: (2025)

ELPINN: Eulerian Lagrangian Physics-Informed Neural Network
by: Thakur, Sukirt, et al.
Published: (2025)

Online Optimization with Unknown Time-Varying Parameters from Noisy Gradient Measurements
by: Tripathi, Shivanshu, et al.
Published: (2026)

From RLHF to Direct Alignment: A Theoretical Unification of Preference Learning for Large Language Models
by: Raheja, Tarun, et al.
Published: (2026)

MKJ at SemEval-2026 Task 9: A Comparative Study of Generalist, Specialist, and Ensemble Strategies for Multilingual Polarization
by: Jouneghani, Maziar Kianimoghadam
Published: (2026)

OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework
by: Hu, Jian, et al.
Published: (2024)

Balanced Actor Initialization: Stable RLHF Training of Distillation-Based Reasoning Models
by: Zheng, Chen, et al.
Published: (2025)

RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback
by: Yu, Tianyu, et al.
Published: (2023)

Learning Parameterized Nonlinear Elasticity on Curved Surfaces
by: Liu, Yankang, et al.
Published: (2026)

AT-RAG: An Adaptive RAG Model Enhancing Query Efficiency with Topic Filtering and Iterative Reasoning
by: Rezaei, Mohammad Reza, et al.
Published: (2024)

Taming Overconfidence in LLMs: Reward Calibration in RLHF
by: Leng, Jixuan, et al.
Published: (2024)

Failure Modes of Maximum Entropy RLHF
by: Çağatan, Ömer Veysel, et al.
Published: (2025)

Language Models Learn to Mislead Humans via RLHF
by: Wen, Jiaxin, et al.
Published: (2024)

RLHF and IIA: Perverse Incentives
by: Xu, Wanqiao, et al.
Published: (2023)

Reward-Robust RLHF in LLMs
by: Yan, Yuzi, et al.
Published: (2024)

FormalAlign: Automated Alignment Evaluation for Autoformalization
by: Lu, Jianqiao, et al.
Published: (2024)