| Main Authors: | Nie, Allen, Chandak, Yash, Yuan, Christina J., Badrinath, Anirudhan, Flet-Berliac, Yannis, Brunskill, Emma |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2405.17708 |
Similar Items
Unified Preference Optimization: Language Model Alignment Beyond the Preference Frontier
by: Badrinath, Anirudhan, et al.
Published: (2024)
pyBKT: An Accessible Python Library of Bayesian Knowledge Tracing Models
by: Badrinath, Anirudhan, et al.
Published: (2021)
Contrastive Policy Gradient: Aligning LLMs on sequence-level scores in a supervised-friendly fashion
by: Flet-Berliac, Yannis, et al.
Published: (2024)
The GPT Surprise: Offering Large Language Model Chat in a Massive Coding Class Reduced Engagement but Increased Adopters Exam Performances
by: Nie, Allen, et al.
Published: (2024)
Averaging log-likelihoods in direct alignment
by: Grinsztajn, Nathan, et al.
Published: (2024)
AttackQA: Development and Adoption of a Dataset for Assisting Cybersecurity Operations using Fine-tuned and Open-Source LLMs
by: Krishna, Varun Badrinath
Published: (2024)
Assessing and Mitigating Data Memorization Risks in Fine-Tuned Large Language Models
by: Ramakrishnan, Badrinath, et al.
Published: (2025)
Securing AI Agents Against Prompt Injection Attacks
by: Ramakrishnan, Badrinath, et al.
Published: (2025)
Short-Long Policy Evaluation with Novel Actions
by: Nam, Hyunji Alex, et al.
Published: (2024)
Answer Matching Outperforms Multiple Choice for Language Model Evaluation
by: Chandak, Nikhil, et al.
Published: (2025)
Predicting Long Term Sequential Policy Value Using Softer Surrogates
by: Nam, Hyunji, et al.
Published: (2024)
ReAttn: Improving Attention-based Re-ranking via Attention Re-weighting
by: Tian, Yuxing, et al.
Published: (2026)
Evaluating LLMs for Visualization Tasks
by: Khan, Saadiq Rauf, et al.
Published: (2025)
Who's the (Multi-)Fairest of Them All: Rethinking Interpolation-Based Data Augmentation Through the Lens of Multicalibration
by: Halevy, Karina, et al.
Published: (2024)
Offline Policy Evaluation for Manipulation Policies via Discounted Liveness Formulation
by: Wang, Hao, et al.
Published: (2026)
Evaluating LLMs for Visualization Generation and Understanding
by: Khan, Saadiq Rauf, et al.
Published: (2025)
Q-Policy: Quantum-Enhanced Policy Evaluation for Scalable Reinforcement Learning
by: Cherukuri, Kalyan, et al.
Published: (2025)
Evaluation-Time Policy Switching for Offline Reinforcement Learning
by: Neggatu, Natinael Solomon, et al.
Published: (2025)
Automatic Constraint Policy Optimization based on Continuous Constraint Interpolation Framework for Offline Reinforcement Learning
by: Han, Xinchen, et al.
Published: (2026)
Offline Policy Evaluation for Reinforcement Learning with Adaptively Collected Data
by: Madhow, Sunil, et al.
Published: (2023)
Offline Learning and Forgetting for Reasoning with Large Language Models
by: Ni, Tianwei, et al.
Published: (2025)
HIRO: Hierarchical Information Retrieval Optimization
by: Goel, Krish, et al.
Published: (2024)
OPERA: A Reinforcement Learning--Enhanced Orchestrated Planner-Executor Architecture for Reasoning-Oriented Multi-Hop Retrieval
by: Liu, Yu, et al.
Published: (2025)
Offline Policy Optimization with Posterior Sampling
by: Lin, Hongqiang, et al.
Published: (2026)
Importance of Artificial Intelligence in Accounting and Taxation World
by: Khatri, Dr. Sunil Badrinath
Published: (2025)
OPERA: Online Data Pruning for Efficient Retrieval Model Adaptation
by: Fang, Haoyang, et al.
Published: (2026)
Enhancing Generative Auto-bidding with Offline Reward Evaluation and Policy Search
by: Mou, Zhiyu, et al.
Published: (2025)
Policy Gradient Methods for Non-Markovian Reinforcement Learning
by: Kar, Avik, et al.
Published: (2026)
Pessimistic Auxiliary Policy for Offline Reinforcement Learning
by: Zhang, Fan, et al.
Published: (2026)
Automatic Reward Shaping from Confounded Offline Data
by: Li, Mingxuan, et al.
Published: (2025)
SimuDICE: Offline Policy Optimization Through World Model Updates and DICE Estimation
by: Brita, Catalin E., et al.
Published: (2024)
Offline Policy Evaluation of Multi-Turn LLM Health Coaching with Real Users
by: Ozolcer, Melik, et al.
Published: (2025)
Offline Reinforcement Learning: Role of State Aggregation and Trajectory Data
by: Jia, Zeyu, et al.
Published: (2024)
Collaborative Split Federated Learning with Parallel Training and Aggregation
by: Papageorgiou, Yiannis, et al.
Published: (2025)
Offline Safe Policy Optimization From Heterogeneous Feedback
by: Gong, Ze, et al.
Published: (2025)
Policy Expansion for Bridging Offline-to-Online Reinforcement Learning
by: Zhang, Haichao, et al.
Published: (2023)
Optimizing Warfarin Dosing Using Contextual Bandit: An Offline Policy Learning and Evaluation Method
by: Huang, Yong, et al.
Published: (2024)
SCOPE-RL: A Python Library for Offline Reinforcement Learning and Off-Policy Evaluation
by: Kiyohara, Haruka, et al.
Published: (2023)
Offline Reinforcement Learning with Generative Trajectory Policies
by: Feng, Xinsong, et al.
Published: (2025)
Federated Offline Policy Optimization with Dual Regularization
by: Yue, Sheng, et al.
Published: (2024)