| Main Authors: | Towle, Benjamin, Zhou, Ke |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2410.11009 |
Similar Items
SimSAM: Zero-shot Medical Image Segmentation via Simulated Interaction
by: Towle, Benjamin, et al.
Published: (2024)
SeqSAM: Autoregressive Multiple Hypothesis Prediction for Medical Image Segmentation using SAM
by: Towle, Benjamin, et al.
Published: (2025)
Improving Dialogue Agents by Decomposing One Global Explicit Annotation with Local Implicit Multimodal Feedback
by: Lee, Dong Won, et al.
Published: (2024)
UltraFeedback: Boosting Language Models with Scaled AI Feedback
by: Cui, Ganqu, et al.
Published: (2023)
Closing the Loop: Learning to Generate Writing Feedback via Language Model Simulated Student Revisions
by: Nair, Inderjeet, et al.
Published: (2024)
RLAIF vs. RLHF: Scaling Reinforcement Learning from Human Feedback with AI Feedback
by: Lee, Harrison, et al.
Published: (2023)
Hindsight-Anchored Policy Optimization: Turning Failure into Feedback in Sparse Reward Settings
by: Wu, Yuning, et al.
Published: (2026)
In-context Continual Learning Assisted by an External Continual Learner
by: Momeni, Saleh, et al.
Published: (2024)
Zero-Shot Dense Retrieval with Embeddings from Relevance Feedback
by: Jedidi, Nour, et al.
Published: (2024)
AI-Slop to AI-Polish? Aligning Language Models through Edit-Based Writing Rewards and Test-time Computation
by: Chakrabarty, Tuhin, et al.
Published: (2025)
LLM-Barber: Block-Aware Rebuilder for Sparsity Mask in One-Shot for Large Language Models
by: Su, Yupeng, et al.
Published: (2024)
LILO: Bayesian Optimization with Natural Language Feedback
by: Kobalczyk, Katarzyna, et al.
Published: (2025)
ARES: Alternating Reinforcement Learning and Supervised Fine-Tuning for Enhanced Multi-Modal Chain-of-Thought Reasoning Through Diverse AI Feedback
by: Byun, Ju-Seung, et al.
Published: (2024)
Retrieval Enhanced Feedback via In-context Neural Error-book
by: Hyun, Jongyeop, et al.
Published: (2025)
Almost AI, Almost Human: The Challenge of Detecting AI-Polished Writing
by: Saha, Shoumik, et al.
Published: (2025)
A Critical Evaluation of AI Feedback for Aligning Large Language Models
by: Sharma, Archit, et al.
Published: (2024)
AI Knowledge Assist: An Automated Approach for the Creation of Knowledge Bases for Conversational AI Agents
by: Laskar, Md Tahmid Rahman, et al.
Published: (2025)
Enhancing In-Context Learning via Implicit Demonstration Augmentation
by: Zhou, Xiaoling, et al.
Published: (2024)
Implicit In-context Learning
by: Li, Zhuowei, et al.
Published: (2024)
Process Reinforcement through Implicit Rewards
by: Cui, Ganqu, et al.
Published: (2025)
Weaver: Foundation Models for Creative Writing
by: Wang, Tiannan, et al.
Published: (2024)
Personalized Language Modeling from Personalized Human Feedback
by: Li, Xinyu, et al.
Published: (2024)
Enhancing the General Agent Capabilities of Low-Parameter LLMs through Tuning and Multi-Branch Reasoning
by: Zhou, Qinhao, et al.
Published: (2024)
Implicit Reward as the Bridge: A Unified View of SFT and DPO Connections
by: Wang, Bo, et al.
Published: (2025)
LASP: Surveying the State-of-the-Art in Large Language Model-Assisted AI Planning
by: Li, Haoming, et al.
Published: (2024)
DavIR: Data Selection via Implicit Reward for Large Language Models
by: Zhou, Haotian, et al.
Published: (2023)
More is not always better? Enhancing Many-Shot In-Context Learning with Differentiated and Reweighting Objectives
by: Zhang, Xiaoqing, et al.
Published: (2025)
Layer Swapping for Zero-Shot Cross-Lingual Transfer in Large Language Models
by: Bandarkar, Lucas, et al.
Published: (2024)
Self-play with Execution Feedback: Improving Instruction-following Capabilities of Large Language Models
by: Dong, Guanting, et al.
Published: (2024)
Are Retrials All You Need? Enhancing Large Language Model Reasoning Without Verbalized Feedback
by: Potamitis, Nearchos, et al.
Published: (2025)
Learning Personalized Agents from Human Feedback
by: Liang, Kaiqu, et al.
Published: (2026)
MPPO: Multi Pair-wise Preference Optimization for LLMs with Arbitrary Negative Samples
by: Xie, Shuo, et al.
Published: (2024)
One-Shot Safety Alignment for Large Language Models via Optimal Dualization
by: Huang, Xinmeng, et al.
Published: (2024)
More Expressive Attention with Negative Weights
by: Lv, Ang, et al.
Published: (2024)
CHAI for LLMs: Improving Code-Mixed Translation in Large Language Models through Reinforcement Learning with AI Feedback
by: Zhang, Wenbo, et al.
Published: (2024)
RLBFF: Binary Flexible Feedback to bridge between Human Feedback & Verifiable Rewards
by: Wang, Zhilin, et al.
Published: (2025)
Enhancing Retrieval Performance: An Ensemble Approach For Hard Negative Mining
by: Meghwani, Hansa
Published: (2024)
Relation-Aware Network with Attention-Based Loss for Few-Shot Knowledge Graph Completion
by: Qiao, Qiao, et al.
Published: (2023)
Whose Preferences? Differences in Fairness Preferences and Their Impact on the Fairness of AI Utilizing Human Feedback
by: Lerner, Emilia Agis, et al.
Published: (2024)
MLSD: A Novel Few-Shot Learning Approach to Enhance Cross-Target and Cross-Domain Stance Detection
by: Gera, Parush, et al.
Published: (2025)