:: Library Catalog

Bibliographic Details
Main Authors:	Towle, Benjamin; Zhou, Ke
Format:	Preprint
Published:	2024
Subjects:	Computation and Language; Artificial Intelligence; Machine Learning
Online Access:	https://arxiv.org/abs/2410.11009

Similar Items

SimSAM: Zero-shot Medical Image Segmentation via Simulated Interaction
by: Towle, Benjamin, et al.
Published: (2024)

SeqSAM: Autoregressive Multiple Hypothesis Prediction for Medical Image Segmentation using SAM
by: Towle, Benjamin, et al.
Published: (2025)

Improving Dialogue Agents by Decomposing One Global Explicit Annotation with Local Implicit Multimodal Feedback
by: Lee, Dong Won, et al.
Published: (2024)

UltraFeedback: Boosting Language Models with Scaled AI Feedback
by: Cui, Ganqu, et al.
Published: (2023)

Closing the Loop: Learning to Generate Writing Feedback via Language Model Simulated Student Revisions
by: Nair, Inderjeet, et al.
Published: (2024)

RLAIF vs. RLHF: Scaling Reinforcement Learning from Human Feedback with AI Feedback
by: Lee, Harrison, et al.
Published: (2023)

Hindsight-Anchored Policy Optimization: Turning Failure into Feedback in Sparse Reward Settings
by: Wu, Yuning, et al.
Published: (2026)

In-context Continual Learning Assisted by an External Continual Learner
by: Momeni, Saleh, et al.
Published: (2024)

Zero-Shot Dense Retrieval with Embeddings from Relevance Feedback
by: Jedidi, Nour, et al.
Published: (2024)

AI-Slop to AI-Polish? Aligning Language Models through Edit-Based Writing Rewards and Test-time Computation
by: Chakrabarty, Tuhin, et al.
Published: (2025)

LLM-Barber: Block-Aware Rebuilder for Sparsity Mask in One-Shot for Large Language Models
by: Su, Yupeng, et al.
Published: (2024)

LILO: Bayesian Optimization with Natural Language Feedback
by: Kobalczyk, Katarzyna, et al.
Published: (2025)

ARES: Alternating Reinforcement Learning and Supervised Fine-Tuning for Enhanced Multi-Modal Chain-of-Thought Reasoning Through Diverse AI Feedback
by: Byun, Ju-Seung, et al.
Published: (2024)

Retrieval Enhanced Feedback via In-context Neural Error-book
by: Hyun, Jongyeop, et al.
Published: (2025)

Almost AI, Almost Human: The Challenge of Detecting AI-Polished Writing
by: Saha, Shoumik, et al.
Published: (2025)

A Critical Evaluation of AI Feedback for Aligning Large Language Models
by: Sharma, Archit, et al.
Published: (2024)

AI Knowledge Assist: An Automated Approach for the Creation of Knowledge Bases for Conversational AI Agents
by: Laskar, Md Tahmid Rahman, et al.
Published: (2025)

Enhancing In-Context Learning via Implicit Demonstration Augmentation
by: Zhou, Xiaoling, et al.
Published: (2024)

Implicit In-context Learning
by: Li, Zhuowei, et al.
Published: (2024)

Process Reinforcement through Implicit Rewards
by: Cui, Ganqu, et al.
Published: (2025)

Weaver: Foundation Models for Creative Writing
by: Wang, Tiannan, et al.
Published: (2024)

Personalized Language Modeling from Personalized Human Feedback
by: Li, Xinyu, et al.
Published: (2024)

Enhancing the General Agent Capabilities of Low-Parameter LLMs through Tuning and Multi-Branch Reasoning
by: Zhou, Qinhao, et al.
Published: (2024)

Implicit Reward as the Bridge: A Unified View of SFT and DPO Connections
by: Wang, Bo, et al.
Published: (2025)

LASP: Surveying the State-of-the-Art in Large Language Model-Assisted AI Planning
by: Li, Haoming, et al.
Published: (2024)

DavIR: Data Selection via Implicit Reward for Large Language Models
by: Zhou, Haotian, et al.
Published: (2023)

More is not always better? Enhancing Many-Shot In-Context Learning with Differentiated and Reweighting Objectives
by: Zhang, Xiaoqing, et al.
Published: (2025)

Layer Swapping for Zero-Shot Cross-Lingual Transfer in Large Language Models
by: Bandarkar, Lucas, et al.
Published: (2024)

Self-play with Execution Feedback: Improving Instruction-following Capabilities of Large Language Models
by: Dong, Guanting, et al.
Published: (2024)

Are Retrials All You Need? Enhancing Large Language Model Reasoning Without Verbalized Feedback
by: Potamitis, Nearchos, et al.
Published: (2025)

Learning Personalized Agents from Human Feedback
by: Liang, Kaiqu, et al.
Published: (2026)

MPPO: Multi Pair-wise Preference Optimization for LLMs with Arbitrary Negative Samples
by: Xie, Shuo, et al.
Published: (2024)

One-Shot Safety Alignment for Large Language Models via Optimal Dualization
by: Huang, Xinmeng, et al.
Published: (2024)

More Expressive Attention with Negative Weights
by: Lv, Ang, et al.
Published: (2024)

CHAI for LLMs: Improving Code-Mixed Translation in Large Language Models through Reinforcement Learning with AI Feedback
by: Zhang, Wenbo, et al.
Published: (2024)

RLBFF: Binary Flexible Feedback to bridge between Human Feedback & Verifiable Rewards
by: Wang, Zhilin, et al.
Published: (2025)

Enhancing Retrieval Performance: An Ensemble Approach For Hard Negative Mining
by: Meghwani, Hansa
Published: (2024)

Relation-Aware Network with Attention-Based Loss for Few-Shot Knowledge Graph Completion
by: Qiao, Qiao, et al.
Published: (2023)

Whose Preferences? Differences in Fairness Preferences and Their Impact on the Fairness of AI Utilizing Human Feedback
by: Lerner, Emilia Agis, et al.
Published: (2024)

MLSD: A Novel Few-Shot Learning Approach to Enhance Cross-Target and Cross-Domain Stance Detection
by: Gera, Parush, et al.
Published: (2025)