Library Catalog

Bibliographic Details
Main Authors:	Lin, Yen-Ting; Jin, Di; Xu, Tengyu; Wu, Tianhao; Sukhbaatar, Sainbayar; Zhu, Chen; He, Yun; Chen, Yun-Nung; Weston, Jason; Tian, Yuandong; Rahnama, Arash; Wang, Sinong; Ma, Hao; Fang, Han
Format:	Preprint
Published:	2025
Subjects:	Machine Learning; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2501.10799

Similar Items

StepWiser: Stepwise Generative Judges for Wiser Reasoning
by: Xiong, Wei, et al.
Published: (2025)

Training Large Language Models to Reason in a Continuous Latent Space
by: Hao, Shibo, et al.
Published: (2024)

SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks
by: Zhou, Yifei, et al.
Published: (2025)

Meta-Rewarding Language Models: Self-Improving Alignment with LLM-as-a-Meta-Judge
by: Wu, Tianhao, et al.
Published: (2024)

Dualformer: Controllable Fast and Slow Thinking by Learning with Randomized Reasoning Traces
by: Su, DiJia, et al.
Published: (2024)

Multi-Token Attention
by: Golovneva, Olga, et al.
Published: (2025)

Contextual Position Encoding: Learning to Count What's Important
by: Golovneva, Olga, et al.
Published: (2024)

Some things are more CRINGE than others: Iterative Preference Optimization with the Pairwise Cringe Loss
by: Xu, Jing, et al.
Published: (2023)

Thinking LLMs: General Instruction Following with Thought Generation
by: Wu, Tianhao, et al.
Published: (2024)

Reverse Training to Nurse the Reversal Curse
by: Golovneva, Olga, et al.
Published: (2024)

Iterative Reasoning Preference Optimization
by: Pang, Richard Yuanzhe, et al.
Published: (2024)

R.I.P.: Better Models by Survival of the Fittest Prompts
by: Yu, Ping, et al.
Published: (2025)

Self-Challenging Language Model Agents
by: Zhou, Yifei, et al.
Published: (2025)

Diverse Preference Optimization
by: Lanchantin, Jack, et al.
Published: (2025)

Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping
by: Lehnert, Lucas, et al.
Published: (2024)

Following Length Constraints in Instructions
by: Yuan, Weizhe, et al.
Published: (2024)

Adaptive Decoding via Latent Preference Optimization
by: Dhuliawala, Shehzaad, et al.
Published: (2024)

Boosting LLM Reasoning via Spontaneous Self-Correction
by: Zhao, Xutong, et al.
Published: (2025)

Self-Rewarding Language Models
by: Yuan, Weizhe, et al.
Published: (2024)

SPICE: Self-Play In Corpus Environments Improves Reasoning
by: Liu, Bo, et al.
Published: (2025)

TraceSafe: A Systematic Assessment of LLM Guardrails on Multi-Step Tool-Calling Trajectories
by: Chen, Yen-Shan, et al.
Published: (2026)

Injecting Salesperson's Dialogue Strategies in Large Language Models with Chain-of-Thought Reasoning
by: Chang, Wen-Yu, et al.
Published: (2024)

CoT-Self-Instruct: Building high-quality synthetic prompts for reasoning and non-reasoning tasks
by: Yu, Ping, et al.
Published: (2025)

Think Smarter not Harder: Adaptive Reasoning with Inference Aware Optimization
by: Yu, Zishun, et al.
Published: (2025)

VisTW: Benchmarking Vision-Language Models for Traditional Chinese in Taiwan
by: Tam, Zhi Rui, et al.
Published: (2025)

Measuring Taiwanese Mandarin Language Understanding
by: Chen, Po-Heng, et al.
Published: (2024)

Self-Consistency Preference Optimization
by: Prasad, Archiki, et al.
Published: (2024)

LLMs are Biased Evaluators But Not Biased for Retrieval Augmented Generation
by: Chen, Yen-Shan, et al.
Published: (2024)

Step-Controlled DPO: Leveraging Stepwise Error for Enhanced Mathematical Reasoning
by: Lu, Zimu, et al.
Published: (2024)

Stochastic activations
by: Lomeli, Maria, et al.
Published: (2025)

MedVoiceBias: A Controlled Study of Audio LLM Behavior in Clinical Decision-Making
by: Tam, Zhi Rui, et al.
Published: (2025)

InstUPR: Instruction-based Unsupervised Passage Reranking with Large Language Models
by: Huang, Chao-Wei, et al.
Published: (2024)

PairDistill: Pairwise Relevance Distillation for Dense Retrieval
by: Huang, Chao-Wei, et al.
Published: (2024)

Rethinking Role-Playing Evaluation: Anonymous Benchmarking and a Systematic Study of Personality Effects
by: Peng, Ji-Lun, et al.
Published: (2026)

Balancing Knowledge Delivery and Emotional Comfort in Healthcare Conversational Systems
by: Tsai, Shang-Chi, et al.
Published: (2025)

Visualizing Dialogues: Enhancing Image Selection through Dialogue Understanding with Large Language Models
by: Kao, Chang-Sheng, et al.
Published: (2024)

FactAlign: Long-form Factuality Alignment of Large Language Models
by: Huang, Chao-Wei, et al.
Published: (2024)

Expected Harm: Rethinking Safety Evaluation of (Mis)Aligned LLMs
by: Chen, Yen-Shan, et al.
Published: (2026)

Eyes-on-Me: Scalable RAG Poisoning through Transferable Attention-Steering Attractors
by: Chen, Yen-Shan, et al.
Published: (2025)

Reinforcement Learning from User Feedback
by: Han, Eric, et al.
Published: (2025)