| Main Authors: | Lin, Yen-Ting, Jin, Di, Xu, Tengyu, Wu, Tianhao, Sukhbaatar, Sainbayar, Zhu, Chen, He, Yun, Chen, Yun-Nung, Weston, Jason, Tian, Yuandong, Rahnama, Arash, Wang, Sinong, Ma, Hao, Fang, Han |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2501.10799 |
Similar Items
StepWiser: Stepwise Generative Judges for Wiser Reasoning
by: Xiong, Wei, et al.
Published: (2025)
Training Large Language Models to Reason in a Continuous Latent Space
by: Hao, Shibo, et al.
Published: (2024)
SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks
by: Zhou, Yifei, et al.
Published: (2025)
Meta-Rewarding Language Models: Self-Improving Alignment with LLM-as-a-Meta-Judge
by: Wu, Tianhao, et al.
Published: (2024)
Dualformer: Controllable Fast and Slow Thinking by Learning with Randomized Reasoning Traces
by: Su, DiJia, et al.
Published: (2024)
Multi-Token Attention
by: Golovneva, Olga, et al.
Published: (2025)
Contextual Position Encoding: Learning to Count What's Important
by: Golovneva, Olga, et al.
Published: (2024)
Some things are more CRINGE than others: Iterative Preference Optimization with the Pairwise Cringe Loss
by: Xu, Jing, et al.
Published: (2023)
Thinking LLMs: General Instruction Following with Thought Generation
by: Wu, Tianhao, et al.
Published: (2024)
Reverse Training to Nurse the Reversal Curse
by: Golovneva, Olga, et al.
Published: (2024)
Iterative Reasoning Preference Optimization
by: Pang, Richard Yuanzhe, et al.
Published: (2024)
R.I.P.: Better Models by Survival of the Fittest Prompts
by: Yu, Ping, et al.
Published: (2025)
Self-Challenging Language Model Agents
by: Zhou, Yifei, et al.
Published: (2025)
Diverse Preference Optimization
by: Lanchantin, Jack, et al.
Published: (2025)
Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping
by: Lehnert, Lucas, et al.
Published: (2024)
Following Length Constraints in Instructions
by: Yuan, Weizhe, et al.
Published: (2024)
Adaptive Decoding via Latent Preference Optimization
by: Dhuliawala, Shehzaad, et al.
Published: (2024)
Boosting LLM Reasoning via Spontaneous Self-Correction
by: Zhao, Xutong, et al.
Published: (2025)
Self-Rewarding Language Models
by: Yuan, Weizhe, et al.
Published: (2024)
SPICE: Self-Play In Corpus Environments Improves Reasoning
by: Liu, Bo, et al.
Published: (2025)
TraceSafe: A Systematic Assessment of LLM Guardrails on Multi-Step Tool-Calling Trajectories
by: Chen, Yen-Shan, et al.
Published: (2026)
Injecting Salesperson's Dialogue Strategies in Large Language Models with Chain-of-Thought Reasoning
by: Chang, Wen-Yu, et al.
Published: (2024)
CoT-Self-Instruct: Building high-quality synthetic prompts for reasoning and non-reasoning tasks
by: Yu, Ping, et al.
Published: (2025)
Think Smarter not Harder: Adaptive Reasoning with Inference Aware Optimization
by: Yu, Zishun, et al.
Published: (2025)
VisTW: Benchmarking Vision-Language Models for Traditional Chinese in Taiwan
by: Tam, Zhi Rui, et al.
Published: (2025)
Measuring Taiwanese Mandarin Language Understanding
by: Chen, Po-Heng, et al.
Published: (2024)
Self-Consistency Preference Optimization
by: Prasad, Archiki, et al.
Published: (2024)
LLMs are Biased Evaluators But Not Biased for Retrieval Augmented Generation
by: Chen, Yen-Shan, et al.
Published: (2024)
Step-Controlled DPO: Leveraging Stepwise Error for Enhanced Mathematical Reasoning
by: Lu, Zimu, et al.
Published: (2024)
Stochastic activations
by: Lomeli, Maria, et al.
Published: (2025)
MedVoiceBias: A Controlled Study of Audio LLM Behavior in Clinical Decision-Making
by: Tam, Zhi Rui, et al.
Published: (2025)
InstUPR : Instruction-based Unsupervised Passage Reranking with Large Language Models
by: Huang, Chao-Wei, et al.
Published: (2024)
PairDistill: Pairwise Relevance Distillation for Dense Retrieval
by: Huang, Chao-Wei, et al.
Published: (2024)
Rethinking Role-Playing Evaluation: Anonymous Benchmarking and a Systematic Study of Personality Effects
by: Peng, Ji-Lun, et al.
Published: (2026)
Balancing Knowledge Delivery and Emotional Comfort in Healthcare Conversational Systems
by: Tsai, Shang-Chi, et al.
Published: (2025)
Visualizing Dialogues: Enhancing Image Selection through Dialogue Understanding with Large Language Models
by: Kao, Chang-Sheng, et al.
Published: (2024)
FactAlign: Long-form Factuality Alignment of Large Language Models
by: Huang, Chao-Wei, et al.
Published: (2024)
Expected Harm: Rethinking Safety Evaluation of (Mis)Aligned LLMs
by: Chen, Yen-Shan, et al.
Published: (2026)
Eyes-on-Me: Scalable RAG Poisoning through Transferable Attention-Steering Attractors
by: Chen, Yen-Shan, et al.
Published: (2025)
Reinforcement Learning from User Feedback
by: Han, Eric, et al.
Published: (2025)