| Main Authors: | Shen, Yaojie, Wang, Xinyao, Niu, Yulei, Zhou, Ying, Tang, Lexin, Zhang, Libo, Chen, Fan, Wen, Longyin |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2409.08845 |
Similar Items
DiffLM: Controllable Synthetic Data Generation via Diffusion Language Models
by: Zhou, Ying, et al.
Published: (2024)
Referring Layer Decomposition
by: Chen, Fangyi, et al.
Published: (2026)
Where do Large Vision-Language Models Look at when Answering Questions?
by: Xing, Xiaoying, et al.
Published: (2025)
Accurate and Fast Compressed Video Captioning
by: Shen, Yaojie, et al.
Published: (2023)
AIPO: Learning to Reason from Active Interaction
by: Liu, Junnan, et al.
Published: (2026)
Improving Multilingual Social Media Insights: Aspect-based Comment Analysis
by: Zhang, Longyin, et al.
Published: (2025)
Two Causal Principles for Improving Visual Dialog
by: Qi, Jiaxin, et al.
Published: (2019)
APT: Improving Specialist LLM Performance with Weakness Case Acquisition and Iterative Preference Training
by: Rao, Jun, et al.
Published: (2025)
Structured Context Learning for Generic Event Boundary Detection
by: Gu, Xin, et al.
Published: (2025)
Self-Steering Optimization: Autonomous Preference Optimization for Large Language Models
by: Xiang, Hao, et al.
Published: (2024)
Iterative Reasoning Preference Optimization
by: Pang, Richard Yuanzhe, et al.
Published: (2024)
Self-Training with Direct Preference Optimization Improves Chain-of-Thought Reasoning
by: Wang, Tianduo, et al.
Published: (2024)
Multi-Hop Question Generation via Dual-Perspective Keyword Guidance
by: Li, Maodong, et al.
Published: (2025)
Pre-DPO: Improving Data Utilization in Direct Preference Optimization Using a Guiding Reference Model
by: Pan, Junshu, et al.
Published: (2025)
Controllable Preference Optimization: Toward Controllable Multi-Objective Alignment
by: Guo, Yiju, et al.
Published: (2024)
EchoFoley: Event-Centric Hierarchical Control for Video Grounded Creative Sound Generation
by: Li, Bingxuan, et al.
Published: (2025)
Toward Real-World Chinese Psychological Support Dialogues: CPsDD Dataset and a Co-Evolving Multi-Agent System
by: Shi, Yuanchen, et al.
Published: (2025)
CiPO: Counterfactual Unlearning for Large Reasoning Models through Iterative Preference Optimization
by: Li, Junyi, et al.
Published: (2026)
A$^2$Search: Ambiguity-Aware Question Answering with Reinforcement Learning
by: Zhang, Fengji, et al.
Published: (2025)
Fine-Grained Preference Optimization Improves Spatial Reasoning in VLMs
by: Shen, Yifan, et al.
Published: (2025)
Unlocking the Capabilities of Thought: A Reasoning Boundary Framework to Quantify and Optimize Chain-of-Thought
by: Chen, Qiguang, et al.
Published: (2024)
CYCLE-INSTRUCT: Fully Seed-Free Instruction Tuning via Dual Self-Training and Cycle Consistency
by: Shen, Zhanming, et al.
Published: (2025)
An LLM Feature-based Framework for Dialogue Constructiveness Assessment
by: Zhou, Lexin, et al.
Published: (2024)
TSO: Self-Training with Scaled Preference Optimization
by: Chen, Kaihui, et al.
Published: (2024)
Impact of Stickers on Multimodal Sentiment and Intent in Social Media: A New Task, Dataset and Baseline
by: Shi, Yuanchen, et al.
Published: (2024)
A-IPO: Adaptive Intent-driven Preference Optimization
by: Wang, Wenqing, et al.
Published: (2025)
Test-Time Preference Optimization: On-the-Fly Alignment via Iterative Textual Feedback
by: Li, Yafu, et al.
Published: (2025)
Plug-and-Play Training Framework for Preference Optimization
by: Ma, Jingyuan, et al.
Published: (2024)
An Efficient Task-Oriented Dialogue Policy: Evolutionary Reinforcement Learning Injected by Elite Individuals
by: Zhao, Yangyang, et al.
Published: (2025)
Statistical Rejection Sampling Improves Preference Optimization
by: Liu, Tianqi, et al.
Published: (2023)
Edit3K: Universal Representation Learning for Video Editing Components
by: Gu, Xin, et al.
Published: (2024)
Growth First, Care Second? Tracing the Landscape of LLM Value Preferences in Everyday Dilemmas
by: Chen, Zhiyi, et al.
Published: (2026)
Training-Free Group Relative Policy Optimization
by: Cai, Yuzheng, et al.
Published: (2025)
Direct Judgement Preference Optimization
by: Wang, Peifeng, et al.
Published: (2024)
Improving Factual Consistency of News Summarization by Contrastive Preference Optimization
by: Feng, Huawen, et al.
Published: (2023)
Iterative Length-Regularized Direct Preference Optimization: A Case Study on Improving 7B Language Models to GPT-4 Level
by: Liu, Jie, et al.
Published: (2024)
Thinking With Bounding Boxes: Enhancing Spatio-Temporal Video Grounding via Reinforcement Fine-Tuning
by: Gu, Xin, et al.
Published: (2025)
MaLei at MultiClinSUM: Summarisation of Clinical Documents using Perspective-Aware Iterative Self-Prompting with LLMs
by: Ren, Libo, et al.
Published: (2025)
InfoDensity: Rewarding Information-Dense Traces for Efficient Reasoning
by: Wei, Chengwei, et al.
Published: (2026)
Beyond Turn Limits: Training Deep Search Agents with Dynamic Context Window
by: Tang, Qiaoyu, et al.
Published: (2025)