:: Library Catalog

Bibliographic Details
Main Authors:	Shen, Yaojie; Wang, Xinyao; Niu, Yulei; Zhou, Ying; Tang, Lexin; Zhang, Libo; Chen, Fan; Wen, Longyin
Format:	Preprint
Published:	2024
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2409.08845

Similar Items

DiffLM: Controllable Synthetic Data Generation via Diffusion Language Models
by: Zhou, Ying, et al.
Published: (2024)

Referring Layer Decomposition
by: Chen, Fangyi, et al.
Published: (2026)

Where do Large Vision-Language Models Look at when Answering Questions?
by: Xing, Xiaoying, et al.
Published: (2025)

Accurate and Fast Compressed Video Captioning
by: Shen, Yaojie, et al.
Published: (2023)

AIPO: Learning to Reason from Active Interaction
by: Liu, Junnan, et al.
Published: (2026)

Improving Multilingual Social Media Insights: Aspect-based Comment Analysis
by: Zhang, Longyin, et al.
Published: (2025)

Two Causal Principles for Improving Visual Dialog
by: Qi, Jiaxin, et al.
Published: (2019)

APT: Improving Specialist LLM Performance with Weakness Case Acquisition and Iterative Preference Training
by: Rao, Jun, et al.
Published: (2025)

Structured Context Learning for Generic Event Boundary Detection
by: Gu, Xin, et al.
Published: (2025)

Self-Steering Optimization: Autonomous Preference Optimization for Large Language Models
by: Xiang, Hao, et al.
Published: (2024)

Iterative Reasoning Preference Optimization
by: Pang, Richard Yuanzhe, et al.
Published: (2024)

Self-Training with Direct Preference Optimization Improves Chain-of-Thought Reasoning
by: Wang, Tianduo, et al.
Published: (2024)

Multi-Hop Question Generation via Dual-Perspective Keyword Guidance
by: Li, Maodong, et al.
Published: (2025)

Pre-DPO: Improving Data Utilization in Direct Preference Optimization Using a Guiding Reference Model
by: Pan, Junshu, et al.
Published: (2025)

Controllable Preference Optimization: Toward Controllable Multi-Objective Alignment
by: Guo, Yiju, et al.
Published: (2024)

EchoFoley: Event-Centric Hierarchical Control for Video Grounded Creative Sound Generation
by: Li, Bingxuan, et al.
Published: (2025)

Toward Real-World Chinese Psychological Support Dialogues: CPsDD Dataset and a Co-Evolving Multi-Agent System
by: Shi, Yuanchen, et al.
Published: (2025)

CiPO: Counterfactual Unlearning for Large Reasoning Models through Iterative Preference Optimization
by: Li, Junyi, et al.
Published: (2026)

A$^2$Search: Ambiguity-Aware Question Answering with Reinforcement Learning
by: Zhang, Fengji, et al.
Published: (2025)

Fine-Grained Preference Optimization Improves Spatial Reasoning in VLMs
by: Shen, Yifan, et al.
Published: (2025)

Unlocking the Capabilities of Thought: A Reasoning Boundary Framework to Quantify and Optimize Chain-of-Thought
by: Chen, Qiguang, et al.
Published: (2024)

CYCLE-INSTRUCT: Fully Seed-Free Instruction Tuning via Dual Self-Training and Cycle Consistency
by: Shen, Zhanming, et al.
Published: (2025)

An LLM Feature-based Framework for Dialogue Constructiveness Assessment
by: Zhou, Lexin, et al.
Published: (2024)

TSO: Self-Training with Scaled Preference Optimization
by: Chen, Kaihui, et al.
Published: (2024)

Impact of Stickers on Multimodal Sentiment and Intent in Social Media: A New Task, Dataset and Baseline
by: Shi, Yuanchen, et al.
Published: (2024)

A-IPO: Adaptive Intent-driven Preference Optimization
by: Wang, Wenqing, et al.
Published: (2025)

Test-Time Preference Optimization: On-the-Fly Alignment via Iterative Textual Feedback
by: Li, Yafu, et al.
Published: (2025)

Plug-and-Play Training Framework for Preference Optimization
by: Ma, Jingyuan, et al.
Published: (2024)

An Efficient Task-Oriented Dialogue Policy: Evolutionary Reinforcement Learning Injected by Elite Individuals
by: Zhao, Yangyang, et al.
Published: (2025)

Statistical Rejection Sampling Improves Preference Optimization
by: Liu, Tianqi, et al.
Published: (2023)

Edit3K: Universal Representation Learning for Video Editing Components
by: Gu, Xin, et al.
Published: (2024)

Growth First, Care Second? Tracing the Landscape of LLM Value Preferences in Everyday Dilemmas
by: Chen, Zhiyi, et al.
Published: (2026)

Training-Free Group Relative Policy Optimization
by: Cai, Yuzheng, et al.
Published: (2025)

Direct Judgement Preference Optimization
by: Wang, Peifeng, et al.
Published: (2024)

Improving Factual Consistency of News Summarization by Contrastive Preference Optimization
by: Feng, Huawen, et al.
Published: (2023)

Iterative Length-Regularized Direct Preference Optimization: A Case Study on Improving 7B Language Models to GPT-4 Level
by: Liu, Jie, et al.
Published: (2024)

Thinking With Bounding Boxes: Enhancing Spatio-Temporal Video Grounding via Reinforcement Fine-Tuning
by: Gu, Xin, et al.
Published: (2025)

MaLei at MultiClinSUM: Summarisation of Clinical Documents using Perspective-Aware Iterative Self-Prompting with LLMs
by: Ren, Libo, et al.
Published: (2025)

InfoDensity: Rewarding Information-Dense Traces for Efficient Reasoning
by: Wei, Chengwei, et al.
Published: (2026)

Beyond Turn Limits: Training Deep Search Agents with Dynamic Context Window
by: Tang, Qiaoyu, et al.
Published: (2025)