Library Catalog

Bibliographic Details
Main Authors:	Hong, Yunqi; Kao, Kuei-Chun; Zhou, Hengguang; Hsieh, Cho-Jui
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2601.03468

Similar Items

AutoRubric-T2I: Robust Rule-Based Reward Model for Text-to-Image Alignment
by: Kao, Kuei-Chun, et al.
Published: (2026)

QG-CoC: Question-Guided Chain-of-Captions for Large Multimodal Models
by: Kao, Kuei-Chun, et al.
Published: (2025)

IRIS: Intrinsic Reward Image Synthesis
by: Chen, Yihang, et al.
Published: (2025)

R1-Zero's "Aha Moment" in Visual Reasoning on a 2B Non-SFT Model
by: Zhou, Hengguang, et al.
Published: (2025)

Enhancing CLIP Conceptual Embedding through Knowledge Distillation
by: Kao, Kuei-Chun
Published: (2024)

MOSSBench: Is Your Multimodal Language Model Oversensitive to Safe Queries?
by: Li, Xirui, et al.
Published: (2024)

Unlabeled Data Improves Fine-Grained Image Zero-shot Classification with Multimodal LLMs
by: Hong, Yunqi, et al.
Published: (2025)

Understanding the Impact of Negative Prompts: When and How Do They Take Effect?
by: Ban, Yuanhao, et al.
Published: (2024)

Adaptive Diagnostic Reasoning Framework for Pathology with Multimodal Large Language Models
by: Hong, Yunqi, et al.
Published: (2025)

GARDO: Reinforcing Diffusion Models without Reward Hacking
by: He, Haoran, et al.
Published: (2025)

MuLan: Multimodal-LLM Agent for Progressive and Interactive Multi-Object Diffusion
by: Li, Sen, et al.
Published: (2024)

SoliReward: Mitigating Susceptibility to Reward Hacking and Annotation Noise in Video Generation Reward Models
by: Lian, Jiesong, et al.
Published: (2025)

Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning
by: Wang, Yibin, et al.
Published: (2025)

Do Synthetic Trajectories Reflect Real Reward Hacking? A Systematic Study on Monitoring In-the-Wild Hacking in Code Generation
by: Li, Lichen, et al.
Published: (2026)

Text is All You Need for Vision-Language Model Jailbreaking
by: Chen, Yihang, et al.
Published: (2026)

Improving Text-to-Image Generation with Intrinsic Self-Confidence Rewards
by: Kim, Seungwook, et al.
Published: (2026)

Parrot: Pareto-optimal Multi-Reward Reinforcement Learning Framework for Text-to-Image Generation
by: Lee, Seung Hyun, et al.
Published: (2024)

The Image as Its Own Reward: Reinforcement Learning with Adversarial Reward for Image Generation
by: Mao, Weijia, et al.
Published: (2025)

TextCoT: Zoom In for Enhanced Multimodal Text-Rich Image Understanding
by: Luan, Bozhi, et al.
Published: (2024)

One-Forcing: Towards Stable One-Step Autoregressive Video Generation
by: Feng, Jiaqi, et al.
Published: (2026)

Concepts or Skills? Rethinking Instruction Selection for Multi-modal Models
by: Bai, Andrew, et al.
Published: (2025)

Embedding Space Selection for Detecting Memorization and Fingerprinting in Generative Models
by: He, Jack, et al.
Published: (2024)

MotionRL: Align Text-to-Motion Generation to Human Preferences with Multi-Reward Reinforcement Learning
by: Liu, Xiaoyang, et al.
Published: (2024)

JIGMARK: A Black-Box Approach for Enhancing Image Watermarks against Diffusion Model Edits
by: Pan, Minzhou, et al.
Published: (2024)

Look Where It Matters: Training-Free Ultra-HR Remote Sensing VQA via Adaptive Zoom Search
by: Zhou, Yunqi, et al.
Published: (2025)

The Crystal Ball Hypothesis in diffusion models: Anticipating object positions from initial noise
by: Ban, Yuanhao, et al.
Published: (2024)

A Dense Reward View on Aligning Text-to-Image Diffusion with Preference
by: Yang, Shentao, et al.
Published: (2024)

Reward Incremental Learning in Text-to-Image Generation
by: Wang, Maorong, et al.
Published: (2024)

Mitigating Bias in Dataset Distillation
by: Cui, Justin, et al.
Published: (2024)

Transfer Learning for Keypoint Detection in Low-Resolution Thermal TUG Test Images
by: Chen, Wei-Lun, et al.
Published: (2025)

Adversarial Examples Detection with Bayesian Neural Network
by: Li, Yao, et al.
Published: (2021)

Enhancing Spatial Understanding in Image Generation via Reward Modeling
by: Tang, Zhenyu, et al.
Published: (2026)

Image Deraining via Self-supervised Reinforcement Learning
by: Liao, He-Hao, et al.
Published: (2024)

Solving for X and Beyond: Can Large Language Models Solve Complex Math Problems with More-Than-Two Unknowns?
by: Kao, Kuei-Chun, et al.
Published: (2024)

From Text to Pixel: Advancing Long-Context Understanding in MLLMs
by: Lu, Yujie, et al.
Published: (2024)

PromptEcho: Annotation-Free Reward from Vision-Language Models for Text-to-Image Reinforcement Learning
by: Liu, Jinlong, et al.
Published: (2026)

ShadowHack: Hacking Shadows via Luminance-Color Divide and Conquer
by: Hu, Jin, et al.
Published: (2024)

Self-Rewarding Large Vision-Language Models for Optimizing Prompts in Text-to-Image Generation
by: Yang, Hongji, et al.
Published: (2025)

SpatialReward: Verifiable Spatial Reward Modeling for Fine-Grained Spatial Consistency in Text-to-Image Generation
by: Zhou, Sashuai, et al.
Published: (2026)

Dual-scale Enhanced and Cross-generative Consistency Learning for Semi-supervised Medical Image Segmentation
by: Gu, Yunqi, et al.
Published: (2023)