| Main Authors: | Hong, Yunqi, Kao, Kuei-Chun, Zhou, Hengguang, Hsieh, Cho-Jui |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2601.03468 |
Similar Items
AutoRubric-T2I: Robust Rule-Based Reward Model for Text-to-Image Alignment
by: Kao, Kuei-Chun, et al.
Published: (2026)
QG-CoC: Question-Guided Chain-of-Captions for Large Multimodal Models
by: Kao, Kuei-Chun, et al.
Published: (2025)
IRIS: Intrinsic Reward Image Synthesis
by: Chen, Yihang, et al.
Published: (2025)
R1-Zero's "Aha Moment" in Visual Reasoning on a 2B Non-SFT Model
by: Zhou, Hengguang, et al.
Published: (2025)
Enhancing CLIP Conceptual Embedding through Knowledge Distillation
by: Kao, Kuei-Chun
Published: (2024)
MOSSBench: Is Your Multimodal Language Model Oversensitive to Safe Queries?
by: Li, Xirui, et al.
Published: (2024)
Unlabeled Data Improves Fine-Grained Image Zero-shot Classification with Multimodal LLMs
by: Hong, Yunqi, et al.
Published: (2025)
Understanding the Impact of Negative Prompts: When and How Do They Take Effect?
by: Ban, Yuanhao, et al.
Published: (2024)
Adaptive Diagnostic Reasoning Framework for Pathology with Multimodal Large Language Models
by: Hong, Yunqi, et al.
Published: (2025)
GARDO: Reinforcing Diffusion Models without Reward Hacking
by: He, Haoran, et al.
Published: (2025)
MuLan: Multimodal-LLM Agent for Progressive and Interactive Multi-Object Diffusion
by: Li, Sen, et al.
Published: (2024)
SoliReward: Mitigating Susceptibility to Reward Hacking and Annotation Noise in Video Generation Reward Models
by: Lian, Jiesong, et al.
Published: (2025)
Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning
by: Wang, Yibin, et al.
Published: (2025)
Do Synthetic Trajectories Reflect Real Reward Hacking? A Systematic Study on Monitoring In-the-Wild Hacking in Code Generation
by: Li, Lichen, et al.
Published: (2026)
Text is All You Need for Vision-Language Model Jailbreaking
by: Chen, Yihang, et al.
Published: (2026)
Improving Text-to-Image Generation with Intrinsic Self-Confidence Rewards
by: Kim, Seungwook, et al.
Published: (2026)
Parrot: Pareto-optimal Multi-Reward Reinforcement Learning Framework for Text-to-Image Generation
by: Lee, Seung Hyun, et al.
Published: (2024)
The Image as Its Own Reward: Reinforcement Learning with Adversarial Reward for Image Generation
by: Mao, Weijia, et al.
Published: (2025)
TextCoT: Zoom In for Enhanced Multimodal Text-Rich Image Understanding
by: Luan, Bozhi, et al.
Published: (2024)
One-Forcing: Towards Stable One-Step Autoregressive Video Generation
by: Feng, Jiaqi, et al.
Published: (2026)
Concepts or Skills? Rethinking Instruction Selection for Multi-modal Models
by: Bai, Andrew, et al.
Published: (2025)
Embedding Space Selection for Detecting Memorization and Fingerprinting in Generative Models
by: He, Jack, et al.
Published: (2024)
MotionRL: Align Text-to-Motion Generation to Human Preferences with Multi-Reward Reinforcement Learning
by: Liu, Xiaoyang, et al.
Published: (2024)
JIGMARK: A Black-Box Approach for Enhancing Image Watermarks against Diffusion Model Edits
by: Pan, Minzhou, et al.
Published: (2024)
Look Where It Matters: Training-Free Ultra-HR Remote Sensing VQA via Adaptive Zoom Search
by: Zhou, Yunqi, et al.
Published: (2025)
The Crystal Ball Hypothesis in diffusion models: Anticipating object positions from initial noise
by: Ban, Yuanhao, et al.
Published: (2024)
A Dense Reward View on Aligning Text-to-Image Diffusion with Preference
by: Yang, Shentao, et al.
Published: (2024)
Reward Incremental Learning in Text-to-Image Generation
by: Wang, Maorong, et al.
Published: (2024)
Mitigating Bias in Dataset Distillation
by: Cui, Justin, et al.
Published: (2024)
Transfer Learning for Keypoint Detection in Low-Resolution Thermal TUG Test Images
by: Chen, Wei-Lun, et al.
Published: (2025)
Adversarial Examples Detection with Bayesian Neural Network
by: Li, Yao, et al.
Published: (2021)
Enhancing Spatial Understanding in Image Generation via Reward Modeling
by: Tang, Zhenyu, et al.
Published: (2026)
Image Deraining via Self-supervised Reinforcement Learning
by: Liao, He-Hao, et al.
Published: (2024)
Solving for X and Beyond: Can Large Language Models Solve Complex Math Problems with More-Than-Two Unknowns?
by: Kao, Kuei-Chun, et al.
Published: (2024)
From Text to Pixel: Advancing Long-Context Understanding in MLLMs
by: Lu, Yujie, et al.
Published: (2024)
PromptEcho: Annotation-Free Reward from Vision-Language Models for Text-to-Image Reinforcement Learning
by: Liu, Jinlong, et al.
Published: (2026)
ShadowHack: Hacking Shadows via Luminance-Color Divide and Conquer
by: Hu, Jin, et al.
Published: (2024)
Self-Rewarding Large Vision-Language Models for Optimizing Prompts in Text-to-Image Generation
by: Yang, Hongji, et al.
Published: (2025)
SpatialReward: Verifiable Spatial Reward Modeling for Fine-Grained Spatial Consistency in Text-to-Image Generation
by: Zhou, Sashuai, et al.
Published: (2026)
Dual-scale Enhanced and Cross-generative Consistency Learning for Semi-supervised Medical Image Segmentation
by: Gu, Yunqi, et al.
Published: (2023)