| Main Authors: | Li, Mengtian, Lu, Yuwei, Li, Feifei, Gan, Chenqi, Xie, Zhifeng, Wang, Xi |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2604.02467 |
Similar Items
HieraFashDiff: Hierarchical Fashion Design with Multi-stage Diffusion Models
by: Xie, Zhifeng, et al.
Published: (2024)
EvalVerse: Pipeline-Aware and Expert-Calibrated Benchmarking for Professional Cinematic Video Generation
by: Yang, Songlin, et al.
Published: (2026)
ViPO: Visual Preference Optimization at Scale
by: Li, Ming, et al.
Published: (2026)
Visual Preference Optimization with Rubric Rewards
by: Yu, Ya-Qi, et al.
Published: (2026)
PixelArena: A benchmark for Pixel-Precision Visual Intelligence
by: Liang, Feng, et al.
Published: (2025)
StageDesigner: Artistic Stage Generation for Scenography via Theater Scripts
by: Gan, Zhaoxing, et al.
Published: (2025)
Multi-Sourced Compositional Generalization in Visual Question Answering
by: Li, Chuanhao, et al.
Published: (2025)
SmartDirector: Keyframe-Conditioned Cinematic Video Generation with Narrative Pacing Control
by: Zhang, Zhida, et al.
Published: (2026)
Why Not Hyperparameter-Friendly Optimisation? A Monotonic Adaptive Norm Rescaling Approach For Long-Tailed Recognition
by: Zhang, Shuo, et al.
Published: (2026)
Mitigating Visual Hallucinations via Semantic Curriculum Preference Optimization in MLLMs
by: Li, Yuanshuai, et al.
Published: (2025)
Interpretable Interaction Modeling for Trajectory Prediction via Agent Selection and Physical Coefficient
by: Huang, Shiji, et al.
Published: (2024)
On the Adversarial Robustness of Camera-based 3D Object Detection
by: Xie, Shaoyuan, et al.
Published: (2023)
Diffusion-Based Restoration for Multi-Modal 3D Object Detection in Adverse Weather
by: He, Zhijian, et al.
Published: (2025)
Can Vision-Language Models Replace Human Annotators: A Case Study with CelebA Dataset
by: Lu, Haoming, et al.
Published: (2024)
Dual-IPO: Dual-Iterative Preference Optimization for Text-to-Video Generation
by: Yang, Xiaomeng, et al.
Published: (2025)
Hydra: Accurate Multi-Modal Leaf Wetness Sensing with mm-Wave and Camera Fusion
by: Liu, Yimeng, et al.
Published: (2025)
Spotlighting Partially Visible Cinematic Language for Video-to-Audio Generation via Self-distillation
by: Huang, Feizhen, et al.
Published: (2025)
ZEBRA: Towards Zero-Shot Cross-Subject Generalization for Universal Brain Visual Decoding
by: Wang, Haonan, et al.
Published: (2025)
V-DPO: Mitigating Hallucination in Large Vision Language Models via Vision-Guided Direct Preference Optimization
by: Xie, Yuxi, et al.
Published: (2024)
OmniCam: Unified Multimodal Video Generation via Camera Control
by: Yang, Xiaoda, et al.
Published: (2025)
FunCineForge: A Unified Dataset Toolkit and Model for Zero-Shot Movie Dubbing in Diverse Cinematic Scenes
by: Liu, Jiaxuan, et al.
Published: (2026)
TrajectoryCrafter: Redirecting Camera Trajectory for Monocular Videos via Diffusion Models
by: YU, Mark, et al.
Published: (2025)
Red Teaming Visual Language Models
by: Li, Mukai, et al.
Published: (2024)
Reg-DPO: SFT-Regularized Direct Preference Optimization with GT-Pair for Improving Video Generation
by: Du, Jie, et al.
Published: (2025)
GaussianBody: Clothed Human Reconstruction via 3d Gaussian Splatting
by: Li, Mengtian, et al.
Published: (2024)
LPO: Towards Accurate GUI Agent Interaction via Location Preference Optimization
by: Tang, Jiaqi, et al.
Published: (2025)
CPO: Condition Preference Optimization for Controllable Image Generation
by: Lyu, Zonglin, et al.
Published: (2025)
Learning from Noisy Preferences: A Semi-Supervised Learning Approach to Direct Preference Optimization
by: Liu, Xinxin, et al.
Published: (2026)
MemCam: Memory-Augmented Camera Control for Consistent Video Generation
by: Gao, Xinhang, et al.
Published: (2026)
MCAD: Multi-teacher Cross-modal Alignment Distillation for efficient image-text retrieval
by: Lei, Youbo, et al.
Published: (2023)
Learning Active Perception via Self-Evolving Preference Optimization for GUI Grounding
by: Wang, Wanfu, et al.
Published: (2025)
Adversarial Prompt Injection Attack on Multimodal Large Language Models
by: Ding, Meiwen, et al.
Published: (2026)
Walking the Schrödinger Bridge: A Direct Trajectory for Text-to-3D Generation
by: Li, Ziying, et al.
Published: (2025)
Visual Space Optimization for Zero-shot Learning
by: Wang, Xinsheng, et al.
Published: (2019)
SplaTraj: Camera Trajectory Generation with Semantic Gaussian Splatting
by: Liu, Xinyi, et al.
Published: (2024)
Empathetic Response in Audio-Visual Conversations Using Emotion Preference Optimization and MambaCompressor
by: Kim, Yeonju, et al.
Published: (2024)
CheXPO: Preference Optimization for Chest X-ray VLMs with Counterfactual Rationale
by: Liang, Xiao, et al.
Published: (2025)
Instance-Level Trojan Attacks on Visual Question Answering via Adversarial Learning in Neuron Activation Space
by: Sun, Yuwei, et al.
Published: (2023)
MergeVQ: A Unified Framework for Visual Generation and Representation with Disentangled Token Merging and Quantization
by: Li, Siyuan, et al.
Published: (2025)
Multimodal Cinematic Video Synthesis Using Text-to-Image and Audio Generation Models
by: S, Sridhar, et al.
Published: (2025)