| Main Authors: | Liu, Qihao; Mao, Chengzhi; Liu, Yaojie; Yuille, Alan; Chu, Wen-Sheng |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2512.16921 |
Similar Items
XModBench: Benchmarking Cross-Modal Capabilities and Consistency in Omni-Language Models
by: Wang, Xingrui, et al.
Published: (2025)
Mull-Tokens: Modality-Agnostic Latent Thinking
by: Ray, Arijit, et al.
Published: (2025)
A Bayesian Approach to OOD Robustness in Image Classification
by: Kaushik, Prakhar, et al.
Published: (2024)
Rule-VLN: Bridging Perception and Compliance via Semantic Reasoning and Geometric Rectification
by: Wen, Jiawen, et al.
Published: (2026)
Source-Free and Image-Only Unsupervised Domain Adaptation for Category Level Object Pose Estimation
by: Kaushik, Prakhar, et al.
Published: (2024)
DREAM: Diffusion Rectification and Estimation-Adaptive Models
by: Zhou, Jinxin, et al.
Published: (2023)
Fake it till You Make it: Reward Modeling as Discriminative Prediction
by: Liu, Runtao, et al.
Published: (2025)
GENFIG1: Visual Summaries of Scholarly Work as a Challenge for Vision-Language Models
by: Guan, Yaohan, et al.
Published: (2026)
Leveraging AI Predicted and Expert Revised Annotations in Interactive Segmentation: Continual Tuning or Full Training?
by: Zhang, Tiezheng, et al.
Published: (2024)
R-C2: Cycle-Consistent Reinforcement Learning Improves Multimodal Reasoning
by: Zhang, Zirui, et al.
Published: (2026)
Spatial457: A Diagnostic Benchmark for 6D Spatial Reasoning of Large Multimodal Models
by: Wang, Xingrui, et al.
Published: (2025)
RectifiedHR: Enable Efficient High-Resolution Synthesis via Energy Rectification
by: Yang, Zhen, et al.
Published: (2025)
Compositional 4D Dynamic Scenes Understanding with Physics Priors for Video Question Answering
by: Wang, Xingrui, et al.
Published: (2024)
Acquiring Weak Annotations for Tumor Localization in Temporal and Volumetric Data
by: Chou, Yu-Cheng, et al.
Published: (2023)
More Thinking, Less Seeing? Assessing Amplified Hallucination in Multimodal Reasoning Models
by: Liu, Chengzhi, et al.
Published: (2025)
Learning to Focus and Precise Cropping: A Reinforcement Learning Framework with Information Gaps and Grounding Loss for MLLMs
by: Zhao, Xuanpu, et al.
Published: (2026)
Video Diffusion Models Excel at Tracking Similar-Looking Objects Without Supervision
by: Zhang, Chenshuang, et al.
Published: (2025)
ReVision: Refining Video Diffusion with Explicit 3D Motion Modeling
by: Liu, Qihao, et al.
Published: (2025)
Auditing Gender Presentation Differences in Text-to-Image Models
by: Zhang, Yanzhe, et al.
Published: (2023)
Accurate and Fast Compressed Video Captioning
by: Shen, Yaojie, et al.
Published: (2023)
Mitigating Object Hallucinations in LVLMs via Attention Imbalance Rectification
by: Sun, Han, et al.
Published: (2026)
Flowing from Words to Pixels: A Noise-Free Framework for Cross-Modality Evolution
by: Liu, Qihao, et al.
Published: (2024)
DIRECT-3D: Learning Direct Text-to-3D Generation on Massive Noisy 3D Data
by: Liu, Qihao, et al.
Published: (2024)
Free Lunch for Unified Multimodal Models: Enhancing Generation via Reflective Rectification with Inherent Understanding
by: Jiang, Yibo, et al.
Published: (2026)
The Universal Weight Subspace Hypothesis
by: Kaushik, Prakhar, et al.
Published: (2025)
Shared LoRA Subspaces for almost Strict Continual Learning
by: Kaushik, Prakhar, et al.
Published: (2026)
The Geometry of Compromise: Unlocking Generative Capabilities via Controllable Modality Alignment
by: Liu, Hongyuan, et al.
Published: (2026)
Leveraging Labelled Data Knowledge: A Cooperative Rectification Learning Network for Semi-supervised 3D Medical Image Segmentation
by: Wang, Yanyan, et al.
Published: (2025)
Attribution-Guided Model Rectification of Unreliable Neural Network Behaviors
by: Yang, Peiyu, et al.
Published: (2026)
Bridging the Gap Between Sparsity and Redundancy: A Dual-Decoding Framework with Global Context for Map Inference
by: Shen, Yudong, et al.
Published: (2025)
DINO-R1: Incentivizing Reasoning Capability in Vision Foundation Models
by: Pan, Chenbin, et al.
Published: (2025)
Innovator-VL: A Multimodal Large Language Model for Scientific Discovery
by: Wen, Zichen, et al.
Published: (2026)
SNR-Edit: Structure-Aware Noise Rectification for Inversion-Free Flow-Based Editing
by: Jiang, Lifan, et al.
Published: (2026)
ImageNet-D: Benchmarking Neural Network Robustness on Diffusion Synthetic Object
by: Zhang, Chenshuang, et al.
Published: (2024)
STaR-KV: Spatio-Temporal Adaptive Re-weighting for KV Cache Compression in GUI Vision-Language Models
by: Han, Yuhang, et al.
Published: (2026)
DiffSeg: A Segmentation Model for Skin Lesions Based on Diffusion Difference
by: Shuai, Zhihao, et al.
Published: (2024)
Masking Matters: Unlocking the Spatial Reasoning Capabilities of LLMs for 3D Scene-Language Understanding
by: Jeon, Yerim, et al.
Published: (2025)
Enhancing Agentic Autonomous Scientific Discovery with Vision-Language Model Capabilities
by: Gandhi, Kahaan, et al.
Published: (2025)
Bridge the Gap: From Weak to Full Supervision for Temporal Action Localization with PseudoFormer
by: Liu, Ziyi, et al.
Published: (2025)
Structure-Aware Feature Rectification with Region Adjacency Graphs for Training-Free Open-Vocabulary Semantic Segmentation
by: Huang, Qiming, et al.
Published: (2025)