| Main Authors: | Liu, Jiazhen, Deng, Yuchuan, Chen, Long |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2506.23061 |
Similar Items
Segmentation as A Plug-and-Play Capability for Frozen Multimodal LLMs
by: Liu, Jiazhen, et al.
Published: (2025)
Better, Stronger, Faster: Tackling the Trilemma in MLLM-based Segmentation with Simultaneous Textual Mask Prediction
by: Liu, Jiazhen, et al.
Published: (2025)
Fundus-R1: Training a Fundus-Reading MLLM with Knowledge-Aware Reasoning on Public Data
by: Deng, Yuchuan, et al.
Published: (2026)
How to Take a Memorable Picture? Empowering Users with Actionable Feedback
by: Laiti, Francesco, et al.
Published: (2026)
CLGRPO: Reasoning Ability Enhancement for Small VLMs
by: Wang, Fanyi, et al.
Published: (2025)
Listener-Rewarded Thinking in VLMs for Image Preferences
by: Gambashidze, Alexander, et al.
Published: (2025)
DDX-TRACE: A Benchmark for Medical Diagnostic Trajectories in VLMs
by: Pan, Jiazhen, et al.
Published: (2026)
Think Twice to See More: Iterative Visual Reasoning in Medical VLMs
by: Chen, Kaitao, et al.
Published: (2025)
Exploration of VLMs for Driver Monitoring Systems Applications
by: Cañas, Paola Natalia, et al.
Published: (2025)
Towards Memorization-Free Diffusion Models
by: Chen, Chen, et al.
Published: (2024)
Investigating Memorization in Video Diffusion Models
by: Chen, Chen, et al.
Published: (2024)
MedVLM-R1: Incentivizing Medical Reasoning Capability of Vision-Language Models (VLMs) via Reinforcement Learning
by: Pan, Jiazhen, et al.
Published: (2025)
VLMs have Tunnel Vision: Evaluating Nonlocal Visual Reasoning in Leading VLMs
by: Berman, Shmuel, et al.
Published: (2025)
Thinking with Gaze: Sequential Eye-Tracking as Visual Reasoning Supervision for Medical VLMs
by: Li, Yiwei, et al.
Published: (2026)
DAPL: Integration of Positive and Negative Descriptions in Text-Based Person Search
by: Deng, Yuchuan, et al.
Published: (2024)
PuzzleCraft: Exploration-Aware Curriculum Learning for Puzzle-Based RLVR in VLMs
by: Jeddi, Ahmadreza, et al.
Published: (2025)
Detecting, Explaining, and Mitigating Memorization in Diffusion Models
by: Wen, Yuxin, et al.
Published: (2024)
Integration of Object Detection and Small VLMs for Construction Safety Hazard Identification
by: Adil, Muhammad, et al.
Published: (2026)
From Perception to Reasoning: Deep Thinking Empowers Multimodal Large Language Models
by: Zhu, Wenxin, et al.
Published: (2025)
SemVink: Advancing VLMs' Semantic Understanding of Optical Illusions via Visual Global Thinking
by: Li, Sifan, et al.
Published: (2025)
Enhancing Privacy-Utility Trade-offs to Mitigate Memorization in Diffusion Models
by: Chen, Chen, et al.
Published: (2025)
Exploring Local Memorization in Diffusion Models via Bright Ending Attention
by: Chen, Chen, et al.
Published: (2024)
Efficient Large-Deformation Medical Image Registration via Recurrent Dynamic Correlation
by: Li, Tianran, et al.
Published: (2025)
Unconsciously Forget: Mitigating Memorization; Without Knowing What is being Memorized
by: Jin, Er, et al.
Published: (2025)
Modeling Visual Memorability Assessment with Autoencoders Reveals Characteristics of Memorable Images
by: Bagheri, Elham, et al.
Published: (2024)
Learn to Memorize and to Forget: A Continual Learning Perspective of Dynamic SLAM
by: Li, Baicheng, et al.
Published: (2024)
Memorizing SAM: 3D Medical Segment Anything Model with Memorizing Transformer
by: Shao, Xinyuan, et al.
Published: (2024)
Global-Local Tree Search in VLMs for 3D Indoor Scene Generation
by: Deng, Wei, et al.
Published: (2025)
World2VLM: Distilling World Model Imagination into VLMs for Dynamic Spatial Reasoning
by: Zhang, Wanyue, et al.
Published: (2026)
A Stitch in Time Saves Nine: Small VLM is a Precise Guidance for Accelerating Large VLMs
by: Zhao, Wangbo, et al.
Published: (2024)
Adaptive Chain-of-Focus Reasoning via Dynamic Visual Search and Zooming for Efficient VLMs
by: Zhang, Xintong, et al.
Published: (2025)
AdaThinkDrive: Adaptive Thinking via Reinforcement Learning for Autonomous Driving
by: Luo, Yuechen, et al.
Published: (2025)
Start Small, Think Big: Curriculum-based Relative Policy Optimization for Visual Grounding
by: Yan, Qingyang, et al.
Published: (2025)
How Diffusion Models Memorize
by: Kim, Juyeop, et al.
Published: (2025)
XSPA: Crafting Imperceptible X-Shaped Sparse Adversarial Perturbations for Transferable Attacks on VLMs
by: Hu, Chengyin, et al.
Published: (2026)
Empowering Dynamic Urban Navigation with Stereo and Mid-Level Vision
by: Zhou, Wentao, et al.
Published: (2025)
Long-Term Ad Memorability: Understanding & Generating Memorable Ads
by: SI, Harini, et al.
Published: (2023)
ThinkRL-Edit: Thinking in Reinforcement Learning for Reasoning-Centric Image Editing
by: Li, Hengjia, et al.
Published: (2026)
Are VLMs Ready for Lane Topology Awareness in Autonomous Driving?
by: Chen, Xin, et al.
Published: (2025)
Filtering Memorization from Parameter-Space in Diffusion Models
by: Zhe, Yu, et al.
Published: (2026)