Library Catalog

Bibliographic Details
Main Authors:	Liu, Jiazhen; Deng, Yuchuan; Chen, Long
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2506.23061

Similar Items

Segmentation as A Plug-and-Play Capability for Frozen Multimodal LLMs
by: Liu, Jiazhen, et al.
Published: (2025)

Better, Stronger, Faster: Tackling the Trilemma in MLLM-based Segmentation with Simultaneous Textual Mask Prediction
by: Liu, Jiazhen, et al.
Published: (2025)

Fundus-R1: Training a Fundus-Reading MLLM with Knowledge-Aware Reasoning on Public Data
by: Deng, Yuchuan, et al.
Published: (2026)

How to Take a Memorable Picture? Empowering Users with Actionable Feedback
by: Laiti, Francesco, et al.
Published: (2026)

CLGRPO: Reasoning Ability Enhancement for Small VLMs
by: Wang, Fanyi, et al.
Published: (2025)

Listener-Rewarded Thinking in VLMs for Image Preferences
by: Gambashidze, Alexander, et al.
Published: (2025)

DDX-TRACE: A Benchmark for Medical Diagnostic Trajectories in VLMs
by: Pan, Jiazhen, et al.
Published: (2026)

Think Twice to See More: Iterative Visual Reasoning in Medical VLMs
by: Chen, Kaitao, et al.
Published: (2025)

Exploration of VLMs for Driver Monitoring Systems Applications
by: Cañas, Paola Natalia, et al.
Published: (2025)

Towards Memorization-Free Diffusion Models
by: Chen, Chen, et al.
Published: (2024)

Investigating Memorization in Video Diffusion Models
by: Chen, Chen, et al.
Published: (2024)

MedVLM-R1: Incentivizing Medical Reasoning Capability of Vision-Language Models (VLMs) via Reinforcement Learning
by: Pan, Jiazhen, et al.
Published: (2025)

VLMs have Tunnel Vision: Evaluating Nonlocal Visual Reasoning in Leading VLMs
by: Berman, Shmuel, et al.
Published: (2025)

Thinking with Gaze: Sequential Eye-Tracking as Visual Reasoning Supervision for Medical VLMs
by: Li, Yiwei, et al.
Published: (2026)

DAPL: Integration of Positive and Negative Descriptions in Text-Based Person Search
by: Deng, Yuchuan, et al.
Published: (2024)

PuzzleCraft: Exploration-Aware Curriculum Learning for Puzzle-Based RLVR in VLMs
by: Jeddi, Ahmadreza, et al.
Published: (2025)

Detecting, Explaining, and Mitigating Memorization in Diffusion Models
by: Wen, Yuxin, et al.
Published: (2024)

Integration of Object Detection and Small VLMs for Construction Safety Hazard Identification
by: Adil, Muhammad, et al.
Published: (2026)

From Perception to Reasoning: Deep Thinking Empowers Multimodal Large Language Models
by: Zhu, Wenxin, et al.
Published: (2025)

SemVink: Advancing VLMs' Semantic Understanding of Optical Illusions via Visual Global Thinking
by: Li, Sifan, et al.
Published: (2025)

Enhancing Privacy-Utility Trade-offs to Mitigate Memorization in Diffusion Models
by: Chen, Chen, et al.
Published: (2025)

Exploring Local Memorization in Diffusion Models via Bright Ending Attention
by: Chen, Chen, et al.
Published: (2024)

Efficient Large-Deformation Medical Image Registration via Recurrent Dynamic Correlation
by: Li, Tianran, et al.
Published: (2025)

Unconsciously Forget: Mitigating Memorization; Without Knowing What is being Memorized
by: Jin, Er, et al.
Published: (2025)

Modeling Visual Memorability Assessment with Autoencoders Reveals Characteristics of Memorable Images
by: Bagheri, Elham, et al.
Published: (2024)

Learn to Memorize and to Forget: A Continual Learning Perspective of Dynamic SLAM
by: Li, Baicheng, et al.
Published: (2024)

Memorizing SAM: 3D Medical Segment Anything Model with Memorizing Transformer
by: Shao, Xinyuan, et al.
Published: (2024)

Global-Local Tree Search in VLMs for 3D Indoor Scene Generation
by: Deng, Wei, et al.
Published: (2025)

World2VLM: Distilling World Model Imagination into VLMs for Dynamic Spatial Reasoning
by: Zhang, Wanyue, et al.
Published: (2026)

A Stitch in Time Saves Nine: Small VLM is a Precise Guidance for Accelerating Large VLMs
by: Zhao, Wangbo, et al.
Published: (2024)

Adaptive Chain-of-Focus Reasoning via Dynamic Visual Search and Zooming for Efficient VLMs
by: Zhang, Xintong, et al.
Published: (2025)

AdaThinkDrive: Adaptive Thinking via Reinforcement Learning for Autonomous Driving
by: Luo, Yuechen, et al.
Published: (2025)

Start Small, Think Big: Curriculum-based Relative Policy Optimization for Visual Grounding
by: Yan, Qingyang, et al.
Published: (2025)

How Diffusion Models Memorize
by: Kim, Juyeop, et al.
Published: (2025)

XSPA: Crafting Imperceptible X-Shaped Sparse Adversarial Perturbations for Transferable Attacks on VLMs
by: Hu, Chengyin, et al.
Published: (2026)

Empowering Dynamic Urban Navigation with Stereo and Mid-Level Vision
by: Zhou, Wentao, et al.
Published: (2025)

Long-Term Ad Memorability: Understanding & Generating Memorable Ads
by: SI, Harini, et al.
Published: (2023)

ThinkRL-Edit: Thinking in Reinforcement Learning for Reasoning-Centric Image Editing
by: Li, Hengjia, et al.
Published: (2026)

Are VLMs Ready for Lane Topology Awareness in Autonomous Driving?
by: Chen, Xin, et al.
Published: (2025)

Filtering Memorization from Parameter-Space in Diffusion Models
by: Zhe, Yu, et al.
Published: (2026)