:: Library Catalog

Bibliographic Details
Main Authors:	Wang, Junjie, Lou, Xinghua, Li, Jason, Tian, Ye, Chen, Keyu, Li, Yulin, Kang, Bin, Mai, Jacky, Li, Yanwei, Tian, Zhuotao, Nie, Liqiang
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2605.19639

Similar Items

Generalized Decoupled Learning for Enhancing Open-Vocabulary Dense Perception
by: Wang, Junjie, et al.
Published: (2025)

CalibCLIP: Contextual Calibration of Dominant Semantics for Text-Driven Image Retrieval
by: Kang, Bin, et al.
Published: (2025)

DeCLIP: Decoupled Learning for Open-Vocabulary Dense Perception
by: Wang, Junjie, et al.
Published: (2025)

Less Is More, but Where? Dynamic Token Compression via LLM-Guided Keyframe Prior
by: Li, Yulin, et al.
Published: (2025)

FlashVID: Efficient Video Large Language Models via Training-free Tree-based Spatiotemporal Token Merging
by: Fan, Ziyang, et al.
Published: (2026)

AgentSteerTTS: A Multi-Agent Closed-Loop Framework for Composite-Instruction Text-to-Speech
by: Kang, Bin, et al.
Published: (2026)

LISA: Reasoning Segmentation via Large Language Model
by: Lai, Xin, et al.
Published: (2023)

SemanticVLA: Semantic-Aligned Sparsification and Enhancement for Efficient Robotic Manipulation
by: Li, Wei, et al.
Published: (2025)

Rectifying Latent Space for Generative Single-Image Reflection Removal
by: Li, Mingjia, et al.
Published: (2025)

A Visual-inertial Localization Algorithm using Opportunistic Visual Beacons and Dead-Reckoning for GNSS-Denied Large-scale Applications
by: Zhang, Liqiang, et al.
Published: (2024)

Efficient Reasoning with Balanced Thinking
by: Li, Yulin, et al.
Published: (2026)

Mitigating Object Hallucinations via Sentence-Level Early Intervention
by: Peng, Shangpin, et al.
Published: (2025)

Unified Language-driven Zero-shot Domain Adaptation
by: Yang, Senqiao, et al.
Published: (2024)

Enhancing Spatial Reasoning in Multimodal Large Language Models through Reasoning-based Segmentation
by: Ning, Zhenhua, et al.
Published: (2025)

MIRROR: Multimodal Iterative Reasoning via Reflection on Visual Regions
by: Zhang, Haoyu, et al.
Published: (2026)

Visual Reasoning Tracer: Object-Level Grounded Reasoning Benchmark
by: Yuan, Haobo, et al.
Published: (2025)

Video-ToC: Video Tree-of-Cue Reasoning
by: Tan, Qizhong, et al.
Published: (2026)

Tracking Reflected Objects: A Benchmark
by: Guo, Xiaoyu, et al.
Published: (2024)

CoRe^2: Collect, Reflect and Refine to Generate Better and Faster
by: Shao, Shitong, et al.
Published: (2025)

ShowTable: Unlocking Creative Table Visualization with Collaborative Reflection and Refinement
by: Liu, Zhihang, et al.
Published: (2025)

SPIRAL: Self-Evolving Action-Conditioned Video Generation via Reflective Planning Agents
by: Yang, Yu, et al.
Published: (2026)

Towards Reflected Object Detection: A Benchmark
by: Wu, Yiquan, et al.
Published: (2024)

SafeMVDrive: Multi-view Safety-Critical Driving Video Synthesis in the Real World Domain
by: Zhou, Jiawei, et al.
Published: (2025)

Edit360: 2D Image Edits to 3D Assets from Any Angle
by: Huang, Junchao, et al.
Published: (2025)

TableVista: Benchmarking Multimodal Table Reasoning under Visual and Structural Complexity
by: Yang, Zheyuan, et al.
Published: (2026)

GenEvolve: Self-Evolving Image Generation Agents via Tool-Orchestrated Visual Experience Distillation
by: Chen, Sixiang, et al.
Published: (2026)

Memory Forcing: Spatio-Temporal Memory for Consistent Scene Generation on Minecraft
by: Huang, Junchao, et al.
Published: (2025)

Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction
by: Tian, Keyu, et al.
Published: (2024)

SJD-VP: Speculative Jacobi Decoding with Verification Prediction for Autoregressive Image Generation
by: Shan, Bingqi, et al.
Published: (2026)

Reasoning in the Dark: Interleaved Vision-Text Reasoning in Latent Space
by: Chen, Chao, et al.
Published: (2025)

HumanEval-V: Benchmarking High-Level Visual Reasoning with Complex Diagrams in Coding Tasks
by: Zhang, Fengji, et al.
Published: (2024)

EVLM: Self-Reflective Multimodal Reasoning for Cross-Dimensional Visual Editing
by: Khalid, Umar, et al.
Published: (2024)

Breaking Dual Bottlenecks: Evolving Unified Multimodal Models into Self-Adaptive Interleaved Visual Reasoners
by: Liu, Qingyang, et al.
Published: (2026)

Explore the Potential of CLIP for Training-Free Open Vocabulary Semantic Segmentation
by: Shao, Tong, et al.
Published: (2024)

Beyond Shortcuts: Mitigating Visual Illusions in Frozen VLMs via Qualitative Reasoning
by: Guo, Hao, et al.
Published: (2026)

LISA++: An Improved Baseline for Reasoning Segmentation with Large Language Model
by: Yang, Senqiao, et al.
Published: (2023)

ReflectCAP: Detailed Image Captioning with Reflective Memory
by: Min, Kyungmin, et al.
Published: (2026)

A Skill-augmented Agentic Framework and Benchmark for Multi-Video Understanding
by: Zhang, Yue, et al.
Published: (2026)

Reflection Generation for Composite Image Using Diffusion Model
by: Zhao, Haonan, et al.
Published: (2026)

A Multi-Agent Framework with Structured Reasoning and Reflective Refinement for Multimodal Empathetic Response Generation
by: Wang, Liping, et al.
Published: (2026)