Library Catalog

Bibliographic Details
Main Authors:	Wang, Jinghao; He, Qiyuan; Gu, Chunbin; Heng, Pheng-Ann
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2605.13179

Similar Items

REAR: Rethinking Visual Autoregressive Models via Generator-Tokenizer Consistency Regularization
by: He, Qiyuan, et al.
Published: (2025)

AID: Attention Interpolation of Text-to-Image Diffusion
by: He, Qiyuan, et al.
Published: (2024)

Memory-Efficient Prompt Tuning for Incremental Histopathology Classification
by: Zhu, Yu, et al.
Published: (2024)

Comprehensive Generative Replay for Task-Incremental Segmentation with Concurrent Appearance and Semantic Forgetting
by: Li, Wei, et al.
Published: (2024)

VPG: Visual Prefix Guidance for Autoregressive Image and Video Generation
by: Liao, Xinyao, et al.
Published: (2026)

Unifying Physically-Informed Weather Priors in A Single Model for Image Restoration Across Multiple Adverse Weather Conditions
by: Xu, Jiaqi, et al.
Published: (2026)

Tri-modal Confluence with Temporal Dynamics for Scene Graph Generation in Operating Rooms
by: Guo, Diandian, et al.
Published: (2024)

Towards Synchronous Memorizability and Generalizability with Site-Modulated Diffusion Replay for Cross-Site Continual Segmentation
by: Xu, Dunyuan, et al.
Published: (2024)

Deep Omni-supervised Learning for Rib Fracture Detection from Chest Radiology Images
by: Chai, Zhizhong, et al.
Published: (2023)

Unveiling Deep Shadows: A Survey and Benchmark on Image and Video Shadow Detection, Removal, and Generation in the Deep Learning Era
by: Hu, Xiaowei, et al.
Published: (2024)

AR-RAG: Autoregressive Retrieval Augmentation for Image Generation
by: Qi, Jingyuan, et al.
Published: (2025)

UniHOPE: A Unified Approach for Hand-Only and Hand-Object Pose Estimation
by: Wang, Yinqiao, et al.
Published: (2025)

SiMA-Hand: Boosting 3D Hand-Mesh Reconstruction by Single-to-Multi-View Adaptation
by: Wang, Yinqiao, et al.
Published: (2024)

Unlocking Positive Transfer in Incrementally Learning Surgical Instruments: A Self-reflection Hierarchical Prompt Framework
by: Zhu, Yu, et al.
Published: (2026)

Tiny-Engram: Trigger-Indexed Concept Tables for Generative Vision
by: Cai, Runyuan, et al.
Published: (2026)

Cross Prompting Consistency with Segment Anything Model for Semi-supervised Medical Image Segmentation
by: Miao, Juzheng, et al.
Published: (2024)

G2Face: High-Fidelity Reversible Face Anonymization via Generative and Geometric Priors
by: Yang, Haoxin, et al.
Published: (2024)

Cross-modality Guidance-aided Multi-modal Learning with Dual Attention for MRI Brain Tumor Grading
by: Xu, Dunyuan, et al.
Published: (2024)

S^2Former-OR: Single-Stage Bi-Modal Transformer for Scene Graph Generation in OR
by: Pei, Jialun, et al.
Published: (2024)

Conceptrol: Concept Control of Zero-shot Personalized Image Generation
by: He, Qiyuan, et al.
Published: (2025)

SurgLQA: Scalable Long-Horizon Surgical Video Question Answering
by: Guo, Diandian, et al.
Published: (2026)

Surgical Workflow Recognition and Blocking Effectiveness Detection in Laparoscopic Liver Resections with Pringle Maneuver
by: Guo, Diandian, et al.
Published: (2024)

From Learning to Unlearning: Biomedical Security Protection in Multimodal Large Language Models
by: Xu, Dunyuan, et al.
Published: (2025)

MS2Mesh-XR: Multi-modal Sketch-to-Mesh Generation in XR Environments
by: Tong, Yuqi, et al.
Published: (2024)

Multi-scale Spatio-temporal Transformer-based Imbalanced Longitudinal Learning for Glaucoma Forecasting from Irregular Time Series Images
by: Yang, Xikai, et al.
Published: (2024)

Autoregressive Image Generation without Vector Quantization
by: Li, Tianhong, et al.
Published: (2024)

ImageFolder: Autoregressive Image Generation with Folded Tokens
by: Li, Xiang, et al.
Published: (2024)

Benchmarking Endoscopic Surgical Image Restoration and Beyond
by: Pei, Jialun, et al.
Published: (2025)

Towards Real-World Adverse Weather Image Restoration: Enhancing Clearness and Semantics with Vision-Language Models
by: Xu, Jiaqi, et al.
Published: (2024)

Topology-Constrained Learning for Efficient Laparoscopic Liver Landmark Detection
by: Cui, Ruize, et al.
Published: (2025)

ATLAS: Agentic or Latent Visual Reasoning? One Word is Enough for Both
by: Guo, Ziyu, et al.
Published: (2026)

Creating Virtual Environments with 3D Gaussian Splatting: A Comparative Study
by: Qiu, Shi, et al.
Published: (2025)

Advancing Extended Reality with 3D Gaussian Splatting: Innovations and Prospects
by: Qiu, Shi, et al.
Published: (2024)

HybridVLA: Collaborative Diffusion and Autoregression in a Unified Vision-Language-Action Model
by: Liu, Jiaming, et al.
Published: (2025)

Medical Large Vision Language Models with Multi-Image Visual Ability
by: Yang, Xikai, et al.
Published: (2025)

SceneDecorator: Towards Scene-Oriented Story Generation with Scene Planning and Scene Consistency
by: Song, Quanjian, et al.
Published: (2025)

Adapting 2D Multi-Modal Large Language Model for 3D CT Image Analysis
by: Yu, Yang, et al.
Published: (2026)

Revisiting Shadow Detection: A New Benchmark Dataset for Complex World
by: Hu, Xiaowei, et al.
Published: (2019)

A Narrative Review of Image Processing Techniques Related to Prostate Ultrasound
by: Wang, Haiqiao, et al.
Published: (2024)

Video Instance Shadow Detection Under the Sun and Sky
by: Xing, Zhenghao, et al.
Published: (2022)