:: Library Catalog

Bibliographic Details
Main Authors:	Luo, Xiangyang; Cheng, Junhao; Xie, Yifan; Zhang, Xin; Feng, Tao; Liu, Zhou; Ma, Fei; Yu, Fei
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2503.23353

Similar Items

MuseFace: Text-driven Face Editing via Diffusion-based Mask Generation Approach
by: Zhang, Xin, et al.
Published: (2025)

CCIS-Diff: A Generative Model with Stable Diffusion Prior for Controlled Colonoscopy Image Synthesis
by: Xie, Yifan, et al.
Published: (2024)

Audio-Driven Talking Face Video Generation with Joint Uncertainty Learning
by: Xie, Yifan, et al.
Published: (2025)

DreamStory: Open-Domain Story Visualization by LLM-Guided Multi-Subject Consistent Diffusion
by: He, Huiguo, et al.
Published: (2024)

OnlineHOI: Towards Online Human-Object Interaction Generation and Perception
by: Ji, Yihong, et al.
Published: (2025)

Causal-Story: Local Causal Attention Utilizing Parameter-Efficient Tuning For Visual Story Synthesis
by: Song, Tianyi, et al.
Published: (2023)

Universal Visuo-Tactile Video Understanding for Embodied Interaction
by: Xie, Yifan, et al.
Published: (2025)

SalM$^{2}$: An Extremely Lightweight Saliency Mamba Model for Real-Time Cognitive Awareness of Driver Attention
by: Zhao, Chunyu, et al.
Published: (2025)

ContextualStory: Consistent Visual Storytelling with Spatially-Enhanced and Storyline Context
by: Zheng, Sixiao, et al.
Published: (2024)

Learning Physical Dynamics for Object-centric Visual Prediction
by: Xu, Huilin, et al.
Published: (2024)

Dynamic Attention Mechanism in Spatiotemporal Memory Networks for Object Tracking
by: Zhou, Meng, et al.
Published: (2025)

Vision Inference Former: Sustaining Visual Consistency in Multimodal Large Language Models
by: Dong, Xinpeng, et al.
Published: (2026)

EEG-Driven 3D Object Reconstruction with Style Consistency and Diffusion Prior
by: Xiang, Xin, et al.
Published: (2024)

STORYANCHORS: Generating Consistent Multi-Scene Story Frames for Long-Form Narratives
by: Wang, Bo, et al.
Published: (2025)

Causally-Grounded Dual-Path Attention Intervention for Object Hallucination Mitigation in LVLMs
by: Yu, Liu, et al.
Published: (2025)

MMCL-Bench: Multimodal Context Learning from Visual Rules, Procedures, and Evidence
by: Chen, Yifan, et al.
Published: (2026)

TextMatch: Enhancing Image-Text Consistency Through Multimodal Optimization
by: Luo, Yucong, et al.
Published: (2024)

Collaborative Attention and Consistent-Guided Fusion of MRI and PET for Alzheimer's Disease Diagnosis
by: Ma, Delin, et al.
Published: (2025)

Integrating Object Interaction Self-Attention and GAN-Based Debiasing for Visual Question Answering
by: Li, Zhifei, et al.
Published: (2025)

Bridging Coarse and Fine Recognition: A Hybrid Approach for Open-Ended Multi-Granularity Object Recognition in Interactive Educational Games
by: Yi, Hanling, et al.
Published: (2026)

Storynizor: Consistent Story Generation via Inter-Frame Synchronized and Shuffled ID Injection
by: Ma, Yuhang, et al.
Published: (2024)

Architecture-Agnostic Modality-Isolated Gated Fusion for Robust Multi-Modal Prostate MRI Segmentation
by: Shu, Yongbo, et al.
Published: (2026)

Serial Over Parallel: Learning Continual Unification for Multi-Modal Visual Object Tracking and Benchmarking
by: Tang, Zhangyong, et al.
Published: (2025)

CARE: Contrastive Alignment for ADL Recognition from Event-Triggered Sensor Streams
by: Zhao, Junhao, et al.
Published: (2025)

Visual Object Tracking across Diverse Data Modalities: A Review
by: Wang, Mengmeng, et al.
Published: (2024)

Focusing by Contrastive Attention: Enhancing VLMs' Visual Reasoning
by: Ge, Yuyao, et al.
Published: (2025)

RELO: Reinforcement Learning to Localize for Visual Object Tracking
by: Chen, Xin, et al.
Published: (2026)

CoInteract: Physically-Consistent Human-Object Interaction Video Synthesis via Spatially-Structured Co-Generation
by: Luo, Xiangyang, et al.
Published: (2026)

PathGLS: Evaluating Pathology Vision-Language Models without Ground Truth through Multi-Dimensional Consistency
by: Chen, Minbing, et al.
Published: (2026)

Correcting Visual Blur Induced by Attention Distraction to Reduce Hallucinations: Algorithm and Theory
by: Li, Quanjiang, et al.
Published: (2026)

One-Prompt-One-Story: Free-Lunch Consistent Text-to-Image Generation Using a Single Prompt
by: Liu, Tao, et al.
Published: (2025)

Vision-Language Model Purified Semi-Supervised Semantic Segmentation for Remote Sensing Images
by: Wang, Shanwen, et al.
Published: (2026)

LocalMamba: Visual State Space Model with Windowed Selective Scan
by: Huang, Tao, et al.
Published: (2024)

UniSync: A Unified Framework for Audio-Visual Synchronization
by: Feng, Tao, et al.
Published: (2025)

DRMOT: A Dataset and Framework for RGBD Referring Multi-Object Tracking
by: Chen, Sijia, et al.
Published: (2026)

A Study of Commonsense Reasoning over Visual Object Properties
by: Kolari, Abhishek, et al.
Published: (2025)

HASTE: Training-Free Video Diffusion Acceleration via Head-Wise Adaptive Sparse Attention
by: Zheng, Xuzhe, et al.
Published: (2026)

A-VL: Adaptive Attention for Large Vision-Language Models
by: Zhang, Junyang, et al.
Published: (2024)

Cross-Layer Vision Smoothing: Enhancing Visual Understanding via Sustained Focus on Key Objects in Large Vision-Language Models
by: Zhao, Jianfei, et al.
Published: (2025)

Online Handwritten Signature Verification Based on Temporal-Spatial Graph Attention Transformer
by: Yuan, Hai-jie, et al.
Published: (2025)