| Main Authors: | Luo, Xiangyang, Cheng, Junhao, Xie, Yifan, Zhang, Xin, Feng, Tao, Liu, Zhou, Ma, Fei, Yu, Fei |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2503.23353 |
Similar Items
MuseFace: Text-driven Face Editing via Diffusion-based Mask Generation Approach
by: Zhang, Xin, et al.
Published: (2025)
CCIS-Diff: A Generative Model with Stable Diffusion Prior for Controlled Colonoscopy Image Synthesis
by: Xie, Yifan, et al.
Published: (2024)
Audio-Driven Talking Face Video Generation with Joint Uncertainty Learning
by: Xie, Yifan, et al.
Published: (2025)
DreamStory: Open-Domain Story Visualization by LLM-Guided Multi-Subject Consistent Diffusion
by: He, Huiguo, et al.
Published: (2024)
OnlineHOI: Towards Online Human-Object Interaction Generation and Perception
by: Ji, Yihong, et al.
Published: (2025)
Causal-Story: Local Causal Attention Utilizing Parameter-Efficient Tuning For Visual Story Synthesis
by: Song, Tianyi, et al.
Published: (2023)
Universal Visuo-Tactile Video Understanding for Embodied Interaction
by: Xie, Yifan, et al.
Published: (2025)
SalM$^{2}$: An Extremely Lightweight Saliency Mamba Model for Real-Time Cognitive Awareness of Driver Attention
by: Zhao, Chunyu, et al.
Published: (2025)
ContextualStory: Consistent Visual Storytelling with Spatially-Enhanced and Storyline Context
by: Zheng, Sixiao, et al.
Published: (2024)
Learning Physical Dynamics for Object-centric Visual Prediction
by: Xu, Huilin, et al.
Published: (2024)
Dynamic Attention Mechanism in Spatiotemporal Memory Networks for Object Tracking
by: Zhou, Meng, et al.
Published: (2025)
Vision Inference Former: Sustaining Visual Consistency in Multimodal Large Language Models
by: Dong, Xinpeng, et al.
Published: (2026)
EEG-Driven 3D Object Reconstruction with Style Consistency and Diffusion Prior
by: Xiang, Xin, et al.
Published: (2024)
STORYANCHORS: Generating Consistent Multi-Scene Story Frames for Long-Form Narratives
by: Wang, Bo, et al.
Published: (2025)
Causally-Grounded Dual-Path Attention Intervention for Object Hallucination Mitigation in LVLMs
by: Yu, Liu, et al.
Published: (2025)
MMCL-Bench: Multimodal Context Learning from Visual Rules, Procedures, and Evidence
by: Chen, Yifan, et al.
Published: (2026)
TextMatch: Enhancing Image-Text Consistency Through Multimodal Optimization
by: Luo, Yucong, et al.
Published: (2024)
Collaborative Attention and Consistent-Guided Fusion of MRI and PET for Alzheimer's Disease Diagnosis
by: Ma, Delin, et al.
Published: (2025)
Integrating Object Interaction Self-Attention and GAN-Based Debiasing for Visual Question Answering
by: Li, Zhifei, et al.
Published: (2025)
Bridging Coarse and Fine Recognition: A Hybrid Approach for Open-Ended Multi-Granularity Object Recognition in Interactive Educational Games
by: Yi, Hanling, et al.
Published: (2026)
Storynizor: Consistent Story Generation via Inter-Frame Synchronized and Shuffled ID Injection
by: Ma, Yuhang, et al.
Published: (2024)
Architecture-Agnostic Modality-Isolated Gated Fusion for Robust Multi-Modal Prostate MRI Segmentation
by: Shu, Yongbo, et al.
Published: (2026)
Serial Over Parallel: Learning Continual Unification for Multi-Modal Visual Object Tracking and Benchmarking
by: Tang, Zhangyong, et al.
Published: (2025)
CARE: Contrastive Alignment for ADL Recognition from Event-Triggered Sensor Streams
by: Zhao, Junhao, et al.
Published: (2025)
Visual Object Tracking across Diverse Data Modalities: A Review
by: Wang, Mengmeng, et al.
Published: (2024)
Focusing by Contrastive Attention: Enhancing VLMs' Visual Reasoning
by: Ge, Yuyao, et al.
Published: (2025)
RELO: Reinforcement Learning to Localize for Visual Object Tracking
by: Chen, Xin, et al.
Published: (2026)
CoInteract: Physically-Consistent Human-Object Interaction Video Synthesis via Spatially-Structured Co-Generation
by: Luo, Xiangyang, et al.
Published: (2026)
PathGLS: Evaluating Pathology Vision-Language Models without Ground Truth through Multi-Dimensional Consistency
by: Chen, Minbing, et al.
Published: (2026)
Correcting Visual Blur Induced by Attention Distraction to Reduce Hallucinations: Algorithm and Theory
by: Li, Quanjiang, et al.
Published: (2026)
One-Prompt-One-Story: Free-Lunch Consistent Text-to-Image Generation Using a Single Prompt
by: Liu, Tao, et al.
Published: (2025)
Vision-Language Model Purified Semi-Supervised Semantic Segmentation for Remote Sensing Images
by: Wang, Shanwen, et al.
Published: (2026)
LocalMamba: Visual State Space Model with Windowed Selective Scan
by: Huang, Tao, et al.
Published: (2024)
UniSync: A Unified Framework for Audio-Visual Synchronization
by: Feng, Tao, et al.
Published: (2025)
DRMOT: A Dataset and Framework for RGBD Referring Multi-Object Tracking
by: Chen, Sijia, et al.
Published: (2026)
A Study of Commonsense Reasoning over Visual Object Properties
by: Kolari, Abhishek, et al.
Published: (2025)
HASTE: Training-Free Video Diffusion Acceleration via Head-Wise Adaptive Sparse Attention
by: Zheng, Xuzhe, et al.
Published: (2026)
A-VL: Adaptive Attention for Large Vision-Language Models
by: Zhang, Junyang, et al.
Published: (2024)
Cross-Layer Vision Smoothing: Enhancing Visual Understanding via Sustained Focus on Key Objects in Large Vision-Language Models
by: Zhao, Jianfei, et al.
Published: (2025)
Online Handwritten Signature Verification Based on Temporal-Spatial Graph Attention Transformer
by: Yuan, Hai-jie, et al.
Published: (2025)