:: Library Catalog

Bibliographic Details
Main Authors:	Yang, Songyuan; Yu, Weijiang; Liu, Ziyu; Tang, Guijian; Yang, Wenjing; Tan, Huibin; Xiao, Nong
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2604.04372

Similar Items

Reinforce to Learn, Elect to Reason: A Dual Paradigm for Video Reasoning
by: Yang, Songyuan, et al.
Published: (2026)

AnyUser: Translating Sketched User Intent into Domestic Robots
by: Yang, Songyuan, et al.
Published: (2026)

FreeLong++: Training-Free Long Video Generation via Multi-band SpectralFusion
by: Lu, Yu, et al.
Published: (2025)

Exploring Low-Resource Medical Image Classification with Weakly Supervised Prompt Learning
by: Zheng, Fudan, et al.
Published: (2024)

Structural Anchor Pruning: Training-Free Multi-Vector Compression for Visual Document Retrieval
by: Liu, Zhuchenyang, et al.
Published: (2026)

FreePCA: Integrating Consistency Information across Long-short Frames in Training-free Long Video Generation via Principal Component Analysis
by: Tan, Jiangtong, et al.
Published: (2025)

Decomposed Attention Fusion in MLLMs for Training-Free Video Reasoning Segmentation
by: Han, Su Ho, et al.
Published: (2025)

KGEdit: Ambiguity-Aware Knowledge Graphs for Training-Free Precise Video Generation and Editing
by: Cai, Mingshu, et al.
Published: (2026)

BeyondFacial: Identity-Preserving Personalized Generation Beyond Facial Close-ups
by: Zhang, Songsong, et al.
Published: (2025)

Object-Aware Video Matting with Cross-Frame Guidance
by: Zhang, Huayu, et al.
Published: (2025)

Video Finetuning Improves Reasoning Between Frames
by: Yang, Ruiqi, et al.
Published: (2025)

Not All Frame Features Are Equal: Video-to-4D Generation via Decoupling Dynamic-Static Features
by: Yang, Liying, et al.
Published: (2025)

Frame Guidance: Training-Free Guidance for Frame-Level Control in Video Diffusion Models
by: Jang, Sangwon, et al.
Published: (2025)

FreeBind: Free Lunch in Unified Multimodal Space via Knowledge Fusion
by: Wang, Zehan, et al.
Published: (2024)

Intensive Vision-guided Network for Radiology Report Generation
by: Zheng, Fudan, et al.
Published: (2024)

Enhancing Train-Free Infinite-Frame Generation for Consistent Long Videos
by: Feng, X., et al.
Published: (2026)

Training-Free and Interpretable Hateful Video Detection via Multi-stage Adversarial Reasoning
by: Yang, Shuonan, et al.
Published: (2026)

FIS-DiT: Breaking the Few-Step Video Inference Barrier via Training-Free Frame Interleaved Sparsity
by: Tang, Jian, et al.
Published: (2026)

DiffuseSlide: Training-Free High Frame Rate Video Generation Diffusion
by: Hwang, Geunmin, et al.
Published: (2025)

HFS: Holistic Query-Aware Frame Selection for Efficient Video Reasoning
by: Yang, Yiqing, et al.
Published: (2025)

FreeLong: Training-Free Long Video Generation with SpectralBlend Temporal Attention
by: Lu, Yu, et al.
Published: (2024)

Free Video-LLM: Prompt-guided Visual Perception for Efficient Training-free Video LLMs
by: Han, Kai, et al.
Published: (2024)

See It, Say It, Sorted: An Iterative Training-Free Framework for Visually-Grounded Multimodal Reasoning in LVLMs
by: Zhang, Yongchang, et al.
Published: (2026)

Auditing Training-Free 3D Shape Retrieval with Diffused Geodesic Moments
by: Du, Zhicheng, et al.
Published: (2026)

EchoPilot: Training-Free Ultrasound Video Segmentation via Scale-Space Semantic Prompting and Reliability-Gated Memory
by: Xiao, Ruiqiang, et al.
Published: (2026)

When Thinking Hurts: Mitigating Visual Forgetting in Video Reasoning via Frame Repetition
by: Sun, Xiaokun, et al.
Published: (2026)

Detection-Fusion for Knowledge Graph Extraction from Videos
by: Das, Taniya, et al.
Published: (2024)

Tracking Any Point with Frame-Event Fusion Network at High Frame Rate
by: Liu, Jiaxiong, et al.
Published: (2024)

UKnow: A Unified Knowledge Protocol with Multimodal Knowledge Graph Datasets for Reasoning and Vision-Language Pre-Training
by: Gong, Biao, et al.
Published: (2023)

Beyond the Last Frame: Process-aware Evaluation for Generative Video Reasoning
by: Li, Yifan, et al.
Published: (2025)

An Empirical Comparison of Video Frame Sampling Methods for Multi-Modal RAG Retrieval
by: Kandhare, Mahesh, et al.
Published: (2024)

ZeroSmooth: Training-free Diffuser Adaptation for High Frame Rate Video Generation
by: Yang, Shaoshu, et al.
Published: (2024)

Beyond Boundary Frames: Context-Centric Video Interpolation with Audio-Visual Semantics
by: Deng, Yuchen, et al.
Published: (2025)

Contrastive Pre-Training with Multi-View Fusion for No-Reference Point Cloud Quality Assessment
by: Shan, Ziyu, et al.
Published: (2024)

FrameMind: Frame-Interleaved Video Reasoning via Reinforcement Learning
by: Ge, Haonan, et al.
Published: (2025)

Reasoning-Aware Multimodal Fusion for Hateful Video Detection
by: Yang, Shuonan, et al.
Published: (2025)

Can VLMs Truly Forget? Benchmarking Training-Free Visual Concept Unlearning
by: Tan, Zhangyun, et al.
Published: (2026)

Enhancing Perception Capabilities of Multimodal LLMs with Training-Free Fusion
by: Chen, Zhuokun, et al.
Published: (2024)

Progress-Aware Video Frame Captioning
by: Xue, Zihui, et al.
Published: (2024)

Dynamic Training-Free Fusion of Subject and Style LoRAs
by: Cao, Qinglong, et al.
Published: (2026)