| Main Authors: | Yang, Songyuan, Yu, Weijiang, Liu, Ziyu, Tang, Guijian, Yang, Wenjing, Tan, Huibin, Xiao, Nong |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2604.04372 |
Similar Items
Reinforce to Learn, Elect to Reason: A Dual Paradigm for Video Reasoning
by: Yang, Songyuan, et al.
Published: (2026)
AnyUser: Translating Sketched User Intent into Domestic Robots
by: Yang, Songyuan, et al.
Published: (2026)
FreeLong++: Training-Free Long Video Generation via Multi-band SpectralFusion
by: Lu, Yu, et al.
Published: (2025)
Exploring Low-Resource Medical Image Classification with Weakly Supervised Prompt Learning
by: Zheng, Fudan, et al.
Published: (2024)
Structural Anchor Pruning: Training-Free Multi-Vector Compression for Visual Document Retrieval
by: Liu, Zhuchenyang, et al.
Published: (2026)
FreePCA: Integrating Consistency Information across Long-short Frames in Training-free Long Video Generation via Principal Component Analysis
by: Tan, Jiangtong, et al.
Published: (2025)
Decomposed Attention Fusion in MLLMs for Training-Free Video Reasoning Segmentation
by: Han, Su Ho, et al.
Published: (2025)
KGEdit: Ambiguity-Aware Knowledge Graphs for Training-Free Precise Video Generation and Editing
by: Cai, Mingshu, et al.
Published: (2026)
BeyondFacial: Identity-Preserving Personalized Generation Beyond Facial Close-ups
by: Zhang, Songsong, et al.
Published: (2025)
Object-Aware Video Matting with Cross-Frame Guidance
by: Zhang, Huayu, et al.
Published: (2025)
Video Finetuning Improves Reasoning Between Frames
by: Yang, Ruiqi, et al.
Published: (2025)
Not All Frame Features Are Equal: Video-to-4D Generation via Decoupling Dynamic-Static Features
by: Yang, Liying, et al.
Published: (2025)
Frame Guidance: Training-Free Guidance for Frame-Level Control in Video Diffusion Models
by: Jang, Sangwon, et al.
Published: (2025)
FreeBind: Free Lunch in Unified Multimodal Space via Knowledge Fusion
by: Wang, Zehan, et al.
Published: (2024)
Intensive Vision-guided Network for Radiology Report Generation
by: Zheng, Fudan, et al.
Published: (2024)
Enhancing Train-Free Infinite-Frame Generation for Consistent Long Videos
by: Feng, X., et al.
Published: (2026)
Training-Free and Interpretable Hateful Video Detection via Multi-stage Adversarial Reasoning
by: Yang, Shuonan, et al.
Published: (2026)
FIS-DiT: Breaking the Few-Step Video Inference Barrier via Training-Free Frame Interleaved Sparsity
by: Tang, Jian, et al.
Published: (2026)
DiffuseSlide: Training-Free High Frame Rate Video Generation Diffusion
by: Hwang, Geunmin, et al.
Published: (2025)
HFS: Holistic Query-Aware Frame Selection for Efficient Video Reasoning
by: Yang, Yiqing, et al.
Published: (2025)
FreeLong: Training-Free Long Video Generation with SpectralBlend Temporal Attention
by: Lu, Yu, et al.
Published: (2024)
Free Video-LLM: Prompt-guided Visual Perception for Efficient Training-free Video LLMs
by: Han, Kai, et al.
Published: (2024)
See It, Say It, Sorted: An Iterative Training-Free Framework for Visually-Grounded Multimodal Reasoning in LVLMs
by: Zhang, Yongchang, et al.
Published: (2026)
Auditing Training-Free 3D Shape Retrieval with Diffused Geodesic Moments
by: Du, Zhicheng, et al.
Published: (2026)
EchoPilot: Training-Free Ultrasound Video Segmentation via Scale-Space Semantic Prompting and Reliability-Gated Memory
by: Xiao, Ruiqiang, et al.
Published: (2026)
When Thinking Hurts: Mitigating Visual Forgetting in Video Reasoning via Frame Repetition
by: Sun, Xiaokun, et al.
Published: (2026)
Detection-Fusion for Knowledge Graph Extraction from Videos
by: Das, Taniya, et al.
Published: (2024)
Tracking Any Point with Frame-Event Fusion Network at High Frame Rate
by: Liu, Jiaxiong, et al.
Published: (2024)
UKnow: A Unified Knowledge Protocol with Multimodal Knowledge Graph Datasets for Reasoning and Vision-Language Pre-Training
by: Gong, Biao, et al.
Published: (2023)
Beyond the Last Frame: Process-aware Evaluation for Generative Video Reasoning
by: Li, Yifan, et al.
Published: (2025)
An Empirical Comparison of Video Frame Sampling Methods for Multi-Modal RAG Retrieval
by: Kandhare, Mahesh, et al.
Published: (2024)
ZeroSmooth: Training-free Diffuser Adaptation for High Frame Rate Video Generation
by: Yang, Shaoshu, et al.
Published: (2024)
Beyond Boundary Frames: Context-Centric Video Interpolation with Audio-Visual Semantics
by: Deng, Yuchen, et al.
Published: (2025)
Contrastive Pre-Training with Multi-View Fusion for No-Reference Point Cloud Quality Assessment
by: Shan, Ziyu, et al.
Published: (2024)
FrameMind: Frame-Interleaved Video Reasoning via Reinforcement Learning
by: Ge, Haonan, et al.
Published: (2025)
Reasoning-Aware Multimodal Fusion for Hateful Video Detection
by: Yang, Shuonan, et al.
Published: (2025)
Can VLMs Truly Forget? Benchmarking Training-Free Visual Concept Unlearning
by: Tan, Zhangyun, et al.
Published: (2026)
Enhancing Perception Capabilities of Multimodal LLMs with Training-Free Fusion
by: Chen, Zhuokun, et al.
Published: (2024)
Progress-Aware Video Frame Captioning
by: Xue, Zihui, et al.
Published: (2024)
Dynamic Training-Free Fusion of Subject and Style LoRAs
by: Cao, Qinglong, et al.
Published: (2026)