| Main Authors: | Wang, Yuan; Liao, Borui; Huang, Huijuan; Lu, Jinda; Li, Ouxiang; Liu, Kuien; Wang, Meng; Wang, Xiang |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2601.04033 |
Similar Items
Think, then Score: Decoupled Reasoning and Scoring for Video Reward Modeling
by: Wang, Yuan, et al.
Published: (2026)
Precise, Fast, and Low-cost Concept Erasure in Value Space: Orthogonal Complement Matters
by: Wang, Yuan, et al.
Published: (2024)
Relaxing Anchor-Frame Dominance for Mitigating Hallucinations in Video Large Language Models
by: Liu, Zijian, et al.
Published: (2026)
FrameThinker: Learning to Think with Long Videos via Multi-Turn Frame Spotlighting
by: He, Zefeng, et al.
Published: (2025)
Easier Painting Than Thinking: Can Text-to-Image Models Set the Stage, but Not Direct the Play?
by: Li, Ouxiang, et al.
Published: (2025)
Beyond Where to Look: Trajectory-Guided Reinforcement Learning for Multimodal RLVR
by: Lu, Jinda, et al.
Published: (2026)
When Thinking Hurts: Mitigating Visual Forgetting in Video Reasoning via Frame Repetition
by: Sun, Xiaokun, et al.
Published: (2026)
Frame-Level Captions for Long Video Generation with Complex Multi Scenes
by: Zheng, Guangcong, et al.
Published: (2025)
Self-supervised Learning of Event-guided Video Frame Interpolation for Rolling Shutter Frames
by: Lu, Yunfan, et al.
Published: (2023)
VideoAR: Autoregressive Video Generation via Next-Frame & Scale Prediction
by: Ji, Longbin, et al.
Published: (2026)
TempoMaster: Efficient Long Video Generation via Next-Frame-Rate Prediction
by: Ma, Yukuo, et al.
Published: (2025)
Detecting AI-Generated Video via Frame Consistency
by: Ma, Long, et al.
Published: (2024)
Autoregressive Video Generation beyond Next Frames Prediction
by: Ren, Sucheng, et al.
Published: (2025)
FrameMind: Frame-Interleaved Video Reasoning via Reinforcement Learning
by: Ge, Haonan, et al.
Published: (2025)
Video Frame Interpolation for Polarization via Swin-Transformer
by: Huang, Feng, et al.
Published: (2024)
VGDFR: Diffusion-based Video Generation with Dynamic Latent Frame Rate
by: Yuan, Zhihang, et al.
Published: (2025)
Rethinking Visual Content Refinement in Low-Shot CLIP Adaptation
by: Lu, Jinda, et al.
Published: (2024)
360VFI: A Dataset and Benchmark for Omnidirectional Video Frame Interpolation
by: Lu, Wenxuan, et al.
Published: (2024)
Perception-Oriented Video Frame Interpolation via Asymmetric Blending
by: Wu, Guangyang, et al.
Published: (2024)
Velocity Disambiguation for Video Frame Interpolation
by: Zhong, Zhihang, et al.
Published: (2023)
Frame by Familiar Frame: Understanding Replication in Video Diffusion Models
by: Rahman, Aimon, et al.
Published: (2024)
Frame-Voyager: Learning to Query Frames for Video Large Language Models
by: Yu, Sicheng, et al.
Published: (2024)
Frame In-N-Out: Unbounded Controllable Image-to-Video Generation
by: Wang, Boyang, et al.
Published: (2025)
Generative Inbetweening through Frame-wise Conditions-Driven Video Generation
by: Zhu, Tianyi, et al.
Published: (2024)
DLFR-VAE: Dynamic Latent Frame Rate VAE for Video Generation
by: Yuan, Zhihang, et al.
Published: (2025)
Motion-aware Latent Diffusion Models for Video Frame Interpolation
by: Huang, Zhilin, et al.
Published: (2024)
Beyond the Last Frame: Process-aware Evaluation for Generative Video Reasoning
by: Li, Yifan, et al.
Published: (2025)
VFIMamba: Video Frame Interpolation with State Space Models
by: Zhang, Guozhen, et al.
Published: (2024)
STORYANCHORS: Generating Consistent Multi-Scene Story Frames for Long-Form Narratives
by: Wang, Bo, et al.
Published: (2025)
InfiniteTalk: Audio-driven Video Generation for Sparse-Frame Video Dubbing
by: Yang, Shaoshu, et al.
Published: (2025)
Sparse Global Matching for Video Frame Interpolation with Large Motion
by: Liu, Chunxu, et al.
Published: (2024)
Benchmarking Video Frame Interpolation
by: Kiefhaber, Simon, et al.
Published: (2024)
Chain-of-Frames: Advancing Video Understanding in Multimodal LLMs via Frame-Aware Reasoning
by: Ghazanfari, Sara, et al.
Published: (2025)
Mamba-FETrack: Frame-Event Tracking via State Space Model
by: Huang, Ju, et al.
Published: (2024)
End-to-End Video Question Answering with Frame Scoring Mechanisms and Adaptive Sampling
by: Liang, Jianxin, et al.
Published: (2024)
FrameBridge: Improving Image-to-Video Generation with Bridge Models
by: Wang, Yuji, et al.
Published: (2024)
Frame Context Packing and Drift Prevention in Next-Frame-Prediction Video Diffusion Models
by: Zhang, Lvmin, et al.
Published: (2025)
DreamFrame: Enhancing Video Understanding via Automatically Generated QA and Style-Consistent Keyframes
by: Song, Zhende, et al.
Published: (2024)
DreaMontage: Arbitrary Frame-Guided One-Shot Video Generation
by: Liu, Jiawei, et al.
Published: (2025)
Think-Clip-Sample: Slow-Fast Frame Selection for Video Understanding
by: Tan, Wenhui, et al.
Published: (2026)