| Main Authors: | Li, Haoxuan, Li, Mengyan, Zheng, Junjun |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2601.07366 |
Similar Items
HiVid: LLM-Guided Video Saliency For Content-Aware VOD And Live Streaming
by: Chen, Jiahui, et al.
Published: (2026)
RealCam-Vid: High-resolution Video Dataset with Dynamic Scenes and Metric-scale Camera Movements
by: Zheng, Guangcong, et al.
Published: (2025)
HiScene: Creating Hierarchical 3D Scenes with Isometric View Generation
by: Dong, Wenqi, et al.
Published: (2025)
HiDE: Hierarchical Dictionary-Based Entropy Modeling for Learned Image Compression
by: Xiong, Haoxuan, et al.
Published: (2026)
VLog: Video-Language Models by Generative Retrieval of Narration Vocabulary
by: Lin, Kevin Qinghong, et al.
Published: (2025)
VideoAuteur: Towards Long Narrative Video Generation
by: Xiao, Junfei, et al.
Published: (2025)
HiAR: Efficient Autoregressive Long Video Generation via Hierarchical Denoising
by: Zou, Kai, et al.
Published: (2026)
LLMs Behind the Scenes: Enabling Narrative Scene Illustration
by: Roemmele, Melissa, et al.
Published: (2025)
HiKER-SGG: Hierarchical Knowledge Enhanced Robust Scene Graph Generation
by: Zhang, Ce, et al.
Published: (2024)
Orientation-anchored Hyper-Gaussian for 4D Reconstruction from Casual Videos
by: Wu, Junyi, et al.
Published: (2025)
NarrativeBridge: Enhancing Video Captioning with Causal-Temporal Narrative
by: Nadeem, Asmar, et al.
Published: (2024)
TextVidBench: A Benchmark for Long Video Scene Text Understanding
by: Zhong, Yangyang, et al.
Published: (2025)
Aether Weaver: Multimodal Affective Narrative Co-Generation with Dynamic Scene Graphs
by: Ghorbani, Saeed
Published: (2025)
VidCompress: Memory-Enhanced Temporal Compression for Video Understanding in Large Language Models
by: Lan, Xiaohan, et al.
Published: (2024)
BachVid: Training-Free Video Generation with Consistent Background and Character
by: Yan, Han, et al.
Published: (2025)
HoloCine: Holistic Generation of Cinematic Multi-Shot Long Video Narratives
by: Meng, Yihao, et al.
Published: (2025)
CausalCine: Real-Time Autoregressive Generation for Multi-Shot Video Narratives
by: Meng, Yihao, et al.
Published: (2026)
HiNeRV: Video Compression with Hierarchical Encoding-based Neural Representation
by: Kwan, Ho Man, et al.
Published: (2023)
VidLeaks: Membership Inference Attacks Against Text-to-Video Models
by: Wang, Li, et al.
Published: (2026)
HieraVid: Hierarchical Token Pruning for Fast Video Large Language Models
by: Guo, Yansong, et al.
Published: (2026)
InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation
by: Wang, Yi, et al.
Published: (2023)
Rethinking Autoregressive Models for Lossless Image Compression via Hierarchical Parallelism and Progressive Adaptation
by: Li, Daxin, et al.
Published: (2025)
HiGS: Hierarchical Generative Scene Framework for Multi-Step Associative Semantic Spatial Composition
by: Hong, Jiacheng, et al.
Published: (2025)
TTA-Vid: Generalized Test-Time Adaptation for Video Reasoning
by: Jahagirdar, Soumya Shamarao, et al.
Published: (2026)
OmniVid: A Generative Framework for Universal Video Understanding
by: Wang, Junke, et al.
Published: (2024)
InstructVid2Vid: Controllable Video Editing with Natural Language Instructions
by: Qin, Bosheng, et al.
Published: (2023)
STORYANCHORS: Generating Consistent Multi-Scene Story Frames for Long-Form Narratives
by: Wang, Bo, et al.
Published: (2025)
EgoVid-5M: A Large-Scale Video-Action Dataset for Egocentric Video Generation
by: Wang, Xiaofeng, et al.
Published: (2024)
SmartDirector: Keyframe-Conditioned Cinematic Video Generation with Narrative Pacing Control
by: Zhang, Zhida, et al.
Published: (2026)
HiPrompt: Tuning-free Higher-Resolution Generation with Hierarchical MLLM Prompts
by: Liu, Xinyu, et al.
Published: (2024)
Narrative Aligned Long Form Video Question Answering
by: Jain, Rahul, et al.
Published: (2026)
Vid3D: Synthesis of Dynamic 3D Scenes using 2D Video Diffusion
by: Parthasarathy, Rishab, et al.
Published: (2024)
Generating Narrated Lecture Videos from Slides with Synchronized Highlights
by: Holmberg, Alexander
Published: (2025)
STAGE: Storyboard-Anchored Generation for Cinematic Multi-shot Narrative
by: Zhang, Peixuan, et al.
Published: (2025)
MLLM as Video Narrator: Mitigating Modality Imbalance in Video Moment Retrieval
by: Cai, Weitong, et al.
Published: (2024)
MedicalNarratives: Connecting Medical Vision and Language with Localized Narratives
by: Ikezogwo, Wisdom O., et al.
Published: (2025)
Hi-GaTA: Hierarchical Gated Temporal Aggregation Adapter for Surgical Video Report Generation
by: Sun, Kedi, et al.
Published: (2026)
VidCLearn: A Continual Learning Approach for Text-to-Video Generation
by: Zanchetta, Luca, et al.
Published: (2025)
Controllable Generative Video Compression
by: Ding, Ding, et al.
Published: (2026)
SAVEn-Vid: Synergistic Audio-Visual Integration for Enhanced Understanding in Long Video Context
by: Li, Jungang, et al.
Published: (2024)