:: Library Catalog

Bibliographic Details
Main Authors:	Li, Haoxuan; Li, Mengyan; Zheng, Junjun
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2601.07366

Similar Items

HiVid: LLM-Guided Video Saliency For Content-Aware VOD And Live Streaming
by: Chen, Jiahui, et al.
Published: (2026)

RealCam-Vid: High-resolution Video Dataset with Dynamic Scenes and Metric-scale Camera Movements
by: Zheng, Guangcong, et al.
Published: (2025)

HiScene: Creating Hierarchical 3D Scenes with Isometric View Generation
by: Dong, Wenqi, et al.
Published: (2025)

HiDE: Hierarchical Dictionary-Based Entropy Modeling for Learned Image Compression
by: Xiong, Haoxuan, et al.
Published: (2026)

VLog: Video-Language Models by Generative Retrieval of Narration Vocabulary
by: Lin, Kevin Qinghong, et al.
Published: (2025)

VideoAuteur: Towards Long Narrative Video Generation
by: Xiao, Junfei, et al.
Published: (2025)

HiAR: Efficient Autoregressive Long Video Generation via Hierarchical Denoising
by: Zou, Kai, et al.
Published: (2026)

LLMs Behind the Scenes: Enabling Narrative Scene Illustration
by: Roemmele, Melissa, et al.
Published: (2025)

HiKER-SGG: Hierarchical Knowledge Enhanced Robust Scene Graph Generation
by: Zhang, Ce, et al.
Published: (2024)

Orientation-anchored Hyper-Gaussian for 4D Reconstruction from Casual Videos
by: Wu, Junyi, et al.
Published: (2025)

NarrativeBridge: Enhancing Video Captioning with Causal-Temporal Narrative
by: Nadeem, Asmar, et al.
Published: (2024)

TextVidBench: A Benchmark for Long Video Scene Text Understanding
by: Zhong, Yangyang, et al.
Published: (2025)

Aether Weaver: Multimodal Affective Narrative Co-Generation with Dynamic Scene Graphs
by: Ghorbani, Saeed
Published: (2025)

VidCompress: Memory-Enhanced Temporal Compression for Video Understanding in Large Language Models
by: Lan, Xiaohan, et al.
Published: (2024)

BachVid: Training-Free Video Generation with Consistent Background and Character
by: Yan, Han, et al.
Published: (2025)

HoloCine: Holistic Generation of Cinematic Multi-Shot Long Video Narratives
by: Meng, Yihao, et al.
Published: (2025)

CausalCine: Real-Time Autoregressive Generation for Multi-Shot Video Narratives
by: Meng, Yihao, et al.
Published: (2026)

HiNeRV: Video Compression with Hierarchical Encoding-based Neural Representation
by: Kwan, Ho Man, et al.
Published: (2023)

VidLeaks: Membership Inference Attacks Against Text-to-Video Models
by: Wang, Li, et al.
Published: (2026)

HieraVid: Hierarchical Token Pruning for Fast Video Large Language Models
by: Guo, Yansong, et al.
Published: (2026)

InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation
by: Wang, Yi, et al.
Published: (2023)

Rethinking Autoregressive Models for Lossless Image Compression via Hierarchical Parallelism and Progressive Adaptation
by: Li, Daxin, et al.
Published: (2025)

HiGS: Hierarchical Generative Scene Framework for Multi-Step Associative Semantic Spatial Composition
by: Hong, Jiacheng, et al.
Published: (2025)

TTA-Vid: Generalized Test-Time Adaptation for Video Reasoning
by: Jahagirdar, Soumya Shamarao, et al.
Published: (2026)

OmniVid: A Generative Framework for Universal Video Understanding
by: Wang, Junke, et al.
Published: (2024)

InstructVid2Vid: Controllable Video Editing with Natural Language Instructions
by: Qin, Bosheng, et al.
Published: (2023)

STORYANCHORS: Generating Consistent Multi-Scene Story Frames for Long-Form Narratives
by: Wang, Bo, et al.
Published: (2025)

EgoVid-5M: A Large-Scale Video-Action Dataset for Egocentric Video Generation
by: Wang, Xiaofeng, et al.
Published: (2024)

SmartDirector: Keyframe-Conditioned Cinematic Video Generation with Narrative Pacing Control
by: Zhang, Zhida, et al.
Published: (2026)

HiPrompt: Tuning-free Higher-Resolution Generation with Hierarchical MLLM Prompts
by: Liu, Xinyu, et al.
Published: (2024)

Narrative Aligned Long Form Video Question Answering
by: Jain, Rahul, et al.
Published: (2026)

Vid3D: Synthesis of Dynamic 3D Scenes using 2D Video Diffusion
by: Parthasarathy, Rishab, et al.
Published: (2024)

Generating Narrated Lecture Videos from Slides with Synchronized Highlights
by: Holmberg, Alexander
Published: (2025)

STAGE: Storyboard-Anchored Generation for Cinematic Multi-shot Narrative
by: Zhang, Peixuan, et al.
Published: (2025)

MLLM as Video Narrator: Mitigating Modality Imbalance in Video Moment Retrieval
by: Cai, Weitong, et al.
Published: (2024)

MedicalNarratives: Connecting Medical Vision and Language with Localized Narratives
by: Ikezogwo, Wisdom O., et al.
Published: (2025)

Hi-GaTA: Hierarchical Gated Temporal Aggregation Adapter for Surgical Video Report Generation
by: Sun, Kedi, et al.
Published: (2026)

VidCLearn: A Continual Learning Approach for Text-to-Video Generation
by: Zanchetta, Luca, et al.
Published: (2025)

Controllable Generative Video Compression
by: Ding, Ding, et al.
Published: (2026)

SAVEn-Vid: Synergistic Audio-Visual Integration for Enhanced Understanding in Long Video Context
by: Li, Jungang, et al.
Published: (2024)