:: Library Catalog

Bibliographic Details
Main Authors:	Li, Yizhi; Chen, Xiaohan; Jiang, Miao; Tang, Wentao; Wang, Gaoang
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2602.23228

Similar Items

MovieCORE: COgnitive REasoning in Movies
by: Faure, Gueter Josmy, et al.
Published: (2025)

IDA-VLM: Towards Movie Understanding via ID-Aware Large Vision-Language Model
by: Ji, Yatai, et al.
Published: (2024)

ScreenWriter: Automatic Screenplay Generation and Movie Summarisation
by: Mahon, Louis, et al.
Published: (2024)

Demystifying Visual Features of Movie Posters for Multi-Label Genre Identification
by: Nareti, Utsav Kumar, et al.
Published: (2023)

Predicting Brain Responses To Natural Movies With Multimodal LLMs
by: Villanueva, Cesar Kadir Torrico, et al.
Published: (2025)

ScriptViz: A Visualization Tool to Aid Scriptwriting based on a Large Movie Database
by: Rao, Anyi, et al.
Published: (2024)

Movie Gen: A Cast of Media Foundation Models
by: Polyak, Adam, et al.
Published: (2024)

StoryMovie: A Dataset for Semantic Alignment of Visual Stories with Movie Scripts and Subtitles
by: Oliveira, Daniel, et al.
Published: (2026)

MovieChat+: Question-aware Sparse Memory for Long Video Question Answering
by: Song, Enxin, et al.
Published: (2024)

Sam-Guided Enhanced Fine-Grained Encoding with Mixed Semantic Learning for Medical Image Captioning
by: Zhang, Zhenyu, et al.
Published: (2023)

Predicting Movie Production Years through Facial Recognition of Actors with Machine Learning
by: Abdalah, Asraa Muayed, et al.
Published: (2025)

FunCineForge: A Unified Dataset Toolkit and Model for Zero-Shot Movie Dubbing in Diverse Cinematic Scenes
by: Liu, Jiaxuan, et al.
Published: (2026)

Movie2Story: A framework for understanding videos and telling stories in the form of novel text
by: Li, Kangning, et al.
Published: (2024)

Movie Trailer Genre Classification Using Multimodal Pretrained Features
by: Sulun, Serkan, et al.
Published: (2024)

See, Act, Adapt: Active Perception for Unsupervised Cross-Domain Visual Adaptation via Personalized VLM-Guided Agent
by: Tang, Tianci, et al.
Published: (2026)

Incentivizing Tool-augmented Thinking with Images for Medical Image Analysis
by: Jiang, Yankai, et al.
Published: (2025)

Movie Gen: SWOT Analysis of Meta's Generative AI Foundation Model for Transforming Media Generation, Advertising, and Entertainment Industries
by: Ehtesham, Abul, et al.
Published: (2024)

WithAnyone: Towards Controllable and ID Consistent Image Generation
by: Xu, Hengyuan, et al.
Published: (2025)

ConsistentID: Portrait Generation with Multimodal Fine-Grained Identity Preserving
by: Huang, Jiehui, et al.
Published: (2024)

MovieBench: A Hierarchical Movie Level Dataset for Long Video Generation
by: Wu, Weijia, et al.
Published: (2024)

LLaVA-Ultra: Large Chinese Language and Vision Assistant for Ultrasound
by: Guo, Xuechen, et al.
Published: (2024)

Hi-LSplat: Hierarchical 3D Language Gaussian Splatting
by: Zhan, Chenlu, et al.
Published: (2025)

Movie101v2: Improved Movie Narration Benchmark
by: Yue, Zihao, et al.
Published: (2024)

MovieChat: From Dense Token to Sparse Memory for Long Video Understanding
by: Song, Enxin, et al.
Published: (2023)

Hand3R: Online 4D Hand-Scene Reconstruction in the Wild
by: Hu, Wendi, et al.
Published: (2026)

MM-MovieDubber: Towards Multi-Modal Learning for Multi-Modal Movie Dubbing
by: Zheng, Junjie, et al.
Published: (2025)

DynaHOI: Benchmarking Hand-Object Interaction for Dynamic Target
by: Hu, BoCheng, et al.
Published: (2026)

Large Language Model Guided Progressive Feature Alignment for Multimodal UAV Object Detection
by: Wu, Wentao, et al.
Published: (2025)

Towards Automated Movie Trailer Generation
by: Argaw, Dawit Mureja, et al.
Published: (2024)

Chasing Better Deep Image Priors between Over- and Under-parameterization
by: Wu, Qiming, et al.
Published: (2024)

Captain Cinema: Towards Short Movie Generation
by: Xiao, Junfei, et al.
Published: (2025)

Video-MMLU: A Massive Multi-Discipline Lecture Understanding Benchmark
by: Song, Enxin, et al.
Published: (2025)

InstantID: Zero-shot Identity-Preserving Generation in Seconds
by: Wang, Qixun, et al.
Published: (2024)

Character-Centric Understanding of Animated Movies
by: Gui, Zhongrui, et al.
Published: (2025)

Exploring Homogeneous and Heterogeneous Consistent Label Associations for Unsupervised Visible-Infrared Person ReID
by: He, Lingfeng, et al.
Published: (2024)

ReTool-Video: Recursive Tool-Using Video Agents with Meta-Augmented Tool Grounding
by: Liu, Xiao, et al.
Published: (2026)

DreamID: High-Fidelity and Fast diffusion-based Face Swapping via Triplet ID Group Learning
by: Ye, Fulong, et al.
Published: (2025)

Designing Domain-Specific Agents via Hierarchical Task Abstraction Mechanism
by: Li, Kaiyu, et al.
Published: (2025)

Depth-Consistent 3D Gaussian Splatting via Physical Defocus Modeling and Multi-View Geometric Supervision
by: Deng, Yu, et al.
Published: (2025)

MovieDreamer: Hierarchical Generation for Coherent Long Visual Sequence
by: Zhao, Canyu, et al.
Published: (2024)