| Main Authors: | Li, Yizhi, Chen, Xiaohan, Jiang, Miao, Tang, Wentao, Wang, Gaoang |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.23228 |
Similar Items
MovieCORE: COgnitive REasoning in Movies
by: Faure, Gueter Josmy, et al.
Published: (2025)
IDA-VLM: Towards Movie Understanding via ID-Aware Large Vision-Language Model
by: Ji, Yatai, et al.
Published: (2024)
ScreenWriter: Automatic Screenplay Generation and Movie Summarisation
by: Mahon, Louis, et al.
Published: (2024)
Demystifying Visual Features of Movie Posters for Multi-Label Genre Identification
by: Nareti, Utsav Kumar, et al.
Published: (2023)
Predicting Brain Responses To Natural Movies With Multimodal LLMs
by: Villanueva, Cesar Kadir Torrico, et al.
Published: (2025)
ScriptViz: A Visualization Tool to Aid Scriptwriting based on a Large Movie Database
by: Rao, Anyi, et al.
Published: (2024)
Movie Gen: A Cast of Media Foundation Models
by: Polyak, Adam, et al.
Published: (2024)
StoryMovie: A Dataset for Semantic Alignment of Visual Stories with Movie Scripts and Subtitles
by: Oliveira, Daniel, et al.
Published: (2026)
MovieChat+: Question-aware Sparse Memory for Long Video Question Answering
by: Song, Enxin, et al.
Published: (2024)
Sam-Guided Enhanced Fine-Grained Encoding with Mixed Semantic Learning for Medical Image Captioning
by: Zhang, Zhenyu, et al.
Published: (2023)
Predicting Movie Production Years through Facial Recognition of Actors with Machine Learning
by: Abdalah, Asraa Muayed, et al.
Published: (2025)
FunCineForge: A Unified Dataset Toolkit and Model for Zero-Shot Movie Dubbing in Diverse Cinematic Scenes
by: Liu, Jiaxuan, et al.
Published: (2026)
Movie2Story: A framework for understanding videos and telling stories in the form of novel text
by: Li, Kangning, et al.
Published: (2024)
Movie Trailer Genre Classification Using Multimodal Pretrained Features
by: Sulun, Serkan, et al.
Published: (2024)
See, Act, Adapt: Active Perception for Unsupervised Cross-Domain Visual Adaptation via Personalized VLM-Guided Agent
by: Tang, Tianci, et al.
Published: (2026)
Incentivizing Tool-augmented Thinking with Images for Medical Image Analysis
by: Jiang, Yankai, et al.
Published: (2025)
Movie Gen: SWOT Analysis of Meta's Generative AI Foundation Model for Transforming Media Generation, Advertising, and Entertainment Industries
by: Ehtesham, Abul, et al.
Published: (2024)
WithAnyone: Towards Controllable and ID Consistent Image Generation
by: Xu, Hengyuan, et al.
Published: (2025)
ConsistentID: Portrait Generation with Multimodal Fine-Grained Identity Preserving
by: Huang, Jiehui, et al.
Published: (2024)
MovieBench: A Hierarchical Movie Level Dataset for Long Video Generation
by: Wu, Weijia, et al.
Published: (2024)
LLaVA-Ultra: Large Chinese Language and Vision Assistant for Ultrasound
by: Guo, Xuechen, et al.
Published: (2024)
Hi-LSplat: Hierarchical 3D Language Gaussian Splatting
by: Zhan, Chenlu, et al.
Published: (2025)
Movie101v2: Improved Movie Narration Benchmark
by: Yue, Zihao, et al.
Published: (2024)
MovieChat: From Dense Token to Sparse Memory for Long Video Understanding
by: Song, Enxin, et al.
Published: (2023)
Hand3R: Online 4D Hand-Scene Reconstruction in the Wild
by: Hu, Wendi, et al.
Published: (2026)
MM-MovieDubber: Towards Multi-Modal Learning for Multi-Modal Movie Dubbing
by: Zheng, Junjie, et al.
Published: (2025)
DynaHOI: Benchmarking Hand-Object Interaction for Dynamic Target
by: Hu, BoCheng, et al.
Published: (2026)
Large Language Model Guided Progressive Feature Alignment for Multimodal UAV Object Detection
by: Wu, Wentao, et al.
Published: (2025)
Towards Automated Movie Trailer Generation
by: Argaw, Dawit Mureja, et al.
Published: (2024)
Chasing Better Deep Image Priors between Over- and Under-parameterization
by: Wu, Qiming, et al.
Published: (2024)
Captain Cinema: Towards Short Movie Generation
by: Xiao, Junfei, et al.
Published: (2025)
Video-MMLU: A Massive Multi-Discipline Lecture Understanding Benchmark
by: Song, Enxin, et al.
Published: (2025)
InstantID: Zero-shot Identity-Preserving Generation in Seconds
by: Wang, Qixun, et al.
Published: (2024)
Character-Centric Understanding of Animated Movies
by: Gui, Zhongrui, et al.
Published: (2025)
Exploring Homogeneous and Heterogeneous Consistent Label Associations for Unsupervised Visible-Infrared Person ReID
by: He, Lingfeng, et al.
Published: (2024)
ReTool-Video: Recursive Tool-Using Video Agents with Meta-Augmented Tool Grounding
by: Liu, Xiao, et al.
Published: (2026)
DreamID: High-Fidelity and Fast diffusion-based Face Swapping via Triplet ID Group Learning
by: Ye, Fulong, et al.
Published: (2025)
Designing Domain-Specific Agents via Hierarchical Task Abstraction Mechanism
by: Li, Kaiyu, et al.
Published: (2025)
Depth-Consistent 3D Gaussian Splatting via Physical Defocus Modeling and Multi-View Geometric Supervision
by: Deng, Yu, et al.
Published: (2025)
MovieDreamer: Hierarchical Generation for Coherent Long Visual Sequence
by: Zhao, Canyu, et al.
Published: (2024)