Bibliographic Details
Main Authors:	Jin, Liuyi, Haroon, Amran, Stoleru, Radu, Gunawardena, Pasan, Middleton, Michael, Kim, Jeeeun
Format:	Preprint
Published:	2025
Subjects:	Multimedia Artificial Intelligence
Online Access:	https://arxiv.org/abs/2511.14119

Similar Items

A Smart-Glasses for Emergency Medical Services via Multimodal Multitask Learning
by: Jin, Liuyi, et al.
Published: (2025)

ERIC: Estimating Rainfall with Commodity Doorbell Camera for Precision Residential Irrigation
by: Liu, Tian, et al.
Published: (2024)

AxiomVision: Accuracy-Guaranteed Adaptive Visual Model Selection for Perspective-Aware Video Analytics
by: Dai, Xiangxiang, et al.
Published: (2024)

AI-Integrated Decision Support System for Real-Time Market Growth Forecasting and Multi-Source Content Diffusion Analytics
by: Yin, Ziqing, et al.
Published: (2025)

Semantic-Guided Unsupervised Video Summarization
by: Liu, Haizhou, et al.
Published: (2026)

GRACE: Loss-Resilient Real-Time Video through Neural Codecs
by: Cheng, Yihua, et al.
Published: (2023)

PromptMobile: Efficient Promptus for Low Bandwidth Mobile Video Streaming
by: Liu, Liming, et al.
Published: (2025)

EVER: Edge-Assisted Auto-Verification for Mobile MR-Aided Operation
by: Chen, Jiangong, et al.
Published: (2025)

RA-BLIP: Multimodal Adaptive Retrieval-Augmented Bootstrapping Language-Image Pre-training
by: Ding, Muhe, et al.
Published: (2024)

Towards Real-Time Neural Volumetric Rendering on Mobile Devices: A Measurement Study
by: Wang, Zhe, et al.
Published: (2024)

Towards Open-Vocabulary Video Semantic Segmentation
by: Li, Xinhao, et al.
Published: (2024)

Semantic Communication-Enabled Cloud-Edge-End-collaborative Metaverse Services Architecure
by: Li, Yuxuan, et al.
Published: (2025)

Human-in-the-Loop Bandwidth Estimation for Quality of Experience Optimization in Real-Time Video Communication
by: Khairy, Sami, et al.
Published: (2025)

MM-HSD: Multi-Modal Hate Speech Detection in Videos
by: Céspedes-Sarrias, Berta, et al.
Published: (2025)

Wireless Video Semantic Communication with Decoupled Diffusion Multi-frame Compensation
by: Xie, Bingyan, et al.
Published: (2025)

Multimodal Emotion Recognition by Fusing Video Semantic in MOOC Learning Scenarios
by: Zhang, Yuan, et al.
Published: (2024)

Attention of a Kiss: Exploring Attention Maps in Video Diffusion for XAIxArts
by: Cole, Adam, et al.
Published: (2025)

QMAVIS: Long Video-Audio Understanding using Fusion of Large Multimodal Models
by: Lin, Zixing, et al.
Published: (2026)

Exposing Cross-Modal Consistency for Fake News Detection in Short-Form Videos
by: Tian, Chong, et al.
Published: (2026)

LL-GABR: Energy Efficient Live Video Streaming Using Reinforcement Learning
by: Raman, Adithya, et al.
Published: (2024)

Seeing Further and Wider: Joint Spatio-Temporal Enlargement for Micro-Video Popularity Prediction
by: Wang, Dali, et al.
Published: (2026)

End-to-End Learning-based Video Streaming Enhancement Pipeline: A Generative AI Approach
by: Artioli, Emanuele, et al.
Published: (2025)

Plasticity-Aware Mixture of Experts for Learning Under QoE Shifts in Adaptive Video Streaming
by: He, Zhiqiang, et al.
Published: (2025)

AsyncVoice Agent: Real-Time Explanation for LLM Planning and Reasoning
by: Lin, Yueqian, et al.
Published: (2025)

Solving Copyright Infringement on Short Video Platforms: Novel Datasets and an Audio Restoration Deep Learning Pipeline
by: Oh, Minwoo, et al.
Published: (2025)

BDIQA: A New Dataset for Video Question Answering to Explore Cognitive Reasoning through Theory of Mind
by: Mao, Yuanyuan, et al.
Published: (2024)

Pianist Transformer: Towards Expressive Piano Performance Rendering via Scalable Self-Supervised Pre-Training
by: You, Hong-Jie, et al.
Published: (2025)

V-Rex: Real-Time Streaming Video LLM Acceleration via Dynamic KV Cache Retrieval
by: Kim, Donghyuk, et al.
Published: (2025)

MMTB: Evaluating Terminal Agents on Multimedia-File Tasks
by: Heo, Chiyeong, et al.
Published: (2026)

MTAVG-Bench 2.0: Diagnosing Failure Modes of Cinematic Expressiveness in Multi-Talker Audio-Video Generation
by: Li, Haitian, et al.
Published: (2026)

Co-Director: Agentic Generative Video Storytelling
by: Song, Yale, et al.
Published: (2026)

Stage Light is Sequence$^2$: Multi-Light Control via Imitation Learning
by: Zhao, Zijian, et al.
Published: (2026)

Towards Automatic Soccer Commentary Generation with Knowledge-Enhanced Visual Reasoning
by: Jin, Zeyu, et al.
Published: (2026)

DeLoad: Demand-Driven Short-Video Preloading with Scalable Watch-Time Estimation
by: Liu, Tong, et al.
Published: (2025)

HiQuE: Hierarchical Question Embedding Network for Multimodal Depression Detection
by: Jung, Juho, et al.
Published: (2024)

A Multimedia Analytics Model for the Foundation Model Era
by: Worring, Marcel, et al.
Published: (2025)

MAVOS-DD: Multilingual Audio-Video Open-Set Deepfake Detection Benchmark
by: Croitoru, Florinel-Alin, et al.
Published: (2025)

Controllable Video-to-Music Generation with Multiple Time-Varying Conditions
by: Wu, Junxian, et al.
Published: (2025)

LazyVLM: Neuro-Symbolic Approach to Video Analytics
by: Jian, Xiangru, et al.
Published: (2025)

Lightning Fast Video Anomaly Detection via Adversarial Knowledge Distillation
by: Croitoru, Florinel-Alin, et al.
Published: (2022)