:: Library Catalog

Bibliographic Details
Main Authors:	Chiang, Chia-Yen, Zhong, Ruikang, Ding, Jennifer, Wood, Joseph, Bee, Stephen, Jaber, Mona
Format:	Preprint
Published:	2024
Subjects:	Multimedia
Online Access:	https://arxiv.org/abs/2404.10528

Similar Items

Intelligent Travel Activity Monitoring: Generalized Distributed Acoustic Sensing Approaches
by: Zhong, Ruikang, et al.
Published: (2025)

RoboTron-Mani: All-in-One Multimodal Large Model for Robotic Manipulation
by: Yan, Feng, et al.
Published: (2024)

One Framework to Rule Them All: Unifying Multimodal Tasks with LLM Neural-Tuning
by: Sun, Hao, et al.
Published: (2024)

Make Graph-based Referring Expression Comprehension Great Again through Expression-guided Dynamic Gating and Regression
by: Ke, Jingcheng, et al.
Published: (2024)

Recognizing Everything from All Modalities at Once: Grounded Multimodal Universal Information Extraction
by: Zhang, Meishan, et al.
Published: (2024)

A multimodal stress detection dataset with facial expressions and physiological signals
by: Hosseini, Majid, et al.
Published: (2022)

ASAP: Advancing Semantic Alignment Promotes Multi-Modal Manipulation Detecting and Grounding
by: Zhang, Zhenxing, et al.
Published: (2024)

RoSMM: A Robust and Secure Multi-Modal Watermarking Framework for Diffusion Models
by: Fang, ZhongLi, et al.
Published: (2025)

Short-Form Video Viewing Behavior Analysis and Multi-Step Viewing Time Prediction
by: Yen, Vu Thi Hai, et al.
Published: (2026)

Revisiting Vision-Language Features Adaptation and Inconsistency for Social Media Popularity Prediction
by: Hsu, Chih-Chung, et al.
Published: (2024)

PRISM: Exposing and Resolving Spurious Isolation in Federated Multimodal Continual Learning
by: Wu, Beining, et al.
Published: (2026)

A low complexity contextual stacked ensemble-learning approach for pedestrian intent prediction
by: Chiang, Chia-Yen, et al.
Published: (2024)

FakeSV-VLM: Taming VLM for Detecting Fake Short-Video News via Progressive Mixture-Of-Experts Adapter
by: Wang, Junxi, et al.
Published: (2025)

Chinese-LiPS: A Chinese audio-visual speech recognition dataset with Lip-reading and Presentation Slides
by: Zhao, Jinghua, et al.
Published: (2025)

Anchoring Trends: Mitigating Social Media Popularity Prediction Drift via Feature Clustering and Expansion
by: Lee, Chia-Ming, et al.
Published: (2025)

RoboTron-Drive: All-in-One Large Multimodal Model for Autonomous Driving
by: Huang, Zhijian, et al.
Published: (2024)

MISS: Memory-efficient Instance Segmentation Framework By Visual Inductive Priors Flow Propagation
by: Hsu, Chih-Chung, et al.
Published: (2024)

Music4All A+A: A Multimodal Dataset for Music Information Retrieval Tasks
by: Geiger, Jonas, et al.
Published: (2025)

Towards Alleviating Text-to-Image Retrieval Hallucination for CLIP in Zero-shot Learning
by: Wang, Hanyao, et al.
Published: (2024)

Diffusion Model-Based Size Variable Virtual Try-On Technology and Evaluation Method
by: Zhang, Shufang, et al.
Published: (2025)

SCI-Reason: A Dataset with Chain-of-Thought Rationales for Complex Multimodal Reasoning in Academic Areas
by: Ma, Chenghao, et al.
Published: (2025)

VARFVV: View-Adaptive Real-Time Interactive Free-View Video Streaming with Edge Computing
by: Hu, Qiang, et al.
Published: (2025)

Multimodal LLM-based Query Paraphrasing for Video Search
by: Wu, Jiaxin, et al.
Published: (2024)

Stemphonic: All-at-once Flexible Multi-stem Music Generation
by: Wu, Shih-Lun, et al.
Published: (2026)

Not All Attention is Needed: Parameter and Computation Efficient Transfer Learning for Multi-modal Large Language Models
by: Wu, Qiong, et al.
Published: (2024)

DeepTextMark: A Deep Learning-Driven Text Watermarking Approach for Identifying Large Language Model Generated Text
by: Munyer, Travis, et al.
Published: (2023)

Enabling immersive experiences in challenging network conditions
by: Aggarwal, Pooja, et al.
Published: (2023)

Augment Before Copy-Paste: Data and Memory Efficiency-Oriented Instance Segmentation Framework for Sport-scenes
by: Hsu, Chih-Chung, et al.
Published: (2024)

Interoperable Provenance Authentication of Broadcast Media using Open Standards-based Metadata, Watermarking and Cryptography
by: Simmons, John C., et al.
Published: (2024)

Multimodal Infusion Tuning for Large Models
by: Sun, Hao, et al.
Published: (2024)

AI TrackMate: Finally, Someone Who Will Give Your Music More Than Just "Sounds Great!"
by: Jiang, Yi-Lin, et al.
Published: (2024)

ChartAdapter: Large Vision-Language Model for Chart Summarization
by: Xu, Peixin, et al.
Published: (2024)

Music Arena: Live Evaluation for Text-to-Music
by: Kim, Yonghyun, et al.
Published: (2025)

Dynamic Interaction-Aware and Causality-Disentangled Framework for Multimodal Sentiment Analysis
by: Dong, Guangyuan, et al.
Published: (2026)

One Size Doesn't Fit All: Age-Aware Gamification Mechanics for Multimedia Learning Environments
by: Kaißer, Sarah, et al.
Published: (2025)

Enabling Distributed Generative Artificial Intelligence in 6G: Mobile Edge Generation
by: Zhong, Ruikang, et al.
Published: (2024)

A Single Atlas is All You Need: Decoder-Side Gaussian Splatting for Immersive Video
by: Mieloch, Dawid, et al.
Published: (2026)

GustosonicSense: Towards understanding the design of playful gustosonic eating experiences
by: Wang, Yan, et al.
Published: (2024)

Differential Multimodal Transformers
by: Li, Jerry, et al.
Published: (2025)

Gamification with Purpose: What Learners Prefer to Motivate Their Learning
by: Marquardt, Kai, et al.
Published: (2025)