| Main Authors: | Chiang, Chia-Yen, Zhong, Ruikang, Ding, Jennifer, Wood, Joseph, Bee, Stephen, Jaber, Mona |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2404.10528 |
Similar Items
Intelligent Travel Activity Monitoring: Generalized Distributed Acoustic Sensing Approaches
by: Zhong, Ruikang, et al.
Published: (2025)
RoboTron-Mani: All-in-One Multimodal Large Model for Robotic Manipulation
by: Yan, Feng, et al.
Published: (2024)
One Framework to Rule Them All: Unifying Multimodal Tasks with LLM Neural-Tuning
by: Sun, Hao, et al.
Published: (2024)
Make Graph-based Referring Expression Comprehension Great Again through Expression-guided Dynamic Gating and Regression
by: Ke, Jingcheng, et al.
Published: (2024)
Recognizing Everything from All Modalities at Once: Grounded Multimodal Universal Information Extraction
by: Zhang, Meishan, et al.
Published: (2024)
A multimodal stress detection dataset with facial expressions and physiological signals
by: Hosseini, Majid, et al.
Published: (2022)
ASAP: Advancing Semantic Alignment Promotes Multi-Modal Manipulation Detecting and Grounding
by: Zhang, Zhenxing, et al.
Published: (2024)
RoSMM: A Robust and Secure Multi-Modal Watermarking Framework for Diffusion Models
by: Fang, ZhongLi, et al.
Published: (2025)
Short-Form Video Viewing Behavior Analysis and Multi-Step Viewing Time Prediction
by: Yen, Vu Thi Hai, et al.
Published: (2026)
Revisiting Vision-Language Features Adaptation and Inconsistency for Social Media Popularity Prediction
by: Hsu, Chih-Chung, et al.
Published: (2024)
PRISM: Exposing and Resolving Spurious Isolation in Federated Multimodal Continual Learning
by: Wu, Beining, et al.
Published: (2026)
A low complexity contextual stacked ensemble-learning approach for pedestrian intent prediction
by: Chiang, Chia-Yen, et al.
Published: (2024)
FakeSV-VLM: Taming VLM for Detecting Fake Short-Video News via Progressive Mixture-Of-Experts Adapter
by: Wang, Junxi, et al.
Published: (2025)
Chinese-LiPS: A Chinese audio-visual speech recognition dataset with Lip-reading and Presentation Slides
by: Zhao, Jinghua, et al.
Published: (2025)
Anchoring Trends: Mitigating Social Media Popularity Prediction Drift via Feature Clustering and Expansion
by: Lee, Chia-Ming, et al.
Published: (2025)
RoboTron-Drive: All-in-One Large Multimodal Model for Autonomous Driving
by: Huang, Zhijian, et al.
Published: (2024)
MISS: Memory-efficient Instance Segmentation Framework By Visual Inductive Priors Flow Propagation
by: Hsu, Chih-Chung, et al.
Published: (2024)
Music4All A+A: A Multimodal Dataset for Music Information Retrieval Tasks
by: Geiger, Jonas, et al.
Published: (2025)
Towards Alleviating Text-to-Image Retrieval Hallucination for CLIP in Zero-shot Learning
by: Wang, Hanyao, et al.
Published: (2024)
Diffusion Model-Based Size Variable Virtual Try-On Technology and Evaluation Method
by: Zhang, Shufang, et al.
Published: (2025)
SCI-Reason: A Dataset with Chain-of-Thought Rationales for Complex Multimodal Reasoning in Academic Areas
by: Ma, Chenghao, et al.
Published: (2025)
VARFVV: View-Adaptive Real-Time Interactive Free-View Video Streaming with Edge Computing
by: Hu, Qiang, et al.
Published: (2025)
Multimodal LLM-based Query Paraphrasing for Video Search
by: Wu, Jiaxin, et al.
Published: (2024)
Stemphonic: All-at-once Flexible Multi-stem Music Generation
by: Wu, Shih-Lun, et al.
Published: (2026)
Not All Attention is Needed: Parameter and Computation Efficient Transfer Learning for Multi-modal Large Language Models
by: Wu, Qiong, et al.
Published: (2024)
DeepTextMark: A Deep Learning-Driven Text Watermarking Approach for Identifying Large Language Model Generated Text
by: Munyer, Travis, et al.
Published: (2023)
Enabling immersive experiences in challenging network conditions
by: Aggarwal, Pooja, et al.
Published: (2023)
Augment Before Copy-Paste: Data and Memory Efficiency-Oriented Instance Segmentation Framework for Sport-scenes
by: Hsu, Chih-Chung, et al.
Published: (2024)
Interoperable Provenance Authentication of Broadcast Media using Open Standards-based Metadata, Watermarking and Cryptography
by: Simmons, John C., et al.
Published: (2024)
Multimodal Infusion Tuning for Large Models
by: Sun, Hao, et al.
Published: (2024)
AI TrackMate: Finally, Someone Who Will Give Your Music More Than Just "Sounds Great!"
by: Jiang, Yi-Lin, et al.
Published: (2024)
ChartAdapter: Large Vision-Language Model for Chart Summarization
by: Xu, Peixin, et al.
Published: (2024)
Music Arena: Live Evaluation for Text-to-Music
by: Kim, Yonghyun, et al.
Published: (2025)
Dynamic Interaction-Aware and Causality-Disentangled Framework for Multimodal Sentiment Analysis
by: Dong, Guangyuan, et al.
Published: (2026)
One Size Doesn't Fit All: Age-Aware Gamification Mechanics for Multimedia Learning Environments
by: Kaißer, Sarah, et al.
Published: (2025)
Enabling Distributed Generative Artificial Intelligence in 6G: Mobile Edge Generation
by: Zhong, Ruikang, et al.
Published: (2024)
A Single Atlas is All You Need: Decoder-Side Gaussian Splatting for Immersive Video
by: Mieloch, Dawid, et al.
Published: (2026)
GustosonicSense: Towards understanding the design of playful gustosonic eating experiences
by: Wang, Yan, et al.
Published: (2024)
Differential Multimodal Transformers
by: Li, Jerry, et al.
Published: (2025)
Gamification with Purpose: What Learners Prefer to Motivate Their Learning
by: Marquardt, Kai, et al.
Published: (2025)