| Main Authors: | Xu, Tianlong, Zhang, Yi-Fan, Chu, Zhendong, Wang, Shen, Wen, Qingsong |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2409.09403 |
Similar Items
Multimodal AI Teacher: Integrating Edge Computing and Reasoning Models for Enhanced Student Error Analysis
by: Tianlong Xu, et al.
Published: (2025)
SOMONITOR: Combining Explainable AI & Large Language Models for Marketing Analytics
by: Farseev, Aleksandr, et al.
Published: (2024)
Modeling Human Responses to Multimodal AI Content
by: Shen, Zhiqi, et al.
Published: (2025)
UniPath: Adaptive Coordination of Understanding and Generation for Unified Multimodal Reasoning
by: Bai, Hayes, et al.
Published: (2026)
TopoCode: Topologically Informed Error Detection and Correction in Communication Systems
by: Guo, Hongzhi
Published: (2024)
MF-AED-AEC: Speech Emotion Recognition by Leveraging Multimodal Fusion, ASR Error Detection, and ASR Error Correction
by: He, Jiajun, et al.
Published: (2024)
An Efficient NVoD Scheme Using Implicit Error Correction and Subchannels for Wireless Networks
by: Asorey-Cacheda, Rafael, et al.
Published: (2025)
Leveraging Weak Cross-Modal Guidance for Coherence Modelling via Iterative Learning
by: Bin, Yi, et al.
Published: (2024)
From Correctness to Comprehension: AI Agents for Personalized Error Diagnosis in Education
by: Zhang, Yi-Fan, et al.
Published: (2025)
Towards Pretraining Robust ASR Foundation Model with Acoustic-Aware Data Augmentation
by: Liu, Dancheng, et al.
Published: (2025)
ChartAdapter: Large Vision-Language Model for Chart Summarization
by: Xu, Peixin, et al.
Published: (2024)
Crafting Dynamic Virtual Activities with Advanced Multimodal Models
by: Li, Changyang, et al.
Published: (2024)
Robust Steganography with Boundary-Preserving Overflow Alleviation and Adaptive Error Correction
by: Cheng, Yu, et al.
Published: (2024)
Enhancing Film Grain Coding in VVC: Improving Encoding Quality and Efficiency
by: Menon, Vignesh V, et al.
Published: (2024)
MindCine: Multimodal EEG-to-Video Reconstruction with Large-Scale Pretrained Models
by: Zhou, Tian-Yi, et al.
Published: (2026)
Fact-Checking with Contextual Narratives: Leveraging Retrieval-Augmented LLMs for Social Media Analysis
by: Dey, Arka Ujjal, et al.
Published: (2025)
Rethinking Bjøntegaard Delta for Compression Efficiency Evaluation: Are We Calculating It Precisely and Reliably?
by: Hang, Xinyu, et al.
Published: (2024)
PC-JND: Subjective Study and Dataset on Just Noticeable Difference for Point Clouds in 6DoF Virtual Reality
by: Fan, Chunling, et al.
Published: (2025)
Building and Evaluating a Realistic Virtual World for Large Scale Urban Exploration from 360° Videos
by: Takenawa, Mizuki, et al.
Published: (2025)
FedNano: Toward Lightweight Federated Tuning for Pretrained Multimodal Large Language Models
by: Zhang, Yao, et al.
Published: (2025)
StyleSpeaker: Audio-Enhanced Fine-Grained Style Modeling for Speech-Driven 3D Facial Animation
by: Yang, An, et al.
Published: (2025)
CustomContrast: A Multilevel Contrastive Perspective For Subject-Driven Text-to-Image Customization
by: Chen, Nan, et al.
Published: (2024)
MindFuse: Towards GenAI Explainability in Marketing Strategy Co-Creation
by: Farseev, Aleksandr, et al.
Published: (2025)
Diffusion Model-Based Size Variable Virtual Try-On Technology and Evaluation Method
by: Zhang, Shufang, et al.
Published: (2025)
Enhancing Video Music Recommendation with Transformer-Driven Audio-Visual Embeddings
by: Liu, Shimiao, et al.
Published: (2025)
Bringing Robots Home: The Rise of AI Robots in Consumer Electronics
by: Dong, Haiwei, et al.
Published: (2024)
Multimodal Framework for Explainable Autonomous Driving: Integrating Video, Sensor, and Textual Data for Enhanced Decision-Making and Transparency
by: Zarghani, Abolfazl, et al.
Published: (2025)
AMD: Autoregressive Motion Diffusion
by: Han, Bo, et al.
Published: (2023)
MORE-R1: Guiding LVLM for Multimodal Object-Entity Relation Extraction via Stepwise Reasoning with Reinforcement Learning
by: Yuan, Xiang, et al.
Published: (2026)
HistLLM: A Unified Framework for LLM-Based Multimodal Recommendation with User History Encoding and Compression
by: Zhang, Chen, et al.
Published: (2025)
A Survey on Multimodal Benchmarks: In the Era of Large AI Models
by: Li, Lin, et al.
Published: (2024)
Detecting Notational Errors in Digital Music Scores
by: Léo, Géré, et al.
Published: (2025)
DLF: Disentangled-Language-Focused Multimodal Sentiment Analysis
by: Wang, Pan, et al.
Published: (2024)
Automatic Camera Trajectory Control with Enhanced Immersion for Virtual Cinematography
by: Wu, Xinyi, et al.
Published: (2023)
Feedback-Driven Rate Control for Learned Video Compression
by: Xu, Zhiheng, et al.
Published: (2026)
Feature Coding in the Era of Large Models: Dataset, Test Conditions, and Benchmark
by: Gao, Changsheng, et al.
Published: (2024)
RETRACTION: English Writing Correction Based on Intelligent Text Semantic Analysis
by: Advances in Multimedia
Published: (2025)
SpotSound: Enhancing Large Audio-Language Models with Fine-Grained Temporal Grounding
by: Sun, Luoyi, et al.
Published: (2026)
mPLUG-PaperOwl: Scientific Diagram Analysis with the Multimodal Large Language Model
by: Hu, Anwen, et al.
Published: (2023)
XEmbodied: A Foundation Model with Enhanced Geometric and Physical Cues for Large-Scale Embodied Environments
by: Qian, Kangan, et al.
Published: (2026)