:: Library Catalog

Bibliographic Details
Main Authors:	Xu, Tianlong; Zhang, Yi-Fan; Chu, Zhendong; Wang, Shen; Wen, Qingsong
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence; Multimedia
Online Access:	https://arxiv.org/abs/2409.09403

Similar Items

Multimodal AI Teacher: Integrating Edge Computing and Reasoning Models for Enhanced Student Error Analysis
by: Xu, Tianlong, et al.
Published: (2025)

SOMONITOR: Combining Explainable AI & Large Language Models for Marketing Analytics
by: Farseev, Aleksandr, et al.
Published: (2024)

Modeling Human Responses to Multimodal AI Content
by: Shen, Zhiqi, et al.
Published: (2025)

UniPath: Adaptive Coordination of Understanding and Generation for Unified Multimodal Reasoning
by: Bai, Hayes, et al.
Published: (2026)

TopoCode: Topologically Informed Error Detection and Correction in Communication Systems
by: Guo, Hongzhi
Published: (2024)

MF-AED-AEC: Speech Emotion Recognition by Leveraging Multimodal Fusion, ASR Error Detection, and ASR Error Correction
by: He, Jiajun, et al.
Published: (2024)

An Efficient NVoD Scheme Using Implicit Error Correction and Subchannels for Wireless Networks
by: Asorey-Cacheda, Rafael, et al.
Published: (2025)

Leveraging Weak Cross-Modal Guidance for Coherence Modelling via Iterative Learning
by: Bin, Yi, et al.
Published: (2024)

From Correctness to Comprehension: AI Agents for Personalized Error Diagnosis in Education
by: Zhang, Yi-Fan, et al.
Published: (2025)

Towards Pretraining Robust ASR Foundation Model with Acoustic-Aware Data Augmentation
by: Liu, Dancheng, et al.
Published: (2025)

ChartAdapter: Large Vision-Language Model for Chart Summarization
by: Xu, Peixin, et al.
Published: (2024)

Crafting Dynamic Virtual Activities with Advanced Multimodal Models
by: Li, Changyang, et al.
Published: (2024)

Robust Steganography with Boundary-Preserving Overflow Alleviation and Adaptive Error Correction
by: Cheng, Yu, et al.
Published: (2024)

Enhancing Film Grain Coding in VVC: Improving Encoding Quality and Efficiency
by: Menon, Vignesh V, et al.
Published: (2024)

MindCine: Multimodal EEG-to-Video Reconstruction with Large-Scale Pretrained Models
by: Zhou, Tian-Yi, et al.
Published: (2026)

Fact-Checking with Contextual Narratives: Leveraging Retrieval-Augmented LLMs for Social Media Analysis
by: Dey, Arka Ujjal, et al.
Published: (2025)

Rethinking Bjøntegaard Delta for Compression Efficiency Evaluation: Are We Calculating It Precisely and Reliably?
by: Hang, Xinyu, et al.
Published: (2024)

PC-JND: Subjective Study and Dataset on Just Noticeable Difference for Point Clouds in 6DoF Virtual Reality
by: Fan, Chunling, et al.
Published: (2025)

Building and Evaluating a Realistic Virtual World for Large Scale Urban Exploration from 360° Videos
by: Takenawa, Mizuki, et al.
Published: (2025)

FedNano: Toward Lightweight Federated Tuning for Pretrained Multimodal Large Language Models
by: Zhang, Yao, et al.
Published: (2025)

StyleSpeaker: Audio-Enhanced Fine-Grained Style Modeling for Speech-Driven 3D Facial Animation
by: Yang, An, et al.
Published: (2025)

CustomContrast: A Multilevel Contrastive Perspective For Subject-Driven Text-to-Image Customization
by: Chen, Nan, et al.
Published: (2024)

MindFuse: Towards GenAI Explainability in Marketing Strategy Co-Creation
by: Farseev, Aleksandr, et al.
Published: (2025)

Diffusion Model-Based Size Variable Virtual Try-On Technology and Evaluation Method
by: Zhang, Shufang, et al.
Published: (2025)

Enhancing Video Music Recommendation with Transformer-Driven Audio-Visual Embeddings
by: Liu, Shimiao, et al.
Published: (2025)

Bringing Robots Home: The Rise of AI Robots in Consumer Electronics
by: Dong, Haiwei, et al.
Published: (2024)

Multimodal Framework for Explainable Autonomous Driving: Integrating Video, Sensor, and Textual Data for Enhanced Decision-Making and Transparency
by: Zarghani, Abolfazl, et al.
Published: (2025)

AMD: Autoregressive Motion Diffusion
by: Han, Bo, et al.
Published: (2023)

MORE-R1: Guiding LVLM for Multimodal Object-Entity Relation Extraction via Stepwise Reasoning with Reinforcement Learning
by: Yuan, Xiang, et al.
Published: (2026)

HistLLM: A Unified Framework for LLM-Based Multimodal Recommendation with User History Encoding and Compression
by: Zhang, Chen, et al.
Published: (2025)

A Survey on Multimodal Benchmarks: In the Era of Large AI Models
by: Li, Lin, et al.
Published: (2024)

Detecting Notational Errors in Digital Music Scores
by: Léo, Géré, et al.
Published: (2025)

DLF: Disentangled-Language-Focused Multimodal Sentiment Analysis
by: Wang, Pan, et al.
Published: (2024)

Automatic Camera Trajectory Control with Enhanced Immersion for Virtual Cinematography
by: Wu, Xinyi, et al.
Published: (2023)

Feedback-Driven Rate Control for Learned Video Compression
by: Xu, Zhiheng, et al.
Published: (2026)

Feature Coding in the Era of Large Models: Dataset, Test Conditions, and Benchmark
by: Gao, Changsheng, et al.
Published: (2024)

RETRACTION: English Writing Correction Based on Intelligent Text Semantic Analysis
by: Advances in Multimedia
Published: (2025)

SpotSound: Enhancing Large Audio-Language Models with Fine-Grained Temporal Grounding
by: Sun, Luoyi, et al.
Published: (2026)

mPLUG-PaperOwl: Scientific Diagram Analysis with the Multimodal Large Language Model
by: Hu, Anwen, et al.
Published: (2023)

XEmbodied: A Foundation Model with Enhanced Geometric and Physical Cues for Large-Scale Embodied Environments
by: Qian, Kangan, et al.
Published: (2026)