| Main Authors: | Zhang, Liyun, Lian, Zheng, Liu, Hong, Takebe, Takanori, Nakashima, Yuta |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2504.09525 |
Similar Items
QuMAB: Query-based Multi-Annotator Behavior Modeling with Reliability under Sparse Labels
by: Zhang, Liyun, et al.
Published: (2025)
QuMATL: Query-based Multi-annotator Tendency Learning
by: Zhang, Liyun, et al.
Published: (2025)
A Unified Evaluation Framework for Multi-Annotator Tendency Learning
by: Zhang, Liyun, et al.
Published: (2025)
HeLo: Heterogeneous Multi-Modal Fusion with Label Correlation for Emotion Distribution Learning
by: Zheng, Chuhang, et al.
Published: (2025)
AMEX: Android Multi-annotation Expo Dataset for Mobile GUI Agents
by: Chai, Yuxiang, et al.
Published: (2024)
CARAT: Contrastive Feature Reconstruction and Aggregation for Multi-Modal Multi-Label Emotion Recognition
by: Peng, Cheng, et al.
Published: (2023)
Iterative Residual Cross-Attention Mechanism: An Integrated Approach for Audio-Visual Navigation Tasks
by: Zhang, Hailong, et al.
Published: (2025)
Memo2496: Expert-Annotated Dataset and Dual-View Adaptive Framework for Music Emotion Recognition
by: Li, Qilin, et al.
Published: (2025)
Learning Trimodal Relation for Audio-Visual Question Answering with Missing Modality
by: Park, Kyu Ri, et al.
Published: (2024)
AniME: Adaptive Multi-Agent Planning for Long Animation Generation
by: Zhang, Lisai, et al.
Published: (2025)
Rethinking Prompting Strategies for Multi-Label Recognition with Partial Annotations
by: Rawlekar, Samyak, et al.
Published: (2024)
Stage Light is Sequence$^2$: Multi-Light Control via Imitation Learning
by: Zhao, Zijian, et al.
Published: (2026)
History-Guided Iterative Visual Reasoning with Self-Correction
by: Yang, Xinglong, et al.
Published: (2026)
Emotion-LLaMA: Multimodal Emotion Recognition and Reasoning with Instruction Tuning
by: Cheng, Zebang, et al.
Published: (2024)
FineFake: A Knowledge-Enriched Dataset for Fine-Grained Multi-Domain Fake News Detection
by: Zhou, Ziyi, et al.
Published: (2024)
Learning Segment Similarity and Alignment in Large-Scale Content Based Video Retrieval
by: Jiang, Chen, et al.
Published: (2023)
A Survey of Multi-sensor Fusion Perception for Embodied AI: Background, Methods, Challenges and Prospects
by: Ruan, Shulan, et al.
Published: (2025)
OmniDPO: A Preference Optimization Framework to Address Omni-Modal Hallucination
by: Chen, Junzhe, et al.
Published: (2025)
Uni-Retrieval: A Multi-Style Retrieval Framework for STEM's Education
by: Jia, Yanhao, et al.
Published: (2025)
EigeNet: Geometry-Informed Multi-Modal Learning for Few-shot Novel View RIR Prediction
by: Jing, Chong, et al.
Published: (2026)
Cross Modification Attention Based Deliberation Model for Image Captioning
by: Lian, Zheng, et al.
Published: (2021)
Contrasting Deepfakes Diffusion via Contrastive Learning and Global-Local Similarities
by: Baraldi, Lorenzo, et al.
Published: (2024)
Wireless Video Semantic Communication with Decoupled Diffusion Multi-frame Compensation
by: Xie, Bingyan, et al.
Published: (2025)
LUST: A Multi-Modal Framework with Hierarchical LLM-based Scoring for Learned Thematic Significance Tracking in Multimedia Content
by: Luiz, Anderson de Lima
Published: (2025)
PSA-MF: Personality-Sentiment Aligned Multi-Level Fusion for Multimodal Sentiment Analysis
by: Xie, Heng, et al.
Published: (2025)
Plasticity-Aware Mixture of Experts for Learning Under QoE Shifts in Adaptive Video Streaming
by: He, Zhiqiang, et al.
Published: (2025)
M3TR: Temporal Retrieval Enhanced Multi-Modal Micro-video Popularity Prediction
by: Lu, Jiacheng, et al.
Published: (2024)
Multimodal Emotion Recognition by Fusing Video Semantic in MOOC Learning Scenarios
by: Zhang, Yuan, et al.
Published: (2024)
ELIQ: A Label-Free Framework for Quality Assessment of Evolving AI-Generated Images
by: Li, Xinyue, et al.
Published: (2026)
A Highly Clean Recipe Dataset with Ingredient States Annotation for State Probing Task
by: Toyooka, Mashiro, et al.
Published: (2025)
Exposing Cross-Modal Consistency for Fake News Detection in Short-Form Videos
by: Tian, Chong, et al.
Published: (2026)
Harmony: A Unified Framework for Modality Incremental Learning
by: Song, Yaguang, et al.
Published: (2025)
MM-HSD: Multi-Modal Hate Speech Detection in Videos
by: Céspedes-Sarrias, Berta, et al.
Published: (2025)
HSVLT: Hierarchical Scale-Aware Vision-Language Transformer for Multi-Label Image Classification
by: Ouyang, Shuyi, et al.
Published: (2024)
A Multi-modal Fusion Network for Terrain Perception Based on Illumination Aware
by: Wang, Rui, et al.
Published: (2025)
Contestable Multi-Agent Debate with Arena-based Argumentative Computation for Multimedia Verification
by: Nguyen, Truong Thanh Hung, et al.
Published: (2026)
MaLoRA: Gated Modality LoRA for Key-Space Alignment in Multimodal LLM Fine-Tuning
by: Zheng, Xinhan, et al.
Published: (2025)
Every Picture Tells a Dangerous Story: Memory-Augmented Multi-Agent Jailbreak Attacks on VLMs
by: Chen, Jianhao, et al.
Published: (2026)
Towards Open-Vocabulary Video Semantic Segmentation
by: Li, Xinhao, et al.
Published: (2024)
TIP and Polish: Text-Image-Prototype Guided Multi-Modal Generation via Commonality-Discrepancy Modeling and Refinement
by: Ma, Zhiyong, et al.
Published: (2025)