| Main Authors: | Wang, Xinyu, Zhao, Ziyu, Bai, Ke, Meng, Silin, Shen, Dongming, Chang, Xiao-Wen, HE, Yixuan |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2605.27808 |
Similar Items
Back to Basics: Revisiting ASR in the Age of Voice Agents
by: Tay, Geeyang, et al.
Published: (2026)
Multi-modal Speech Emotion Recognition via Feature Distribution Adaptation Network
by: Li, Shaokai, et al.
Published: (2024)
Real-Time Word-Level Temporal Segmentation in Streaming Speech Recognition
by: Nishida, Naoto, et al.
Published: (2025)
CARAT: Contrastive Feature Reconstruction and Aggregation for Multi-Modal Multi-Label Emotion Recognition
by: Peng, Cheng, et al.
Published: (2023)
AMD: Autoregressive Motion Diffusion
by: Han, Bo, et al.
Published: (2023)
DA-PTQ: Drift-Aware Post-Training Quantization for Efficient Vision-Language-Action Models
by: Xu, Siyuan, et al.
Published: (2026)
Human-Inspired Computing for Robust and Efficient Audio-Visual Speech Recognition
by: Liu, Qianhui, et al.
Published: (2024)
Speech Emotion Recognition with ASR Transcripts: A Comprehensive Study on Word Error Rate and Fusion Techniques
by: Li, Yuanchao, et al.
Published: (2024)
Modality-Aware Contrastive and Uncertainty-Regularized Emotion Recognition
by: Zhuang, Yan, et al.
Published: (2026)
DAT: Dual-Aware Adaptive Transmission for Efficient Multimodal LLM Inference in Edge-Cloud Systems
by: Guo, Qi, et al.
Published: (2026)
Fully Automatic Content-Aware Tiling Pipeline for Pathology Whole Slide Images
by: Jabar, Falah, et al.
Published: (2024)
Enhancing Automatic Chord Recognition via Pseudo-Labeling and Knowledge Distillation
by: Phan, Nghia, et al.
Published: (2026)
GTLR-GS: Geometry-Texture Aware LiDAR-Regularized 3D Gaussian Splatting for Realistic Scene Reconstruction
by: Fang, Yan, et al.
Published: (2026)
SRA: Semantic Relation-Aware Flowchart Question Answering
by: Li, Xinyu, et al.
Published: (2026)
Bridging the Gap: Sketch-Aware Interpolation Network for High-Quality Animation Sketch Inbetweening
by: Shen, Jiaming, et al.
Published: (2023)
Towards Multimodal Empathetic Response Generation: A Rich Text-Speech-Vision Avatar-based Benchmark
by: Zhang, Han, et al.
Published: (2025)
State-Anchored Complete-View Distillation for Robust Conversational Multimodal Emotion Recognition
by: Pan, Zhaoyan, et al.
Published: (2026)
HADUA: Hierarchical Attention and Dynamic Uniform Alignment for Robust Cross-Subject Emotion Recognition
by: Tang, Jiahao, et al.
Published: (2026)
Robust LLM-based Audio-Visual Speech Recognition with Sparse Modality Alignment and Visual Unit-Guided Refinement
by: Su, Fei, et al.
Published: (2026)
Dynamic Interaction-Aware and Causality-Disentangled Framework for Multimodal Sentiment Analysis
by: Dong, Guangyuan, et al.
Published: (2026)
Multimodal Fusion via Hypergraph Autoencoder and Contrastive Learning for Emotion Recognition in Conversation
by: Yi, Zijian, et al.
Published: (2024)
AsCL: An Asymmetry-sensitive Contrastive Learning Method for Image-Text Retrieval with Cross-Modal Fusion
by: Gong, Ziyu, et al.
Published: (2024)
It's Never Too Late: Fusing Acoustic Information into Large Language Models for Automatic Speech Recognition
by: Chen, Chen, et al.
Published: (2024)
MSRS: Training Multimodal Speech Recognition Models from Scratch with Sparse Mask Optimization
by: Fernandez-Lopez, Adriana, et al.
Published: (2024)
LLM2Manim: Pedagogy-Aware AI Generation of STEM Animations
by: Joshi, Aastha, et al.
Published: (2026)
Predictability-Aware Motion Prediction for Edge XR via High-Order Error-State Kalman Filtering
by: Zhong, Ziyu, et al.
Published: (2025)
High-Fidelity 3D Gaussian Human Reconstruction via Region-Aware Initialization and Geometric Priors
by: Liu, Yang, et al.
Published: (2026)
Purification Before Fusion: Toward Mask-Free Speech Enhancement for Robust Audio-Visual Speech Recognition
by: Wu, Linzhi, et al.
Published: (2026)
SpeechEE: A Novel Benchmark for Speech Event Extraction
by: Wang, Bin, et al.
Published: (2024)
Rethinking Bjøntegaard Delta for Compression Efficiency Evaluation: Are We Calculating It Precisely and Reliably?
by: Hang, Xinyu, et al.
Published: (2024)
Towards Real-World Stickers Use: A New Dataset for Multi-Tag Sticker Recognition
by: Wang, Bingbing, et al.
Published: (2024)
Private Speech Classification without Collapse: Stabilized DP Training and Offline Distillation
by: Wen, Yadi, et al.
Published: (2026)
Towards Structure-aware Model for Multi-modal Knowledge Graph Completion
by: Li, Linyu, et al.
Published: (2025)
CueNet: Robust Audio-Visual Speaker Extraction through Cross-Modal Cue Mining and Interaction
by: Wang, Jiadong, et al.
Published: (2026)
PDStream: Slashing Long-Tail Delay in Interactive Video Streaming via Pseudo-Dual Streaming
by: Xiao, Xuedou, et al.
Published: (2025)
Speech-to-See: End-to-End Speech-Driven Open-Set Object Detection
by: Lu, Wenhuan, et al.
Published: (2025)
Human-Machine Ritual: Synergic Performance through Real-Time Motion Recognition
by: Cai, Zhuodi, et al.
Published: (2025)
Video Streaming with Kairos: An MPC-Based ABR with Streaming-Aware Throughput Prediction
by: Zhong, Ziyu, et al.
Published: (2025)
UniPath: Adaptive Coordination of Understanding and Generation for Unified Multimodal Reasoning
by: Bai, Hayes, et al.
Published: (2026)
Explainable Multimodal Emotion Recognition
by: Lian, Zheng, et al.
Published: (2023)