| Main Authors: | Wei, Xiao, Wen, Bin, Lin, Yuqin, Li, Kai, Gu, Mingyang, Wang, Xiaobao, Wang, Longbiao, Dang, Jianwu |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2602.14655 |
Similar Items
AIMDiT: Modality Augmentation and Interaction via Multimodal Dimension Transformation for Emotion Recognition in Conversations
by: Wu, Sheng, et al.
Published: (2024)
Enriching Multimodal Sentiment Analysis through Textual Emotional Descriptions of Visual-Audio Content
by: Wu, Sheng, et al.
Published: (2024)
Integration of Old and New Knowledge for Generalized Intent Discovery: A Consistency-driven Prototype-Prompting Framework
by: Wei, Xiao, et al.
Published: (2025)
Rethinking Contrastive Learning in Graph Anomaly Detection: A Clean-View Perspective
by: Jin, Di, et al.
Published: (2025)
Mamba-SEUNet: Mamba UNet for Monaural Speech Enhancement
by: Wang, Junyu, et al.
Published: (2024)
LORT: Locally Refined Convolution and Taylor Transformer for Monaural Speech Enhancement
by: Wang, Junyu, et al.
Published: (2025)
MSR-HuBERT: Self-supervised Pre-training for Adaptation to Multiple Sampling Rates
by: Huang, Zikang, et al.
Published: (2026)
CECOR: Correction-oriented synthetic data construction for factual error correction
by: Zhu, Lei, et al.
Published: (2026)
Progressive Residual Extraction based Pre-training for Speech Representation Learning
by: Wang, Tianrui, et al.
Published: (2024)
Error Correction by Paying Attention to Both Acoustic and Confidence References for Automatic Speech Recognition
by: Shu, Yuchun, et al.
Published: (2024)
ASDA: Audio Spectrogram Differential Attention Mechanism for Self-Supervised Representation Learning
by: Wang, Junyu, et al.
Published: (2025)
ZMM-TTS: Zero-shot Multilingual and Multispeaker Speech Synthesis Conditioned on Self-supervised Discrete Speech Representations
by: Gong, Cheng, et al.
Published: (2023)
Word-Level Emotional Expression Control in Zero-Shot Text-to-Speech Synthesis
by: Wang, Tianrui, et al.
Published: (2025)
Reducing the Gap Between Pretrained Speech Enhancement and Recognition Models Using a Real Speech-Trained Bridging Module
by: Cui, Zhongjian, et al.
Published: (2025)
POTSA: A Cross-Lingual Speech Alignment Framework for Speech-to-Text Translation
by: Li, Xuanchen, et al.
Published: (2025)
SecoustiCodec: Cross-Modal Aligned Streaming Single-Codecbook Speech Codec
by: Qiang, Chunyu, et al.
Published: (2025)
VQ-CTAP: Cross-Modal Fine-Grained Sequence Representation Learning for Speech Processing
by: Qiang, Chunyu, et al.
Published: (2024)
Expressive Prompting: Improving Emotion Intensity and Speaker Consistency in Zero-Shot TTS
by: Wang, Haoyu, et al.
Published: (2024)
Efficient Emotion and Speaker Adaptation in LLM-Based TTS via Characteristic-Specific Partial Fine-Tuning
by: Wang, Tianrui, et al.
Published: (2025)
Separate First, Fuse Later: Mitigating Cross-Modal Interference in Audio-Visual LLMs Reasoning with Modality-Specific Chain-of-Thought
by: Li, Xuanchen, et al.
Published: (2026)
UniSonate: A Unified Model for Speech, Music, and Sound Effect Generation with Text Instructions
by: Qiang, Chunyu, et al.
Published: (2026)
Pay More Attention To Audio: Mitigating Imbalance of Cross-Modal Attention in Large Audio Language Models
by: Wang, Junyu, et al.
Published: (2025)
LEAP: Optimization Hierarchical Federated Learning on Non-IID Data with Coalition Formation Game
by: Lu, Jianfeng, et al.
Published: (2024)
Perturbation Self-Supervised Representations for Cross-Lingual Emotion TTS: Stage-Wise Modeling of Emotion and Speaker
by: Gong, Cheng, et al.
Published: (2025)
Exploring an Audio‐based Approach for Early Detection of Alzheimer’s Disease using Chinese Speech Data
by: Hung‐Wei Lee, et al.
Published: (2024)
Towards Lightweight Adaptation of Speech Enhancement Models in Real-World Environments
by: Cheng, Longbiao, et al.
Published: (2026)
Evaluating the Expressive Appropriateness of Speech in Rich Contexts
by: Wang, Tianrui, et al.
Published: (2026)
InstructAudio: Unified speech and music generation with natural language instruction
by: Qiang, Chunyu, et al.
Published: (2025)
RAL:Redundancy-Aware Lipreading Model Based on Differential Learning with Symmetric Views
by: Gu, Zejun, et al.
Published: (2024)
MaFMatch : Semi‐Supervised Medical Image Segmentation Network Based on Mixed Data and Feature Augmentation
by: Jianwu Long, et al.
Published: (2025)
Talking Head Generation Driven by Speech-Related Facial Action Units and Audio- Based on Multimodal Representation Fusion
by: Chen, Sen, et al.
Published: (2022)
Prediction of fatigue limit stress in C/SiC composites: Effect of stochastic load spectrum
by: Longbiao Li
Published: (2024)
An Initial Investigation of Language Adaptation for TTS Systems under Low-resource Scenarios
by: Gong, Cheng, et al.
Published: (2024)
Benchmarking Foundation Speech and Language Models for Alzheimer's Disease and Related Dementia Detection from Spontaneous Speech
by: Li, Jingyu, et al.
Published: (2025)
Scaling Ambiguity: Augmenting Human Annotation in Speech Emotion Recognition with Audio-Language Models
by: Zhang, Wenda, et al.
Published: (2026)
FAConvLSTM: Factorized-Attention ConvLSTM for Efficient Feature Extraction in Multivariate Climate Data
by: Nji, Francis Ndikum, et al.
Published: (2026)
Breaking Latent Prior Bias in Detectors for Generalizable AIGC Image Detection
by: Zhou, Yue, et al.
Published: (2025)
1.x-Distill: Breaking the Diversity, Quality, and Efficiency Barrier in Distribution Matching Distillation
by: Li, Haoyu, et al.
Published: (2026)
Intelligent Diagnosis of Alzheimer's Disease Based on Machine Learning
by: Li, Mingyang, et al.
Published: (2024)
HAFFormer: A Hierarchical Attention-Free Framework for Alzheimer's Disease Detection From Spontaneous Speech
by: Dong, Zhongren, et al.
Published: (2024)