:: Library Catalog

Bibliographic Details
Main Authors:	Wei, Xiao; Wen, Bin; Lin, Yuqin; Li, Kai; Gu, Mingyang; Wang, Xiaobao; Wang, Longbiao; Dang, Jianwu
Format:	Preprint
Published:	2026
Subjects:	Computation and Language; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2602.14655

Similar Items

AIMDiT: Modality Augmentation and Interaction via Multimodal Dimension Transformation for Emotion Recognition in Conversations
by: Wu, Sheng, et al.
Published: (2024)

Enriching Multimodal Sentiment Analysis through Textual Emotional Descriptions of Visual-Audio Content
by: Wu, Sheng, et al.
Published: (2024)

Integration of Old and New Knowledge for Generalized Intent Discovery: A Consistency-driven Prototype-Prompting Framework
by: Wei, Xiao, et al.
Published: (2025)

Rethinking Contrastive Learning in Graph Anomaly Detection: A Clean-View Perspective
by: Jin, Di, et al.
Published: (2025)

Mamba-SEUNet: Mamba UNet for Monaural Speech Enhancement
by: Wang, Junyu, et al.
Published: (2024)

LORT: Locally Refined Convolution and Taylor Transformer for Monaural Speech Enhancement
by: Wang, Junyu, et al.
Published: (2025)

MSR-HuBERT: Self-supervised Pre-training for Adaptation to Multiple Sampling Rates
by: Huang, Zikang, et al.
Published: (2026)

CECOR: Correction-oriented synthetic data construction for factual error correction
by: Zhu, Lei, et al.
Published: (2026)

Progressive Residual Extraction based Pre-training for Speech Representation Learning
by: Wang, Tianrui, et al.
Published: (2024)

Error Correction by Paying Attention to Both Acoustic and Confidence References for Automatic Speech Recognition
by: Shu, Yuchun, et al.
Published: (2024)

ASDA: Audio Spectrogram Differential Attention Mechanism for Self-Supervised Representation Learning
by: Wang, Junyu, et al.
Published: (2025)

ZMM-TTS: Zero-shot Multilingual and Multispeaker Speech Synthesis Conditioned on Self-supervised Discrete Speech Representations
by: Gong, Cheng, et al.
Published: (2023)

Word-Level Emotional Expression Control in Zero-Shot Text-to-Speech Synthesis
by: Wang, Tianrui, et al.
Published: (2025)

Reducing the Gap Between Pretrained Speech Enhancement and Recognition Models Using a Real Speech-Trained Bridging Module
by: Cui, Zhongjian, et al.
Published: (2025)

POTSA: A Cross-Lingual Speech Alignment Framework for Speech-to-Text Translation
by: Li, Xuanchen, et al.
Published: (2025)

SecoustiCodec: Cross-Modal Aligned Streaming Single-Codecbook Speech Codec
by: Qiang, Chunyu, et al.
Published: (2025)

VQ-CTAP: Cross-Modal Fine-Grained Sequence Representation Learning for Speech Processing
by: Qiang, Chunyu, et al.
Published: (2024)

Expressive Prompting: Improving Emotion Intensity and Speaker Consistency in Zero-Shot TTS
by: Wang, Haoyu, et al.
Published: (2024)

Efficient Emotion and Speaker Adaptation in LLM-Based TTS via Characteristic-Specific Partial Fine-Tuning
by: Wang, Tianrui, et al.
Published: (2025)

Separate First, Fuse Later: Mitigating Cross-Modal Interference in Audio-Visual LLMs Reasoning with Modality-Specific Chain-of-Thought
by: Li, Xuanchen, et al.
Published: (2026)

UniSonate: A Unified Model for Speech, Music, and Sound Effect Generation with Text Instructions
by: Qiang, Chunyu, et al.
Published: (2026)

Pay More Attention To Audio: Mitigating Imbalance of Cross-Modal Attention in Large Audio Language Models
by: Wang, Junyu, et al.
Published: (2025)

LEAP: Optimization Hierarchical Federated Learning on Non-IID Data with Coalition Formation Game
by: Lu, Jianfeng, et al.
Published: (2024)

Perturbation Self-Supervised Representations for Cross-Lingual Emotion TTS: Stage-Wise Modeling of Emotion and Speaker
by: Gong, Cheng, et al.
Published: (2025)

Exploring an Audio‐based Approach for Early Detection of Alzheimer’s Disease using Chinese Speech Data
by: Lee, Hung‐Wei, et al.
Published: (2024)

Towards Lightweight Adaptation of Speech Enhancement Models in Real-World Environments
by: Cheng, Longbiao, et al.
Published: (2026)

Evaluating the Expressive Appropriateness of Speech in Rich Contexts
by: Wang, Tianrui, et al.
Published: (2026)

InstructAudio: Unified speech and music generation with natural language instruction
by: Qiang, Chunyu, et al.
Published: (2025)

RAL: Redundancy-Aware Lipreading Model Based on Differential Learning with Symmetric Views
by: Gu, Zejun, et al.
Published: (2024)

MaFMatch: Semi‐Supervised Medical Image Segmentation Network Based on Mixed Data and Feature Augmentation
by: Long, Jianwu, et al.
Published: (2025)

Talking Head Generation Driven by Speech-Related Facial Action Units and Audio- Based on Multimodal Representation Fusion
by: Chen, Sen, et al.
Published: (2022)

Prediction of fatigue limit stress in C/SiC composites: Effect of stochastic load spectrum
by: Li, Longbiao
Published: (2024)

An Initial Investigation of Language Adaptation for TTS Systems under Low-resource Scenarios
by: Gong, Cheng, et al.
Published: (2024)

Benchmarking Foundation Speech and Language Models for Alzheimer's Disease and Related Dementia Detection from Spontaneous Speech
by: Li, Jingyu, et al.
Published: (2025)

Scaling Ambiguity: Augmenting Human Annotation in Speech Emotion Recognition with Audio-Language Models
by: Zhang, Wenda, et al.
Published: (2026)

FAConvLSTM: Factorized-Attention ConvLSTM for Efficient Feature Extraction in Multivariate Climate Data
by: Nji, Francis Ndikum, et al.
Published: (2026)

Breaking Latent Prior Bias in Detectors for Generalizable AIGC Image Detection
by: Zhou, Yue, et al.
Published: (2025)

1.x-Distill: Breaking the Diversity, Quality, and Efficiency Barrier in Distribution Matching Distillation
by: Li, Haoyu, et al.
Published: (2026)

Intelligent Diagnosis of Alzheimer's Disease Based on Machine Learning
by: Li, Mingyang, et al.
Published: (2024)

HAFFormer: A Hierarchical Attention-Free Framework for Alzheimer's Disease Detection From Spontaneous Speech
by: Dong, Zhongren, et al.
Published: (2024)