| Main Authors: | Zhu, Lei, Wang, Xiaobao, Yang, Jianbiao, Wang, Chenyang, He, Dongxiao, Wang, Longbiao, Dang, Jianwu |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2605.02277 |
Similar Items
AIMDiT: Modality Augmentation and Interaction via Multimodal Dimension Transformation for Emotion Recognition in Conversations
by: Wu, Sheng, et al.
Published: (2024)
Enriching Multimodal Sentiment Analysis through Textual Emotional Descriptions of Visual-Audio Content
by: Wu, Sheng, et al.
Published: (2024)
Integration of Old and New Knowledge for Generalized Intent Discovery: A Consistency-driven Prototype-Prompting Framework
by: Wei, Xiao, et al.
Published: (2025)
Error Correction by Paying Attention to Both Acoustic and Confidence References for Automatic Speech Recognition
by: Shu, Yuchun, et al.
Published: (2024)
Breaking Data Efficiency Dilemma: A Federated and Augmented Learning Framework For Alzheimer's Disease Detection via Speech
by: Wei, Xiao, et al.
Published: (2026)
ASDA: Audio Spectrogram Differential Attention Mechanism for Self-Supervised Representation Learning
by: Wang, Junyu, et al.
Published: (2025)
Rethinking Contrastive Learning in Graph Anomaly Detection: A Clean-View Perspective
by: Jin, Di, et al.
Published: (2025)
A Dynamic Knowledge Update-Driven Model with Large Language Models for Fake News Detection
by: Jin, Di, et al.
Published: (2025)
Expressive Prompting: Improving Emotion Intensity and Speaker Consistency in Zero-Shot TTS
by: Wang, Haoyu, et al.
Published: (2024)
Pay More Attention To Audio: Mitigating Imbalance of Cross-Modal Attention in Large Audio Language Models
by: Wang, Junyu, et al.
Published: (2025)
Scrambled text: training Language Models to correct OCR errors using synthetic data
by: Bourne, Jonathan
Published: (2024)
POTSA: A Cross-Lingual Speech Alignment Framework for Speech-to-Text Translation
by: Li, Xuanchen, et al.
Published: (2025)
MSR-HuBERT: Self-supervised Pre-training for Adaptation to Multiple Sampling Rates
by: Huang, Zikang, et al.
Published: (2026)
InstructAudio: Unified speech and music generation with natural language instruction
by: Qiang, Chunyu, et al.
Published: (2025)
An Initial Investigation of Language Adaptation for TTS Systems under Low-resource Scenarios
by: Gong, Cheng, et al.
Published: (2024)
SecoustiCodec: Cross-Modal Aligned Streaming Single-Codecbook Speech Codec
by: Qiang, Chunyu, et al.
Published: (2025)
Measuring short-form factuality in large language models
by: Wei, Jason, et al.
Published: (2024)
Long-form factuality in large language models
by: Wei, Jerry, et al.
Published: (2024)
UniSonate: A Unified Model for Speech, Music, and Sound Effect Generation with Text Instructions
by: Qiang, Chunyu, et al.
Published: (2026)
VQ-CTAP: Cross-Modal Fine-Grained Sequence Representation Learning for Speech Processing
by: Qiang, Chunyu, et al.
Published: (2024)
VERISCORE: Evaluating the factuality of verifiable claims in long-form text generation
by: Song, Yixiao, et al.
Published: (2024)
VeriFastScore: Speeding up long-form factuality evaluation
by: Rajendhran, Rishanth, et al.
Published: (2025)
Collaborative decoding of critical tokens for boosting factuality of large language models
by: Jin, Lifeng, et al.
Published: (2024)
MLRIP: Pre-training a military language representation model with informative factual knowledge and professional knowledge base
by: Li, Hui, et al.
Published: (2022)
Sailing by the Stars: A Survey on Reward Models and Learning Strategies for Learning from Rewards
by: Wu, Xiaobao
Published: (2025)
RE-Searcher: Robust Agentic Search with Goal-oriented Planning and Self-reflection
by: Fu, Daocheng, et al.
Published: (2025)
KG-BiLM: Knowledge Graph Embedding via Bidirectional Language Models
by: Chen, Zirui, et al.
Published: (2025)
LORT: Locally Refined Convolution and Taylor Transformer for Monaural Speech Enhancement
by: Wang, Junyu, et al.
Published: (2025)
Mamba-SEUNet: Mamba UNet for Monaural Speech Enhancement
by: Wang, Junyu, et al.
Published: (2024)
AKEW: Assessing Knowledge Editing in the Wild
by: Wu, Xiaobao, et al.
Published: (2024)
Large Language Models, scientific knowledge and factuality: A framework to streamline human expert evaluation
by: Wysocka, Magdalena, et al.
Published: (2023)
Enhancing Goal-oriented Proactive Dialogue Systems via Consistency Reflection and Correction
by: Zhang, Didi, et al.
Published: (2025)
High-precision medical speech recognition through synthetic data and semantic correction: UNITED-MEDASR
by: Banerjee, Sourav, et al.
Published: (2024)
Tag and correct: high precision post-editing approach to correction of speech recognition errors
by: Ziętkiewicz, Tomasz
Published: (2024)
A safety realignment framework via subspace-oriented model fusion for large language models
by: Yi, Xin, et al.
Published: (2024)
Chain-of-Though (CoT) prompting strategies for medical error detection and correction
by: Wu, Zhaolong, et al.
Published: (2024)
Progressive Residual Extraction based Pre-training for Speech Representation Learning
by: Wang, Tianrui, et al.
Published: (2024)
LLMs cannot find reasoning errors, but can correct them given the error location
by: Tyen, Gladys, et al.
Published: (2023)
BootTOD: Bootstrap Task-oriented Dialogue Representations by Aligning Diverse Responses
by: Zeng, Weihao, et al.
Published: (2024)
Generate Then Correct: Single Shot Global Correction for Aspect Sentiment Quad Prediction
by: He, Shidong, et al.
Published: (2026)