| Main Authors: | Wei, Jui-Chiang, Lin, Yi-Cheng, Ritter-Gutierrez, Fabian, Lee, Hung-yi |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2506.07237 |
Similar Items
Distilling a speech and music encoder with task arithmetic
by: Ritter-Gutierrez, Fabian, et al.
Published: (2025)
ASTAR-NTU solution to AudioMOS Challenge 2025 Track1
by: Ritter-Gutierrez, Fabian, et al.
Published: (2025)
Dataset-Distillation Generative Model for Speech Emotion Recognition
by: Ritter-Gutierrez, Fabian, et al.
Published: (2024)
Spoken Stereoset: On Evaluating Social Bias Toward Speaker in Speech Large Language Models
by: Lin, Yi-Cheng, et al.
Published: (2024)
Mitigating Subgroup Disparities in Multi-Label Speech Emotion Recognition: A Pseudo-Labeling and Unsupervised Learning Approach
by: Lin, Yi-Cheng, et al.
Published: (2025)
Advancing Large Language Models to Capture Varied Speaking Styles and Respond Properly in Spoken Conversations
by: Lin, Guan-Ting, et al.
Published: (2024)
Emo-bias: A Large Scale Evaluation of Social Bias on Speech Emotion Recognition
by: Lin, Yi-Cheng, et al.
Published: (2024)
Continual Test-time Adaptation for End-to-end Speech Recognition on Noisy Speech
by: Lin, Guan-Ting, et al.
Published: (2024)
Do You Hear What I Mean? Quantifying the Instruction-Perception Gap in Instruction-Guided Expressive Text-To-Speech Systems
by: Lin, Yi-Cheng, et al.
Published: (2025)
EMO-Debias: Benchmarking Gender Debiasing Techniques in Multi-Label Speech Emotion Recognition
by: Lin, Yi-Cheng, et al.
Published: (2025)
Improving Speech Emotion Recognition in Under-Resourced Languages via Speech-to-Speech Translation with Bootstrapping Data Selection
by: Lin, Hsi-Che, et al.
Published: (2024)
SPAR-K: Scheduled Periodic Alternating Early Exit for Spoken Language Models
by: Huang, Hsiao-Ying, et al.
Published: (2026)
Speech-IFEval: Evaluating Instruction-Following and Quantifying Catastrophic Forgetting in Speech-Aware Language Models
by: Lu, Ke-Han, et al.
Published: (2025)
MMMOS: Multi-domain Multi-axis Audio Quality Assessment
by: Lin, Yi-Cheng, et al.
Published: (2025)
Gender Bias in Instruction-Guided Speech Synthesis Models
by: Kuan, Chun-Yi, et al.
Published: (2025)
CO-VADA: A Confidence-Oriented Voice Augmentation Debiasing Approach for Fair Speech Emotion Recognition
by: Tsai, Yun-Shao, et al.
Published: (2025)
A correlation-permutation approach for speech-music encoders model merging
by: Ritter-Gutierrez, Fabian, et al.
Published: (2025)
DAISY: Data Adaptive Self-Supervised Early Exit for Speech Representation Models
by: Lin, Tzu-Quan, et al.
Published: (2024)
Toward Fair Speech Technologies: A Comprehensive Survey of Bias and Fairness in Speech AI
by: Lin, Yi-Cheng, et al.
Published: (2026)
Fusion of Discrete Representations and Self-Augmented Representations for Multilingual Automatic Speech Recognition
by: Wang, Shih-heng, et al.
Published: (2024)
CodecFake: Enhancing Anti-Spoofing Models Against Deepfake Audios from Codec-Based Speech Synthesis Systems
by: Wu, Haibin, et al.
Published: (2024)
VIBE: Voice-Induced open-ended Bias Evaluation for Large Audio-Language Models via Real-World Speech
by: Lin, Yi-Cheng, et al.
Published: (2026)
DeSTA: Enhancing Speech Language Models through Descriptive Speech-Text Alignment
by: Lu, Ke-Han, et al.
Published: (2024)
HighRateMOS: Sampling-Rate Aware Modeling for Speech Quality Assessment
by: Ren, Wenze, et al.
Published: (2025)
Can Large Audio-Language Models Truly Hear? Tackling Hallucinations with Multi-Task Assessment and Stepwise Audio Reasoning
by: Kuan, Chun-Yi, et al.
Published: (2024)
Reducing Object Hallucination in Large Audio-Language Models via Audio-Aware Decoding
by: Hsu, Tzu-wen, et al.
Published: (2025)
SMILE: Speech Meta In-Context Learning for Low-Resource Language Automatic Speech Recognition
by: Hsu, Ming-Hao, et al.
Published: (2024)
Meta-PerSER: Few-Shot Listener Personalized Speech Emotion Recognition via Meta-learning
by: Shen, Liang-Yeh, et al.
Published: (2025)
How Contrastive Decoding Enhances Large Audio Language Models?
by: Lin, Tzu-Quan, et al.
Published: (2026)
MI-Fuse: Label Fusion for Unsupervised Domain Adaptation with Closed-Source Large-Audio Language Model
by: Huang, Hsiao-Ying, et al.
Published: (2025)
MOS-Bias: From Hidden Gender Bias to Gender-Aware Speech Quality Assessment
by: Ren, Wenze, et al.
Published: (2026)
Efficient Training of Self-Supervised Speech Foundation Models on a Compute Budget
by: Liu, Andy T., et al.
Published: (2024)
Identifying Speaker Information in Feed-Forward Layers of Self-Supervised Speech Transformers
by: Lin, Tzu-Quan, et al.
Published: (2025)
SpeechCaps: Advancing Instruction-Based Universal Speech Models with Multi-Talker Speaking Style Captioning
by: Huang, Chien-yu, et al.
Published: (2024)
Parallel Synthesis for Autoregressive Speech Generation
by: Hsu, Po-chun, et al.
Published: (2022)
Stimulus Modality Matters: Impact of Perceptual Evaluations from Different Modalities on Speech Emotion Recognition System Performance
by: Chou, Huang-Cheng, et al.
Published: (2024)
Speech-Copilot: Leveraging Large Language Models for Speech Processing via Task Decomposition, Modularization, and Program Generation
by: Kuan, Chun-Yi, et al.
Published: (2024)
Full-Duplex-Bench v1.5: Evaluating Overlap Handling for Full-Duplex Speech Models
by: Lin, Guan-Ting, et al.
Published: (2025)
USAD: Universal Speech and Audio Representation via Distillation
by: Chang, Heng-Jui, et al.
Published: (2025)
TASTE: Text-Aligned Speech Tokenization and Embedding for Spoken Language Modeling
by: Tseng, Liang-Hsuan, et al.
Published: (2025)