| Main Authors: | Lin, Yi-Cheng, Lin, Tzu-Quan, Lin, Hsi-Che, Liu, Andy T., Lee, Hung-yi |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2406.04997 |
Similar Items
DAISY: Data Adaptive Self-Supervised Early Exit for Speech Representation Models
by: Lin, Tzu-Quan, et al.
Published: (2024)
MelHuBERT: A simplified HuBERT on Mel spectrograms
by: Lin, Tzu-Quan, et al.
Published: (2022)
Property Neurons in Self-Supervised Speech Transformers
by: Lin, Tzu-Quan, et al.
Published: (2024)
Efficient Training of Self-Supervised Speech Foundation Models on a Compute Budget
by: Liu, Andy T., et al.
Published: (2024)
Identifying Speaker Information in Feed-Forward Layers of Self-Supervised Speech Transformers
by: Lin, Tzu-Quan, et al.
Published: (2025)
Improving Speech Emotion Recognition in Under-Resourced Languages via Speech-to-Speech Translation with Bootstrapping Data Selection
by: Lin, Hsi-Che, et al.
Published: (2024)
How Contrastive Decoding Enhances Large Audio Language Models?
by: Lin, Tzu-Quan, et al.
Published: (2026)
Is Smaller Always Faster? Tradeoffs in Compressing Self-Supervised Speech Transformers
by: Lin, Tzu-Quan, et al.
Published: (2022)
Emo-bias: A Large Scale Evaluation of Social Bias on Speech Emotion Recognition
by: Lin, Yi-Cheng, et al.
Published: (2024)
Adaptive vector steering: A training-free, layer-wise intervention for hallucination mitigation in large audio and multimodal models
by: Lin, Tsung-En, et al.
Published: (2025)
Gender Bias in Instruction-Guided Speech Synthesis Models
by: Kuan, Chun-Yi, et al.
Published: (2025)
Improving the Adversarial Robustness for Speaker Verification by Self-Supervised Learning
by: Wu, Haibin, et al.
Published: (2021)
Do You Hear What I Mean? Quantifying the Instruction-Perception Gap in Instruction-Guided Expressive Text-To-Speech Systems
by: Lin, Yi-Cheng, et al.
Published: (2025)
Acoustic-to-articulatory inversion for dysarthric speech: Are pre-trained self-supervised representations favorable?
by: Maharana, Sarthak Kumar, et al.
Published: (2023)
Noise-robust zero-shot text-to-speech synthesis conditioned on self-supervised speech-representation model with adapters
by: Fujita, Kenichi, et al.
Published: (2024)
Rethinking Entropy Minimization in Test-Time Adaptation for Autoregressive Models
by: Huang, Wei-Ping, et al.
Published: (2026)
Integrating Self-supervised Speech Model with Pseudo Word-level Targets from Visually-grounded Speech Model
by: Fang, Hung-Chieh, et al.
Published: (2024)
Towards Generalized Source Tracing for Codec-Based Deepfake Speech
by: Chen, Xuanjun, et al.
Published: (2025)
A correlation-permutation approach for speech-music encoders model merging
by: Ritter-Gutierrez, Fabian, et al.
Published: (2025)
A low latency attention module for streaming self-supervised speech representation learning
by: Ma, Jianbo, et al.
Published: (2023)
Distilling a speech and music encoder with task arithmetic
by: Ritter-Gutierrez, Fabian, et al.
Published: (2025)
Understanding Sounds, Missing the Questions: The Challenge of Object Hallucination in Large Audio-Language Models
by: Kuan, Chun-Yi, et al.
Published: (2024)
Spoken Stereoset: On Evaluating Social Bias Toward Speaker in Speech Large Language Models
by: Lin, Yi-Cheng, et al.
Published: (2024)
AQUA-Bench: Beyond Finding Answers to Knowing When There Are None in Audio Question Answering
by: Kuan, Chun-Yi, et al.
Published: (2026)
From Alignment to Advancement: Bootstrapping Audio-Language Alignment with Synthetic Data
by: Kuan, Chun-Yi, et al.
Published: (2025)
Pseudo2Real: Task Arithmetic for Pseudo-Label Correction in Automatic Speech Recognition
by: Lin, Yi-Cheng, et al.
Published: (2025)
Multi-Distillation from Speech and Music Representation Models
by: Wei, Jui-Chiang, et al.
Published: (2025)
Mitigating Subgroup Disparities in Multi-Label Speech Emotion Recognition: A Pseudo-Labeling and Unsupervised Learning Approach
by: Lin, Yi-Cheng, et al.
Published: (2025)
MMMOS: Multi-domain Multi-axis Audio Quality Assessment
by: Lin, Yi-Cheng, et al.
Published: (2025)
MI-Fuse: Label Fusion for Unsupervised Domain Adaptation with Closed-Source Large-Audio Language Model
by: Huang, Hsiao-Ying, et al.
Published: (2025)
Zipformer: A faster and better encoder for automatic speech recognition
by: Yao, Zengwei, et al.
Published: (2023)
Listen and Speak Fairly: A Study on Semantic Gender Bias in Speech Integrated Large Language Models
by: Lin, Yi-Cheng, et al.
Published: (2024)
SpeechCLIP+: Self-supervised multi-task representation learning for speech via CLIP and speech-image data
by: Wang, Hsuan-Fu, et al.
Published: (2024)
emg2speech: Synthesizing speech from electromyography using self-supervised speech models
by: Gowda, Harshavardhana T., et al.
Published: (2025)
CR-CTC: Consistency regularization on CTC for improved speech recognition
by: Yao, Zengwei, et al.
Published: (2024)
TaigiSpeech: A Low-Resource Real-World Speech Intent Dataset and Preliminary Results with Scalable Data Mining In-the-Wild
by: Chang, Kai-Wei, et al.
Published: (2026)
Tempo estimation as fully self-supervised binary classification
by: Henkel, Florian, et al.
Published: (2024)
CO-VADA: A Confidence-Oriented Voice Augmentation Debiasing Approach for Fair Speech Emotion Recognition
by: Tsai, Yun-Shao, et al.
Published: (2025)
Advancing Large Language Models to Capture Varied Speaking Styles and Respond Properly in Spoken Conversations
by: Lin, Guan-Ting, et al.
Published: (2024)
Voxtlm: unified decoder-only models for consolidating speech recognition/synthesis and speech/text continuation tasks
by: Maiti, Soumi, et al.
Published: (2023)