Saved in:
| Main Authors: | Yang, Shu-wen, Chang, Heng-Jui, Huang, Zili, Liu, Andy T., Lai, Cheng-I, Wu, Haibin, Shi, Jiatong, Chang, Xuankai, Tsai, Hsiang-Sheng, Huang, Wen-Chin, Feng, Tzu-hsun, Chi, Po-Han, Lin, Yist Y., Chuang, Yung-Sung, Huang, Tzu-Hsien, Tseng, Wei-Cheng, Lakhotia, Kushal, Li, Shang-Wen, Mohamed, Abdelrahman, Watanabe, Shinji, Lee, Hung-yi |
|---|---|
| Format: | Preprint |
| Published: |
2024
|
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2404.09385 |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Similar Items
ML-SUPERB: Multilingual Speech Universal PERformance Benchmark
by: Shi, Jiatong, et al.
Published: (2023)
by: Shi, Jiatong, et al.
Published: (2023)
Is Smaller Always Faster? Tradeoffs in Compressing Self-Supervised Speech Transformers
by: Lin, Tzu-Quan, et al.
Published: (2022)
by: Lin, Tzu-Quan, et al.
Published: (2022)
Reducing Object Hallucination in Large Audio-Language Models via Audio-Aware Decoding
by: Hsu, Tzu-wen, et al.
Published: (2025)
by: Hsu, Tzu-wen, et al.
Published: (2025)
Findings of the 2023 ML-SUPERB Challenge: Pre-Training and Evaluation over More Languages and Beyond
by: Shi, Jiatong, et al.
Published: (2023)
by: Shi, Jiatong, et al.
Published: (2023)
Do You Hear What I Mean? Quantifying the Instruction-Perception Gap in Instruction-Guided Expressive Text-To-Speech Systems
by: Lin, Yi-Cheng, et al.
Published: (2025)
by: Lin, Yi-Cheng, et al.
Published: (2025)
How Contrastive Decoding Enhances Large Audio Language Models?
by: Lin, Tzu-Quan, et al.
Published: (2026)
by: Lin, Tzu-Quan, et al.
Published: (2026)
LV-CTC: Non-autoregressive ASR with CTC and latent variable models
by: Fujita, Yuya, et al.
Published: (2024)
by: Fujita, Yuya, et al.
Published: (2024)
A Dataset and Baselines for Measuring and Predicting the Music Piece Memorability
by: Tseng, Li-Yang, et al.
Published: (2024)
by: Tseng, Li-Yang, et al.
Published: (2024)
Stimulus Modality Matters: Impact of Perceptual Evaluations from Different Modalities on Speech Emotion Recognition System Performance
by: Chou, Huang-Cheng, et al.
Published: (2024)
by: Chou, Huang-Cheng, et al.
Published: (2024)
Real-Time Minimum-Energy Operating-Point Tracking for Battery-Powered Micro DC Motors Under Dynamically Variable Loading
by: Huang, Tzu-Hsiang, et al.
Published: (2026)
by: Huang, Tzu-Hsiang, et al.
Published: (2026)
Fusion of Discrete Representations and Self-Augmented Representations for Multilingual Automatic Speech Recognition
by: Wang, Shih-heng, et al.
Published: (2024)
by: Wang, Shih-heng, et al.
Published: (2024)
CodecFake: Enhancing Anti-Spoofing Models Against Deepfake Audios from Codec-Based Speech Synthesis Systems
by: Wu, Haibin, et al.
Published: (2024)
by: Wu, Haibin, et al.
Published: (2024)
SQ-Whisper: Speaker-Querying based Whisper Model for Target-Speaker ASR
by: Guo, Pengcheng, et al.
Published: (2024)
by: Guo, Pengcheng, et al.
Published: (2024)
The False Resonance: A Critical Examination of Emotion Embedding Similarity for Speech Generation Evaluation
by: Tsai, Yun-Shao, et al.
Published: (2026)
by: Tsai, Yun-Shao, et al.
Published: (2026)
Analytical Model of Groundwater Flow in a Rectangular Domain for Spatiotemporally Distributed Recharge
by: Ping‐Cheng Hsieh, et al.
Published: (2025)
by: Ping‐Cheng Hsieh, et al.
Published: (2025)
Emo-bias: A Large Scale Evaluation of Social Bias on Speech Emotion Recognition
by: Lin, Yi-Cheng, et al.
Published: (2024)
by: Lin, Yi-Cheng, et al.
Published: (2024)
Identifying Speaker Information in Feed-Forward Layers of Self-Supervised Speech Transformers
by: Lin, Tzu-Quan, et al.
Published: (2025)
by: Lin, Tzu-Quan, et al.
Published: (2025)
ML-SUPERB 2.0: Benchmarking Multilingual Speech Models Across Modeling Constraints, Languages, and Datasets
by: Shi, Jiatong, et al.
Published: (2024)
by: Shi, Jiatong, et al.
Published: (2024)
Learning Displacement-Aware WiFi Representations for Weakly Supervised Relative Localization
by: Wei, Tzu-Ti, et al.
Published: (2026)
by: Wei, Tzu-Ti, et al.
Published: (2026)
How to Learn a New Language? An Efficient Solution for Self-Supervised Learning Models Unseen Languages Adaption in Low-Resource Scenario
by: Wang, Shih-Heng, et al.
Published: (2024)
by: Wang, Shih-Heng, et al.
Published: (2024)
The Interspeech 2024 Challenge on Speech Processing Using Discrete Units
by: Chang, Xuankai, et al.
Published: (2024)
by: Chang, Xuankai, et al.
Published: (2024)
AI-based CSI Feedback with Digital Twins: Real-World Validation and Insights
by: Huang, Tzu-Hao, et al.
Published: (2025)
by: Huang, Tzu-Hao, et al.
Published: (2025)
The CHiME-8 DASR Challenge for Generalizable and Array Agnostic Distant Automatic Speech Recognition and Diarization
by: Cornell, Samuele, et al.
Published: (2024)
by: Cornell, Samuele, et al.
Published: (2024)
Leave No Knowledge Behind During Knowledge Distillation: Towards Practical and Effective Knowledge Distillation for Code-Switching ASR Using Realistic Data
by: Tseng, Liang-Hsuan, et al.
Published: (2024)
by: Tseng, Liang-Hsuan, et al.
Published: (2024)
On the social bias of speech self-supervised models
by: Lin, Yi-Cheng, et al.
Published: (2024)
by: Lin, Yi-Cheng, et al.
Published: (2024)
ART: Artifact Removal Transformer for Reconstructing Noise-Free Multichannel Electroencephalographic Signals
by: Chuang, Chun-Hsiang, et al.
Published: (2024)
by: Chuang, Chun-Hsiang, et al.
Published: (2024)
Full-Duplex-Bench-v2: A Multi-Turn Evaluation Framework for Duplex Dialogue Systems with an Automated Examiner
by: Lin, Guan-Ting, et al.
Published: (2025)
by: Lin, Guan-Ting, et al.
Published: (2025)
MelHuBERT: A simplified HuBERT on Mel spectrograms
by: Lin, Tzu-Quan, et al.
Published: (2022)
by: Lin, Tzu-Quan, et al.
Published: (2022)
DAISY: Data Adaptive Self-Supervised Early Exit for Speech Representation Models
by: Lin, Tzu-Quan, et al.
Published: (2024)
by: Lin, Tzu-Quan, et al.
Published: (2024)
Comparing the Effects of Different Dielectric Materials on an Atmospheric Pressure Plasma Jet by Experiments and Simulations
by: Po‐Chun Huang, et al.
Published: (2024)
by: Po‐Chun Huang, et al.
Published: (2024)
Towards Robust Speech Representation Learning for Thousands of Languages
by: Chen, William, et al.
Published: (2024)
by: Chen, William, et al.
Published: (2024)
Voxtlm: unified decoder-only models for consolidating speech recognition/synthesis and speech/text continuation tasks
by: Maiti, Soumi, et al.
Published: (2023)
by: Maiti, Soumi, et al.
Published: (2023)
Enhancing Multilingual ASR for Unseen Languages via Language Embedding Modeling
by: Huang, Shao-Syuan, et al.
Published: (2024)
by: Huang, Shao-Syuan, et al.
Published: (2024)
Do Neural Codecs Generalize? A Controlled Study Across Unseen Languages and Non-Speech Tasks
by: Wang, Shih-Heng, et al.
Published: (2026)
by: Wang, Shih-Heng, et al.
Published: (2026)
SynesLM: A Unified Approach for Audio-visual Speech Recognition and Translation via Language Model and Synthetic Data
by: Lu, Yichen, et al.
Published: (2024)
by: Lu, Yichen, et al.
Published: (2024)
SpeechPrompt: Prompting Speech Language Models for Speech Processing Tasks
by: Chang, Kai-Wei, et al.
Published: (2024)
by: Chang, Kai-Wei, et al.
Published: (2024)
Mitigating Subgroup Disparities in Multi-Label Speech Emotion Recognition: A Pseudo-Labeling and Unsupervised Learning Approach
by: Lin, Yi-Cheng, et al.
Published: (2025)
by: Lin, Yi-Cheng, et al.
Published: (2025)
UniAudio: An Audio Foundation Model Toward Universal Audio Generation
by: Yang, Dongchao, et al.
Published: (2023)
by: Yang, Dongchao, et al.
Published: (2023)
Quantum Information-Empowered Graph Neural Network for Hyperspectral Change Detection
by: Lin, Chia-Hsiang, et al.
Published: (2024)
by: Lin, Chia-Hsiang, et al.
Published: (2024)
Robust Audiovisual Speech Recognition Models with Mixture-of-Experts
by: Wu, Yihan, et al.
Published: (2024)
by: Wu, Yihan, et al.
Published: (2024)
Similar Items
-
ML-SUPERB: Multilingual Speech Universal PERformance Benchmark
by: Shi, Jiatong, et al.
Published: (2023) -
Is Smaller Always Faster? Tradeoffs in Compressing Self-Supervised Speech Transformers
by: Lin, Tzu-Quan, et al.
Published: (2022) -
Reducing Object Hallucination in Large Audio-Language Models via Audio-Aware Decoding
by: Hsu, Tzu-wen, et al.
Published: (2025) -
Findings of the 2023 ML-SUPERB Challenge: Pre-Training and Evaluation over More Languages and Beyond
by: Shi, Jiatong, et al.
Published: (2023) -
Do You Hear What I Mean? Quantifying the Instruction-Perception Gap in Instruction-Guided Expressive Text-To-Speech Systems
by: Lin, Yi-Cheng, et al.
Published: (2025)