| Main Author: | Zolkepli, Husein |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2601.20185 |
Similar Items
Multi-Lingual Malaysian Embedding: Leveraging Large Language Models for Semantic Representations
by: Zolkepli, Husein, et al.
Published: (2024)
Attenuation of Sound in Glacier Ice from 2 kHz to 35 kHz
by: Meyer, Alexander, et al.
Published: (2019)
U-Codec: Ultra Low Frame-rate Neural Speech Codec for Fast High-fidelity Speech Generation
by: Yang, Xusheng, et al.
Published: (2025)
POTSA: A Cross-Lingual Speech Alignment Framework for Speech-to-Text Translation
by: Li, Xuanchen, et al.
Published: (2025)
SpeechX: Neural Codec Language Model as a Versatile Speech Transformer
by: Wang, Xiaofei, et al.
Published: (2023)
MMMModal -- Multi-Images Multi-Audio Multi-turn Multi-Modal
by: Zolkepli, Husein, et al.
Published: (2024)
Low Frame-rate Speech Codec: a Codec Designed for Fast High-quality Speech LLM Training and Inference
by: Casanova, Edresson, et al.
Published: (2024)
Improving Audio Codec-based Zero-Shot Text-to-Speech Synthesis with Multi-Modal Context and Large Language Model
by: Xue, Jinlong, et al.
Published: (2024)
Leveraging Parameter-Efficient Transfer Learning for Multi-Lingual Text-to-Speech Adaptation
by: Li, Yingting, et al.
Published: (2024)
SecoustiCodec: Cross-Modal Aligned Streaming Single-Codecbook Speech Codec
by: Qiang, Chunyu, et al.
Published: (2025)
Cross-Lingual Multi-Granularity Framework for Interpretable Parkinson's Disease Diagnosis from Speech
by: Tougui, Ilias, et al.
Published: (2025)
CLiFT-ASR: A Cross-Lingual Fine-Tuning Framework for Low-Resource Taiwanese Hokkien Speech Recognition
by: Sung, Hung-Yang, et al.
Published: (2025)
DM-Codec: Distilling Multimodal Representations for Speech Tokenization
by: Ahasan, Md Mubtasim, et al.
Published: (2024)
NanoCodec: Towards High-Quality Ultra Fast Speech LLM Inference
by: Casanova, Edresson, et al.
Published: (2025)
Low-Rank and Sparse Model Merging for Multi-Lingual Speech Recognition and Translation
by: Zhao, Qiuming, et al.
Published: (2025)
Improving Multilingual Speech Models on ML-SUPERB 2.0: Fine-tuning with Data Augmentation and LID-Aware CTC
by: Wang, Qingzheng, et al.
Published: (2025)
XLAVS-R: Cross-Lingual Audio-Visual Speech Representation Learning for Noise-Robust Speech Perception
by: Han, HyoJung, et al.
Published: (2024)
Foundation Model-based Evaluation of Neuropsychiatric Disorders: A Lifespan-Inclusive, Multi-Modal, and Multi-Lingual Study
by: Dong, Zhongren, et al.
Published: (2025)
VALL-E 2: Neural Codec Language Models are Human Parity Zero-Shot Text to Speech Synthesizers
by: Chen, Sanyuan, et al.
Published: (2024)
MaLLaM -- Malaysia Large Language Model
by: Zolkepli, Husein, et al.
Published: (2024)
Large Malaysian Language Model Based on Mistral for Enhanced Local Language Understanding
by: Zolkepli, Husein, et al.
Published: (2024)
Quantizer-Aware Hierarchical Neural Codec Modeling for Speech Deepfake Detection
by: Wu, Jinyang, et al.
Published: (2026)
Investigating Disentanglement in a Phoneme-level Speech Codec for Prosody Modeling
by: Karapiperis, Sotirios, et al.
Published: (2024)
Spectrogram Patch Codec: A 2D Block-Quantized VQ-VAE and HiFi-GAN for Neural Speech Coding
by: Chary, Luis Felipe, et al.
Published: (2025)
Description-based Controllable Text-to-Speech with Cross-Lingual Voice Control
by: Yamamoto, Ryuichi, et al.
Published: (2024)
Speech-to-Text Translation with Phoneme-Augmented CoT: Enhancing Cross-Lingual Transfer in Low-Resource Scenarios
by: Gállego, Gerard I., et al.
Published: (2025)
NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models
by: Ju, Zeqian, et al.
Published: (2024)
DementiaBank-Emotion: A Multi-Rater Emotion Annotation Corpus for Alzheimer's Disease Speech (Version 1.0)
by: Jeong, Cheonkam, et al.
Published: (2026)
ELF: Encoding Speaker-Specific Latent Speech Feature for Speech Synthesis
by: Kong, Jungil, et al.
Published: (2023)
SALMONN-omni: A Codec-free LLM for Full-duplex Speech Understanding and Generation
by: Yu, Wenyi, et al.
Published: (2024)
ML-SUPERB 2.0: Benchmarking Multilingual Speech Models Across Modeling Constraints, Languages, and Datasets
by: Shi, Jiatong, et al.
Published: (2024)
Seed LiveInterpret 2.0: End-to-end Simultaneous Speech-to-speech Translation with Your Voice
by: Cheng, Shanbo, et al.
Published: (2025)
FuseCodec: Semantic-Contextual Fusion and Supervision for Neural Codecs
by: Ahasan, Md Mubtasim, et al.
Published: (2025)
End-to-End Simultaneous Dysarthric Speech Reconstruction with Frame-Level Adaptor and Multiple Wait-k Knowledge Distillation
by: Wu, Minghui, et al.
Published: (2026)
DELULU: Discriminative Embedding Learning Using Latent Units for Speaker-Aware Self-Trained Speech Foundational Model
by: Baali, Massa, et al.
Published: (2025)
Towards Quantifying and Reducing Language Mismatch Effects in Cross-Lingual Speech Anti-Spoofing
by: Liu, Tianchi, et al.
Published: (2024)
Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model
by: Ye, Zhen, et al.
Published: (2024)
SimpleSpeech 2: Towards Simple and Efficient Text-to-Speech with Flow-based Scalar Latent Transformer Diffusion Models
by: Yang, Dongchao, et al.
Published: (2024)
Equipping LLM with Directional Multi-Talker Speech Understanding Capabilities
by: Lin, Ju, et al.
Published: (2026)
Cross-Lingual Speech Emotion Recognition: Humans vs. Self-Supervised Models
by: Han, Zhichen, et al.
Published: (2024)