:: Library Catalog

Bibliographic Details
Main Author:	Zolkepli, Husein
Format:	Preprint
Published:	2026
Subjects:	Computation and Language; Sound
Online Access:	https://arxiv.org/abs/2601.20185

Similar Items

Multi-Lingual Malaysian Embedding: Leveraging Large Language Models for Semantic Representations
by: Zolkepli, Husein, et al.
Published: (2024)

Attenuation of Sound in Glacier Ice from 2 kHz to 35 kHz
by: Meyer, Alexander, et al.
Published: (2019)

U-Codec: Ultra Low Frame-rate Neural Speech Codec for Fast High-fidelity Speech Generation
by: Yang, Xusheng, et al.
Published: (2025)

POTSA: A Cross-Lingual Speech Alignment Framework for Speech-to-Text Translation
by: Li, Xuanchen, et al.
Published: (2025)

SpeechX: Neural Codec Language Model as a Versatile Speech Transformer
by: Wang, Xiaofei, et al.
Published: (2023)

MMMModal -- Multi-Images Multi-Audio Multi-turn Multi-Modal
by: Zolkepli, Husein, et al.
Published: (2024)

Low Frame-rate Speech Codec: a Codec Designed for Fast High-quality Speech LLM Training and Inference
by: Casanova, Edresson, et al.
Published: (2024)

Improving Audio Codec-based Zero-Shot Text-to-Speech Synthesis with Multi-Modal Context and Large Language Model
by: Xue, Jinlong, et al.
Published: (2024)

Leveraging Parameter-Efficient Transfer Learning for Multi-Lingual Text-to-Speech Adaptation
by: Li, Yingting, et al.
Published: (2024)

SecoustiCodec: Cross-Modal Aligned Streaming Single-Codecbook Speech Codec
by: Qiang, Chunyu, et al.
Published: (2025)

Cross-Lingual Multi-Granularity Framework for Interpretable Parkinson's Disease Diagnosis from Speech
by: Tougui, Ilias, et al.
Published: (2025)

CLiFT-ASR: A Cross-Lingual Fine-Tuning Framework for Low-Resource Taiwanese Hokkien Speech Recognition
by: Sung, Hung-Yang, et al.
Published: (2025)

DM-Codec: Distilling Multimodal Representations for Speech Tokenization
by: Ahasan, Md Mubtasim, et al.
Published: (2024)

NanoCodec: Towards High-Quality Ultra Fast Speech LLM Inference
by: Casanova, Edresson, et al.
Published: (2025)

Low-Rank and Sparse Model Merging for Multi-Lingual Speech Recognition and Translation
by: Zhao, Qiuming, et al.
Published: (2025)

Improving Multilingual Speech Models on ML-SUPERB 2.0: Fine-tuning with Data Augmentation and LID-Aware CTC
by: Wang, Qingzheng, et al.
Published: (2025)

XLAVS-R: Cross-Lingual Audio-Visual Speech Representation Learning for Noise-Robust Speech Perception
by: Han, HyoJung, et al.
Published: (2024)

Foundation Model-based Evaluation of Neuropsychiatric Disorders: A Lifespan-Inclusive, Multi-Modal, and Multi-Lingual Study
by: Dong, Zhongren, et al.
Published: (2025)

VALL-E 2: Neural Codec Language Models are Human Parity Zero-Shot Text to Speech Synthesizers
by: Chen, Sanyuan, et al.
Published: (2024)

MaLLaM -- Malaysia Large Language Model
by: Zolkepli, Husein, et al.
Published: (2024)

Large Malaysian Language Model Based on Mistral for Enhanced Local Language Understanding
by: Zolkepli, Husein, et al.
Published: (2024)

Quantizer-Aware Hierarchical Neural Codec Modeling for Speech Deepfake Detection
by: Wu, Jinyang, et al.
Published: (2026)

Investigating Disentanglement in a Phoneme-level Speech Codec for Prosody Modeling
by: Karapiperis, Sotirios, et al.
Published: (2024)

Spectrogram Patch Codec: A 2D Block-Quantized VQ-VAE and HiFi-GAN for Neural Speech Coding
by: Chary, Luis Felipe, et al.
Published: (2025)

Description-based Controllable Text-to-Speech with Cross-Lingual Voice Control
by: Yamamoto, Ryuichi, et al.
Published: (2024)

Speech-to-Text Translation with Phoneme-Augmented CoT: Enhancing Cross-Lingual Transfer in Low-Resource Scenarios
by: Gállego, Gerard I., et al.
Published: (2025)

NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models
by: Ju, Zeqian, et al.
Published: (2024)

DementiaBank-Emotion: A Multi-Rater Emotion Annotation Corpus for Alzheimer's Disease Speech (Version 1.0)
by: Jeong, Cheonkam, et al.
Published: (2026)

ELF: Encoding Speaker-Specific Latent Speech Feature for Speech Synthesis
by: Kong, Jungil, et al.
Published: (2023)

SALMONN-omni: A Codec-free LLM for Full-duplex Speech Understanding and Generation
by: Yu, Wenyi, et al.
Published: (2024)

ML-SUPERB 2.0: Benchmarking Multilingual Speech Models Across Modeling Constraints, Languages, and Datasets
by: Shi, Jiatong, et al.
Published: (2024)

Seed LiveInterpret 2.0: End-to-end Simultaneous Speech-to-speech Translation with Your Voice
by: Cheng, Shanbo, et al.
Published: (2025)

FuseCodec: Semantic-Contextual Fusion and Supervision for Neural Codecs
by: Ahasan, Md Mubtasim, et al.
Published: (2025)

End-to-End Simultaneous Dysarthric Speech Reconstruction with Frame-Level Adaptor and Multiple Wait-k Knowledge Distillation
by: Wu, Minghui, et al.
Published: (2026)

DELULU: Discriminative Embedding Learning Using Latent Units for Speaker-Aware Self-Trained Speech Foundational Model
by: Baali, Massa, et al.
Published: (2025)

Towards Quantifying and Reducing Language Mismatch Effects in Cross-Lingual Speech Anti-Spoofing
by: Liu, Tianchi, et al.
Published: (2024)

Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model
by: Ye, Zhen, et al.
Published: (2024)

SimpleSpeech 2: Towards Simple and Efficient Text-to-Speech with Flow-based Scalar Latent Transformer Diffusion Models
by: Yang, Dongchao, et al.
Published: (2024)

Equipping LLM with Directional Multi-Talker Speech Understanding Capabilities
by: Lin, Ju, et al.
Published: (2026)

Cross-Lingual Speech Emotion Recognition: Humans vs. Self-Supervised Models
by: Han, Zhichen, et al.
Published: (2024)