| Main Authors: | Kirakosyan, Grigor, Karamyan, Davit |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2409.00151 |
Similar Items
ERASMO: Leveraging Large Language Models for Enhanced Clustering Segmentation
by: Silva, Fillipe dos Santos, et al.
Published: (2024)
Validation Requirements for AI-based Intervention-Evaluation in Aging and Longevity Research and Practice
by: Fuellen, Georg, et al.
Published: (2024)
Radial Neighborhood Smoothing Recommender System
by: Zhang, Zerui, et al.
Published: (2025)
An Outlook on the Opportunities and Challenges of Multi-Agent AI Systems
by: Tian, Fangqiao, et al.
Published: (2025)
CFAE Framework Specification — ABO (Archetypal Behavior Ontology): Predictive Priors & Active Inference Seeding Layer - version: v0.1
by: Brown, Cameron
Published: (2026)
Autonomous Agency and Persistent Intent Architecture Specification v0.1
by: Brown, Cameron
Published: (2026)
DQEP Verification Runtime Specification v0.2
by: Brown, Cameron
Published: (2026)
Recursive Ontological Emergence and Synthetic Consciousness Ecology Specification v0.1
by: Brown, Cameron
Published: (2026)
Large Language Model Based Generative Error Correction: A Challenge and Baselines for Speech Recognition, Speaker Tagging, and Emotion Recognition
by: Yang, Chao-Han Huck, et al.
Published: (2024)
SEAL: Speaker Error Correction using Acoustic-conditioned Large Language Models
by: Kumar, Anurag, et al.
Published: (2025)
AG-LSEC: Audio Grounded Lexical Speaker Error Correction
by: Paturi, Rohit, et al.
Published: (2024)
HYVE: Hybrid Views for LLM Context Engineering over Machine Data
by: Tan, Jian, et al.
Published: (2026)
Speakers Unembedded: Embedding-free Approach to Long-form Neural Diarization
by: Li, Xiang, et al.
Published: (2024)
DiTAR: Diffusion Transformer Autoregressive Modeling for Speech Generation
by: Jia, Dongya, et al.
Published: (2025)
Whole-Song Hierarchical Generation of Symbolic Music Using Cascaded Diffusion Models
by: Wang, Ziyu, et al.
Published: (2024)
Autoregressive Diffusion Transformer for Text-to-Speech Synthesis
by: Liu, Zhijun, et al.
Published: (2024)
Comparative Analysis of Audio Feature Extraction for Real-Time Talking Portrait Synthesis
by: Salehi, Pegah, et al.
Published: (2024)
Generative Speech Recognition Error Correction with Large Language Models and Task-Activating Prompting
by: Yang, Chao-Han Huck, et al.
Published: (2023)
Speaker- and Text-Independent Estimation of Articulatory Movements and Phoneme Alignments from Speech
by: Weise, Tobias, et al.
Published: (2024)
GenTSE: Enhancing Target Speaker Extraction via a Coarse-to-Fine Generative Language Model
by: Li, Haoyang, et al.
Published: (2025)
Semantically Corrected Amharic Automatic Speech Recognition
by: Adnew, Samuael, et al.
Published: (2024)
Explainable artificial intelligence and its key role in education: Promoting critical thinking and autonomy in the classroom
by: Francisco J. Bellas
Published: (2025)
TIMIT Speaker Profiling: A Comparison of Multi-task learning and Single-task learning Approaches
by: Wang, Rong, et al.
Published: (2024)
Beyond Hearing: Learning Task-Agnostic ExG Representations from Earphones via Physiology-Informed Tokenization
by: Yoon, Hyungjun, et al.
Published: (2025)
Uncertainty-Penalized Direct Preference Optimization
by: Houliston, Sam, et al.
Published: (2024)
Instruction Data Generation and Unsupervised Adaptation for Speech Language Models
by: Noroozi, Vahid, et al.
Published: (2024)
Rethinking Entropy Minimization in Test-Time Adaptation for Autoregressive Models
by: Huang, Wei-Ping, et al.
Published: (2026)
ASR-Synchronized Speaker-Role Diarization
by: Ghosh, Arindam, et al.
Published: (2025)
SpeechPrompt: Prompting Speech Language Models for Speech Processing Tasks
by: Chang, Kai-Wei, et al.
Published: (2024)
Evaluating the Effectiveness of Transformer Layers in Wav2Vec 2.0, XLS-R, and Whisper for Speaker Identification Tasks
by: Stuhlmann, Linus, et al.
Published: (2025)
FastAST: Accelerating Audio Spectrogram Transformer via Token Merging and Cross-Model Knowledge Distillation
by: Behera, Swarup Ranjan, et al.
Published: (2024)
Integrating Pre-Trained Speech and Language Models for End-to-End Speech Recognition
by: Hono, Yukiya, et al.
Published: (2023)
Improving Acoustic Side-Channel Attacks on Keyboards Using Transformers and Large Language Models
by: Park, Jin Hyun, et al.
Published: (2025)
Masked Audio Generation using a Single Non-Autoregressive Transformer
by: Ziv, Alon, et al.
Published: (2024)
Spoken Language Intelligence of Large Language Models for Language Learning
by: Peng, Linkai, et al.
Published: (2023)
What Do Language Models Hear? Probing for Auditory Representations in Language Models
by: Ngo, Jerry, et al.
Published: (2024)
As Good As A Coin Toss: Human detection of AI-generated images, videos, audio, and audiovisual stimuli
by: Cooke, Di, et al.
Published: (2024)
NeKo: Cross-Modality Post-Recognition Error Correction with Tasks-Guided Mixture-of-Experts Language Model
by: Lin, Yen-Ting, et al.
Published: (2024)
Simultaneous Interpretation Corpus Construction by Large Language Models in Distant Language Pair
by: Sakai, Yusuke, et al.
Published: (2024)
Towards Signal Processing In Large Language Models
by: Verma, Prateek, et al.
Published: (2024)