:: Library Catalog

Bibliographic Details
Main Authors:	Hutiri, Wiebke; Papakyriakopoulos, Orestis; Xiang, Alice
Format:	Preprint
Published:	2024
Subjects:	Computation and Language; Artificial Intelligence; Computers and Society; Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2402.01708

Similar Items

TEDI: Trustworthy and Ethical Dataset Indicators to Analyze and Compare Dataset Documentation
by: Hutiri, Wiebke, et al.
Published: (2025)

How to Evaluate Automatic Speech Recognition: Comparing Different Performance and Bias Measures
by: Patel, Tanvina, et al.
Published: (2025)

As Biased as You Measure: Methodological Pitfalls of Bias Evaluations in Speaker Verification Research
by: Hutiri, Wiebke, et al.
Published: (2024)

Lost in Phonation: Voice Quality Variation as an Evaluation Dimension for Speech Foundation Models
by: Lameris, Harm, et al.
Published: (2025)

TransVIP: Speech to Speech Translation System with Voice and Isochrony Preservation
by: Le, Chenyang, et al.
Published: (2024)

EmoVoice: LLM-based Emotional Text-To-Speech Model with Freestyle Text Prompting
by: Yang, Guanrou, et al.
Published: (2025)

Qualitative Approaches to Voice UX
by: Seaborn, Katie, et al.
Published: (2024)

ArVoice: A Multi-Speaker Dataset for Arabic Speech Synthesis
by: Toyin, Hawau Olamide, et al.
Published: (2025)

Super Kawaii Vocalics: Amplifying the "Cute" Factor in Computer Voice
by: Mandai, Yuto, et al.
Published: (2025)

Speech Retrieval-Augmented Generation without Automatic Speech Recognition
by: Min, Do June, et al.
Published: (2024)

What Makes a Good Speech Tokenizer for LLM-Centric Speech Generation? A Systematic Study
by: Fan, Xiaoran, et al.
Published: (2025)

Pheme: Efficient and Conversational Speech Generation
by: Budzianowski, Paweł, et al.
Published: (2024)

Inter(sectional) Alia(s): Ambiguity in Voice Agent Identity via Intersectional Japanese Self-Referents
by: Fujii, Takao, et al.
Published: (2025)

Do Bias Benchmarks Generalise? Evidence from Voice-based Evaluation of Gender Bias in SpeechLLMs
by: Satish, Shree Harsha Bokkahalli, et al.
Published: (2025)

SpeechIQ: Speech-Agentic Intelligence Quotient Across Cognitive Levels in Voice Understanding by Large Language Models
by: Wan, Zhen, et al.
Published: (2025)

FairLENS: Assessing Fairness in Law Enforcement Speech Recognition
by: Wang, Yicheng, et al.
Published: (2024)

Voice EHR: Introducing Multimodal Audio Data for Health
by: Anibal, James, et al.
Published: (2024)

VoiceBench: Benchmarking LLM-Based Voice Assistants
by: Chen, Yiming, et al.
Published: (2024)

VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild
by: Peng, Puyuan, et al.
Published: (2024)

Phonology-Guided Speech-to-Speech Translation for African Languages
by: Ochieng, Peter, et al.
Published: (2024)

Streaming Speech-to-Text Translation with a SpeechLLM
by: Parcollet, Titouan, et al.
Published: (2026)

Brilla AI: AI Contestant for the National Science and Maths Quiz
by: Boateng, George, et al.
Published: (2024)

The Model Hears You: Audio Language Model Deployments Should Consider the Principle of Least Privilege
by: He, Luxi, et al.
Published: (2025)

A Non-autoregressive Generation Framework for End-to-End Simultaneous Speech-to-Speech Translation
by: Ma, Zhengrui, et al.
Published: (2024)

LipGER: Visually-Conditioned Generative Error Correction for Robust Automatic Speech Recognition
by: Ghosh, Sreyan, et al.
Published: (2024)

Greek2MathTex: A Greek Speech-to-Text Framework for LaTeX Equations Generation
by: Gkritzali, Evangelia, et al.
Published: (2024)

PolySpeech-100: A Large-Scale Benchmark for Speech Understanding Across 100+ Languages and Dialects
by: Yang, Sicheng, et al.
Published: (2026)

Conversational Speech Reveals Structural Robustness Failures in SpeechLLM Backbones
by: Teleki, Maria, et al.
Published: (2025)

Exploring In-Context Learning of Textless Speech Language Model for Speech Classification Tasks
by: Hsu, Ming-Hao, et al.
Published: (2023)

WavCube: Unifying Speech Representation for Understanding and Generation via Semantic-Acoustic Joint Modeling
by: Yang, Guanrou, et al.
Published: (2026)

MERaLiON-SpeechEncoder: Towards a Speech Foundation Model for Singapore and Beyond
by: Huzaifah, Muhammad, et al.
Published: (2024)

KAME: Tandem Architecture for Enhancing Knowledge in Real-Time Speech-to-Speech Conversational AI
by: Kuroki, So, et al.
Published: (2025)

SimulU: Training-free Policy for Long-form Simultaneous Speech-to-Speech Translation
by: Djanibekov, Amirbek, et al.
Published: (2026)

Voice Communication Analysis in Esports
by: Vinot, Aymeric, et al.
Published: (2024)

VibeVoice Technical Report
by: Peng, Zhiliang, et al.
Published: (2025)

Contextual Paralinguistic Data Creation for Multi-Modal Speech-LLM: Data Condensation and Spoken QA Generation
by: Wang, Qiongqiong, et al.
Published: (2025)

SimClass: A Classroom Speech Dataset Generated via Game Engine Simulation For Automatic Speech Recognition Research
by: Attia, Ahmed Adel, et al.
Published: (2025)

Speech-Hands: A Self-Reflection Voice Agentic Approach to Speech Recognition and Audio Reasoning with Omni Perception
by: Wan, Zhen, et al.
Published: (2026)

FLEURS-R: A Restored Multilingual Speech Corpus for Generation Tasks
by: Ma, Min, et al.
Published: (2024)

Style-Talker: Finetuning Audio Language Model and Style-Based Text-to-Speech Model for Fast Spoken Dialogue Generation
by: Li, Yinghao Aaron, et al.
Published: (2024)