| Main Authors: | Züfle, Maike; Klejch, Ondřej; Sanders, Nicholas; Niehues, Jan; Birch, Alexandra; Lam, Tsz Kin |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2601.11329 |
Similar Items
When Helpful Context Leaks: Privacy Risks in Domain-Adapted ASR
by: Züfle, Maike, et al.
Published: (2026)
Contrastive Learning for Task-Independent SpeechLLM-Pretraining
by: Züfle, Maike, et al.
Published: (2024)
MuSaG: A Multimodal German Sarcasm Dataset with Full-Modal Annotations
by: Scott, Aaron, et al.
Published: (2025)
Talk2Ref: A Dataset for Reference Prediction from Scientific Talks
by: Broy, Frederik, et al.
Published: (2025)
Compact Speech Translation Models via Discrete Speech Units Pretraining
by: Lam, Tsz Kin, et al.
Published: (2024)
The Prosody of Emojis
by: Zhou, Giulio, et al.
Published: (2025)
Prosody in Cascade and Direct Speech-to-Text Translation: a case study on Korean Wh-Phrases
by: Zhou, Giulio, et al.
Published: (2024)
Beyond Transcripts: A Renewed Perspective on Audio Chaptering
by: Retkowski, Fabian, et al.
Published: (2026)
NUTSHELL: A Dataset for Abstract Generation from Scientific Talks
by: Züfle, Maike, et al.
Published: (2025)
Early-Exit and Instant Confidence Translation Quality Estimation
by: Zouhar, Vilém, et al.
Published: (2025)
KIT's Offline Speech Translation and Instruction Following Submission for IWSLT 2025
by: Koneru, Sai, et al.
Published: (2025)
A Practitioner's Guide to Building ASR Models for Low-Resource Languages: A Case Study on Scottish Gaelic
by: Klejch, Ondřej, et al.
Published: (2025)
COMET-poly: Machine Translation Metric Grounded in Other Candidates
by: Züfle, Maike, et al.
Published: (2025)
Summarizing Speech: A Comprehensive Survey
by: Retkowski, Fabian, et al.
Published: (2025)
Do What I Say: A Spoken Prompt Dataset for Instruction-Following
by: Züfle, Maike, et al.
Published: (2026)
TTSDS2: Resources and Benchmark for Evaluating Human-Quality Text to Speech Systems
by: Minixhofer, Christoph, et al.
Published: (2025)
Beyond Oversmoothing: Evaluating DDPM and MSE for Scalable Speech Synthesis in ASR
by: Minixhofer, Christoph, et al.
Published: (2024)
TTSDS -- Text-to-Speech Distribution Score
by: Minixhofer, Christoph, et al.
Published: (2024)
PersonaPlex: Voice and Role Control for Full Duplex Conversational Speech Models
by: Roy, Rajarshi, et al.
Published: (2026)
MCIF: Multimodal Crosslingual Instruction-Following Benchmark from Scientific Talks
by: Papi, Sara, et al.
Published: (2025)
How Transferable are Attribute Controllers on Pretrained Multilingual Translation Models?
by: Liu, Danni, et al.
Published: (2023)
A Bayesian Optimization Approach to Machine Translation Reranking
by: Cheng, Julius, et al.
Published: (2024)
Enabling Conversational Behavior Reasoning Capabilities in Full-Duplex Speech
by: Pan, Shuchang, et al.
Published: (2025)
Pitfalls and Outlooks in Using COMET
by: Zouhar, Vilém, et al.
Published: (2024)
MTR-DuplexBench: Towards a Comprehensive Evaluation of Multi-Round Conversations for Full-Duplex Speech Language Models
by: Zhang, He, et al.
Published: (2025)
Text-to-OverpassQL: A Natural Language Interface for Complex Geodata Querying of OpenStreetMap
by: Staniek, Michael, et al.
Published: (2023)
Human-1 by Josh Talks: A Full-Duplex Conversational Modeling Framework in Hindi using Real-World Conversations
by: Singh, Bhaskar, et al.
Published: (2026)
VideoFDB: Evaluating Full-Duplex Vision-Speech Capabilities in Conversational Agents
by: Mazumdar, Amrita, et al.
Published: (2026)
From Seeing it to Experiencing it: Interactive Evaluation of Intersectional Voice Bias in Human-AI Speech Interaction
by: Satish, Shree Harsha Bokkahalli, et al.
Published: (2026)
Conditions for Catastrophic Forgetting in Multilingual Translation
by: Liu, Danni, et al.
Published: (2025)
Are Generative Models Underconfident? Better Quality Estimation with Boosted Model Probability
by: Dinh, Tu Anh, et al.
Published: (2025)
Speech Recognition for Automatically Assessing Afrikaans and isiXhosa Preschool Oral Narratives
by: Jacobs, Christiaan, et al.
Published: (2025)
Prepending or Cross-Attention for Speech-to-Text? An Empirical Comparison
by: Lam, Tsz Kin, et al.
Published: (2025)
DuplexMamba: Enhancing Real-time Speech Conversations with Duplex and Streaming Capabilities
by: Lu, Xiangyu, et al.
Published: (2025)
FD-Bench: A Full-Duplex Benchmarking Pipeline Designed for Full Duplex Spoken Dialogue Systems
by: Peng, Yizhou, et al.
Published: (2025)
Multimodal In-context Learning for ASR of Low-resource Languages
by: Li, Zhaolin, et al.
Published: (2026)
Do Slides Help? Multi-modal Context for Automatic Transcription of Conference Talks
by: Sinhamahapatra, Supriti, et al.
Published: (2025)
Middle-Layer Representation Alignment for Cross-Lingual Transfer in Fine-Tuned LLMs
by: Liu, Danni, et al.
Published: (2025)
In-context Language Learning for Endangered Languages in Speech Recognition
by: Li, Zhaolin, et al.
Published: (2025)
Controlling What You Share: Assessing Language Model Adherence to Privacy Preferences
by: Ramírez, Guillem, et al.
Published: (2025)