:: Library Catalog

Bibliographic Details
Main Authors:	Züfle, Maike, Klejch, Ondřej, Sanders, Nicholas, Niehues, Jan, Birch, Alexandra, Lam, Tsz Kin
Format:	Preprint
Published:	2026
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2601.11329

Similar Items

When Helpful Context Leaks: Privacy Risks in Domain-Adapted ASR
by: Züfle, Maike, et al.
Published: (2026)

Contrastive Learning for Task-Independent SpeechLLM-Pretraining
by: Züfle, Maike, et al.
Published: (2024)

MuSaG: A Multimodal German Sarcasm Dataset with Full-Modal Annotations
by: Scott, Aaron, et al.
Published: (2025)

Talk2Ref: A Dataset for Reference Prediction from Scientific Talks
by: Broy, Frederik, et al.
Published: (2025)

Compact Speech Translation Models via Discrete Speech Units Pretraining
by: Lam, Tsz Kin, et al.
Published: (2024)

The Prosody of Emojis
by: Zhou, Giulio, et al.
Published: (2025)

Prosody in Cascade and Direct Speech-to-Text Translation: a case study on Korean Wh-Phrases
by: Zhou, Giulio, et al.
Published: (2024)

Beyond Transcripts: A Renewed Perspective on Audio Chaptering
by: Retkowski, Fabian, et al.
Published: (2026)

NUTSHELL: A Dataset for Abstract Generation from Scientific Talks
by: Züfle, Maike, et al.
Published: (2025)

Early-Exit and Instant Confidence Translation Quality Estimation
by: Zouhar, Vilém, et al.
Published: (2025)

KIT's Offline Speech Translation and Instruction Following Submission for IWSLT 2025
by: Koneru, Sai, et al.
Published: (2025)

A Practitioner's Guide to Building ASR Models for Low-Resource Languages: A Case Study on Scottish Gaelic
by: Klejch, Ondřej, et al.
Published: (2025)

COMET-poly: Machine Translation Metric Grounded in Other Candidates
by: Züfle, Maike, et al.
Published: (2025)

Summarizing Speech: A Comprehensive Survey
by: Retkowski, Fabian, et al.
Published: (2025)

Do What I Say: A Spoken Prompt Dataset for Instruction-Following
by: Züfle, Maike, et al.
Published: (2026)

TTSDS2: Resources and Benchmark for Evaluating Human-Quality Text to Speech Systems
by: Minixhofer, Christoph, et al.
Published: (2025)

Beyond Oversmoothing: Evaluating DDPM and MSE for Scalable Speech Synthesis in ASR
by: Minixhofer, Christoph, et al.
Published: (2024)

TTSDS -- Text-to-Speech Distribution Score
by: Minixhofer, Christoph, et al.
Published: (2024)

PersonaPlex: Voice and Role Control for Full Duplex Conversational Speech Models
by: Roy, Rajarshi, et al.
Published: (2026)

MCIF: Multimodal Crosslingual Instruction-Following Benchmark from Scientific Talks
by: Papi, Sara, et al.
Published: (2025)

How Transferable are Attribute Controllers on Pretrained Multilingual Translation Models?
by: Liu, Danni, et al.
Published: (2023)

A Bayesian Optimization Approach to Machine Translation Reranking
by: Cheng, Julius, et al.
Published: (2024)

Enabling Conversational Behavior Reasoning Capabilities in Full-Duplex Speech
by: Pan, Shuchang, et al.
Published: (2025)

Pitfalls and Outlooks in Using COMET
by: Zouhar, Vilém, et al.
Published: (2024)

MTR-DuplexBench: Towards a Comprehensive Evaluation of Multi-Round Conversations for Full-Duplex Speech Language Models
by: Zhang, He, et al.
Published: (2025)

Text-to-OverpassQL: A Natural Language Interface for Complex Geodata Querying of OpenStreetMap
by: Staniek, Michael, et al.
Published: (2023)

Human-1 by Josh Talks: A Full-Duplex Conversational Modeling Framework in Hindi using Real-World Conversations
by: Singh, Bhaskar, et al.
Published: (2026)

VideoFDB: Evaluating Full-Duplex Vision-Speech Capabilities in Conversational Agents
by: Mazumdar, Amrita, et al.
Published: (2026)

From Seeing it to Experiencing it: Interactive Evaluation of Intersectional Voice Bias in Human-AI Speech Interaction
by: Satish, Shree Harsha Bokkahalli, et al.
Published: (2026)

Conditions for Catastrophic Forgetting in Multilingual Translation
by: Liu, Danni, et al.
Published: (2025)

Are Generative Models Underconfident? Better Quality Estimation with Boosted Model Probability
by: Dinh, Tu Anh, et al.
Published: (2025)

Speech Recognition for Automatically Assessing Afrikaans and isiXhosa Preschool Oral Narratives
by: Jacobs, Christiaan, et al.
Published: (2025)

Prepending or Cross-Attention for Speech-to-Text? An Empirical Comparison
by: Lam, Tsz Kin, et al.
Published: (2025)

DuplexMamba: Enhancing Real-time Speech Conversations with Duplex and Streaming Capabilities
by: Lu, Xiangyu, et al.
Published: (2025)

FD-Bench: A Full-Duplex Benchmarking Pipeline Designed for Full Duplex Spoken Dialogue Systems
by: Peng, Yizhou, et al.
Published: (2025)

Multimodal In-context Learning for ASR of Low-resource Languages
by: Li, Zhaolin, et al.
Published: (2026)

Do Slides Help? Multi-modal Context for Automatic Transcription of Conference Talks
by: Sinhamahapatra, Supriti, et al.
Published: (2025)

Middle-Layer Representation Alignment for Cross-Lingual Transfer in Fine-Tuned LLMs
by: Liu, Danni, et al.
Published: (2025)

In-context Language Learning for Endangered Languages in Speech Recognition
by: Li, Zhaolin, et al.
Published: (2025)

Controlling What You Share: Assessing Language Model Adherence to Privacy Preferences
by: Ramírez, Guillem, et al.
Published: (2025)