| Main Authors: | Cheng, Yao-Fei, Futami, Hayato, Kashiwagi, Yosuke, Tsunoo, Emiru, Teo, Wen Shen, Arora, Siddhant, Watanabe, Shinji |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2409.11274 |
Similar Items
Finding Task-specific Subnetworks in Multi-task Spoken Language Understanding Model
by: Futami, Hayato, et al.
Published: (2024)
Rapid Language Adaptation for Multilingual E2E Speech Recognition Using Encoder Prompting
by: Kashiwagi, Yosuke, et al.
Published: (2024)
Decoder-only Architecture for Streaming End-to-end Speech Recognition
by: Tsunoo, Emiru, et al.
Published: (2024)
Hypothesis Clustering and Merging: Novel MultiTalker Speech Recognition with Speaker Tokens
by: Kashiwagi, Yosuke, et al.
Published: (2024)
Spiralformer: Low Latency Encoder for Streaming Speech Recognition with Circular Layer Skipping and Early Exiting
by: Tsunoo, Emiru, et al.
Published: (2025)
Scheduled Interleaved Speech-Text Training for Speech-to-Speech Translation with LLMs
by: Futami, Hayato, et al.
Published: (2025)
Decoder-only Architecture for Speech Recognition with CTC Prompts and Text Data Augmentation
by: Tsunoo, Emiru, et al.
Published: (2023)
Optimizing Conversational Quality in Spoken Dialogue Systems with Reinforcement Learning from AI Feedback
by: Arora, Siddhant, et al.
Published: (2026)
Chain-of-Thought Reasoning in Streaming Full-Duplex End-to-End Spoken Dialogue Systems
by: Arora, Siddhant, et al.
Published: (2025)
UniverSLU: Universal Spoken Language Understanding for Diverse Tasks with Natural Language Instructions
by: Arora, Siddhant, et al.
Published: (2023)
Whale: Large-Scale multilingual ASR model with w2v-BERT and E-Branchformer with large speech data
by: Kashiwagi, Yosuke, et al.
Published: (2025)
Chain-of-Thought Training for Open E2E Spoken Dialogue Systems
by: Arora, Siddhant, et al.
Published: (2025)
Phonological Tokenizer: Prosody-Aware Phonetic Token via Multi-Objective Fine-Tuning with Differentiable K-Means
by: Onda, Kentaro, et al.
Published: (2026)
Differentiable K-means for Fully-optimized Discrete Token-based ASR
by: Onda, Kentaro, et al.
Published: (2025)
ESPnet-SDS: Unified Toolkit and Demo for Spoken Dialogue Systems
by: Arora, Siddhant, et al.
Published: (2025)
TMT: Tri-Modal Translation between Speech, Image, and Text by Processing Different Modalities as Different Languages
by: Kim, Minsu, et al.
Published: (2024)
CMU's IWSLT 2024 Simultaneous Speech Translation System
by: Xu, Xi, et al.
Published: (2024)
AURA: Agent for Understanding, Reasoning, and Automated Tool Use in Voice-Driven Tasks
by: Maben, Leander Melroy, et al.
Published: (2025)
SpeechComposer: Unifying Multiple Speech Tasks with Prompt Composition
by: Wu, Yihan, et al.
Published: (2024)
OWLS: Scaling Laws for Multilingual Speech Recognition and Translation Models
by: Chen, William, et al.
Published: (2025)
FastAdaSP: Multitask-Adapted Efficient Inference for Large Speech Language Model
by: Lu, Yichen, et al.
Published: (2024)
Language Models Do Hard Arithmetic Tasks Easily and Hardly Do Easy Arithmetic Tasks
by: Gambardella, Andrew, et al.
Published: (2024)
SpeechPrompt: Prompting Speech Language Models for Speech Processing Tasks
by: Chang, Kai-Wei, et al.
Published: (2024)
Locate-and-Focus: Enhancing Terminology Translation in Speech Language Models
by: Wu, Suhang, et al.
Published: (2025)
Evaluating Prompting Strategies and Large Language Models in Systematic Literature Review Screening: Relevance and Task-Stage Classification
by: Han, Binglan, et al.
Published: (2025)
Bias Vector: Mitigating Biases in Language Models with Task Arithmetic Approach
by: Shirafuji, Daiki, et al.
Published: (2024)
SpeechIQ: Speech-Agentic Intelligence Quotient Across Cognitive Levels in Voice Understanding by Large Language Models
by: Wan, Zhen, et al.
Published: (2025)
Sentiment Analysis Across Languages: Evaluation Before and After Machine Translation to English
by: Kathunia, Aekansh, et al.
Published: (2024)
RevOrder: A Novel Method for Enhanced Arithmetic in Language Models
by: Shen, Si, et al.
Published: (2024)
Exploring In-Context Learning of Textless Speech Language Model for Speech Classification Tasks
by: Hsu, Ming-Hao, et al.
Published: (2023)
From Flat Language Labels to Typological Priors: Structured Language Conditioning for Multilingual Speech-to-Speech Translation
by: Pan, Yu, et al.
Published: (2026)
Phonology-Guided Speech-to-Speech Translation for African Languages
by: Ochieng, Peter, et al.
Published: (2024)
Towards Robust Speech Representation Learning for Thousands of Languages
by: Chen, William, et al.
Published: (2024)
Language Models are Homer Simpson! Safety Re-Alignment of Fine-tuned Language Models through Task Arithmetic
by: Bhardwaj, Rishabh, et al.
Published: (2024)
Strategic Prompting for Conversational Tasks: A Comparative Analysis of Large Language Models Across Diverse Conversational Tasks
by: Joshi, Ratnesh Kumar, et al.
Published: (2024)
Leveraging Allophony in Self-Supervised Speech Models for Atypical Pronunciation Assessment
by: Choi, Kwanghee, et al.
Published: (2025)
Task Arithmetic can Mitigate Synthetic-to-Real Gap in Automatic Speech Recognition
by: Su, Hsuan, et al.
Published: (2024)
InfiniSST: Simultaneous Translation of Unbounded Speech with Large Language Model
by: Ouyang, Siqi, et al.
Published: (2025)
Blending LLMs into Cascaded Speech Translation: KIT's Offline Speech Translation System for IWSLT 2024
by: Koneru, Sai, et al.
Published: (2024)
On the Brittle Foundations of ReAct Prompting for Agentic Large Language Models
by: Verma, Mudit, et al.
Published: (2024)