| Main Authors: | Lin, Guan-Ting, Lee, Hung-yi |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2406.11065 |
Similar Items
Advancing Large Language Models to Capture Varied Speaking Styles and Respond Properly in Spoken Conversations
by: Lin, Guan-Ting, et al.
Published: (2024)
SUTA-LM: Bridging Test-Time Adaptation and Language Model Rescoring for Robust ASR
by: Huang, Wei-Ping, et al.
Published: (2025)
Full-Duplex-Bench-v3: Benchmarking Tool Use for Full-Duplex Voice Agents Under Real-World Disfluency
by: Lin, Guan-Ting, et al.
Published: (2026)
Full-Duplex-Bench: A Benchmark to Evaluate Full-duplex Spoken Dialogue Models on Turn-taking Capabilities
by: Lin, Guan-Ting, et al.
Published: (2025)
Prompt-Based One-Shot Exact Length-Controlled Generation with LLMs
by: Xie, Juncheng, et al.
Published: (2025)
Property Neurons in Self-Supervised Speech Transformers
by: Lin, Tzu-Quan, et al.
Published: (2024)
Paralinguistics-Enhanced Large Language Modeling of Spoken Dialogue
by: Lin, Guan-Ting, et al.
Published: (2023)
Rethinking Dense Sequential Chains: Reasoning Language Models Can Extract Answers from Sparse, Order-Shuffling Chain-of-Thoughts
by: Chen, Yi-Chang, et al.
Published: (2026)
Can Large Audio-Language Models Truly Hear? Tackling Hallucinations with Multi-Task Assessment and Stepwise Audio Reasoning
by: Kuan, Chun-Yi, et al.
Published: (2024)
Thunder-NUBench: A Benchmark for LLMs' Sentence-Level Negation Understanding
by: So, Yeonkyoung, et al.
Published: (2025)
Transferring Textual Preferences to Vision-Language Understanding through Model Merging
by: Li, Chen-An, et al.
Published: (2025)
Investigating the Effects of Large-Scale Pseudo-Stereo Data and Different Speech Foundation Model on Dialogue Generative Spoken Language Model
by: Fu, Yu-Kuan, et al.
Published: (2024)
Merging Facts, Crafting Fallacies: Evaluating the Contradictory Nature of Aggregated Factual Claims in Long-Form Generations
by: Chiang, Cheng-Han, et al.
Published: (2024)
Over-Reasoning and Redundant Calculation of Large Language Models
by: Chiang, Cheng-Han, et al.
Published: (2024)
When Silence Matters: The Impact of Irrelevant Audio on Text Reasoning in Large Audio-Language Models
by: Li, Chen-An, et al.
Published: (2025)
Do Prompts Really Prompt? Exploring the Prompt Understanding Capability of Whisper
by: Yang, Chih-Kai, et al.
Published: (2024)
TiCo: Time-Controllable Spoken Dialogue Model
by: Chang, Kai-Wei, et al.
Published: (2026)
TASTE-Streaming: Towards Streamable Text-Aligned Speech Tokenization and Embedding for Spoken Language Modeling
by: Tseng, Liang-Hsuan, et al.
Published: (2026)
MMMOS: Multi-domain Multi-axis Audio Quality Assessment
by: Lin, Yi-Cheng, et al.
Published: (2025)
Style Amnesia: Investigating Speaking Style Degradation and Mitigation in Multi-Turn Spoken Language Models
by: Lin, Yu-Xiang, et al.
Published: (2025)
ASPIRin: Action Space Projection for Interactivity-Optimized Reinforcement Learning in Full-Duplex Speech Language Models
by: Hsiao, Chi-Yuan, et al.
Published: (2026)
InstructionCP: A fast approach to transfer Large Language Models into target language
by: Chen, Kuang-Ming, et al.
Published: (2024)
Align-SLM: Textless Spoken Language Models with Reinforcement Learning from AI Feedback
by: Lin, Guan-Ting, et al.
Published: (2024)
Understanding Sounds, Missing the Questions: The Challenge of Object Hallucination in Large Audio-Language Models
by: Kuan, Chun-Yi, et al.
Published: (2024)
SMILE: Speech Meta In-Context Learning for Low-Resource Language Automatic Speech Recognition
by: Hsu, Ming-Hao, et al.
Published: (2024)
Spoken Stereoset: On Evaluating Social Bias Toward Speaker in Speech Large Language Models
by: Lin, Yi-Cheng, et al.
Published: (2024)
Teaching Audio-Aware Large Language Models What Does Not Hear: Mitigating Hallucinations through Synthesized Negative Samples
by: Kuan, Chun-Yi, et al.
Published: (2025)
TRACT: Regression-Aware Fine-tuning Meets Chain-of-Thought Reasoning for LLM-as-a-Judge
by: Chiang, Cheng-Han, et al.
Published: (2025)
Improving Non-autoregressive Translation Quality with Pretrained Language Model, Embedding Distillation and Upsampling Strategy for CTC
by: Syu, Shen-sian, et al.
Published: (2023)
DogeRM: Equipping Reward Models with Domain Knowledge through Model Merging
by: Lin, Tzu-Han, et al.
Published: (2024)
MelHuBERT: A simplified HuBERT on Mel spectrograms
by: Lin, Tzu-Quan, et al.
Published: (2022)
Gender Bias in Instruction-Guided Speech Synthesis Models
by: Kuan, Chun-Yi, et al.
Published: (2025)
Safeguard Fine-Tuned LLMs Through Pre- and Post-Tuning Model Merging
by: Farn, Hua, et al.
Published: (2024)
Mitigating Subgroup Disparities in Multi-Label Speech Emotion Recognition: A Pseudo-Labeling and Unsupervised Learning Approach
by: Lin, Yi-Cheng, et al.
Published: (2025)
MI-Fuse: Label Fusion for Unsupervised Domain Adaptation with Closed-Source Large-Audio Language Model
by: Huang, Hsiao-Ying, et al.
Published: (2025)
SAKURA: On the Multi-hop Reasoning of Large Audio-Language Models Based on Speech and Audio Information
by: Yang, Chih-Kai, et al.
Published: (2025)
Non-instructional Fine-tuning: Enabling Instruction-Following Capabilities in Pre-trained Language Models without Instruction-Following Data
by: Xie, Juncheng, et al.
Published: (2024)
Towards ASR Robust Spoken Language Understanding Through In-Context Learning With Word Confusion Networks
by: Everson, Kevin, et al.
Published: (2024)
GSQA: An End-to-End Model for Generative Spoken Question Answering
by: Shih, Min-Han, et al.
Published: (2023)
Causal Tracing of Audio-Text Fusion in Large Audio Language Models
by: Chen, Wei-Chih, et al.
Published: (2026)