| Main Authors: | Kang, Wonjune, Wang, Yun, Zhang, Shun, Hinsvark, Arthur, He, Qing |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2401.06321 |
Similar Items
Prompting Large Language Models with Audio for General-Purpose Speech Summarization
by: Kang, Wonjune, et al.
Published: (2024)
Expressive Speech Retrieval using Natural Language Descriptions of Speaking Style
by: Kang, Wonjune, et al.
Published: (2025)
ConGraT: Self-Supervised Contrastive Pretraining for Joint Graph and Text Embeddings
by: Brannon, William, et al.
Published: (2023)
CLAG: Adaptive Memory Organization via Agent-Driven Clustering for Small Language Model Agents
by: Roh, Taeyun, et al.
Published: (2026)
An investigation of phrase break prediction in an End-to-End TTS system
by: Vadapalli, Anandaswarup
Published: (2023)
TouchTTS: An Embarrassingly Simple TTS Framework that Everyone Can Touch
by: Song, Xingchen, et al.
Published: (2024)
TTS-Transducer: End-to-End Speech Synthesis with Neural Transducer
by: Bataev, Vladimir, et al.
Published: (2025)
FullFront: Benchmarking MLLMs Across the Full Front-End Engineering Workflow
by: Sun, Haoyu, et al.
Published: (2025)
ScholaWrite: A Dataset of End-to-End Scholarly Writing Process
by: Le, Khanh Chi, et al.
Published: (2025)
F5-TTS-RO: Extending F5-TTS to Romanian TTS via Lightweight Input Adaptation
by: Chivereanu, Radu-Gabriel, et al.
Published: (2025)
Schema-Aware Multi-Task Learning for Complex Text-to-SQL
by: Wu, Yangjun, et al.
Published: (2024)
MunTTS: A Text-to-Speech System for Mundari
by: Gumma, Varun, et al.
Published: (2024)
MOSS-TTS Technical Report
by: Gong, Yitian, et al.
Published: (2026)
Leveraging Graph Structures and Large Language Models for End-to-End Synthetic Task-Oriented Dialogues
by: Medjad, Maya, et al.
Published: (2025)
A Multi-Task Evaluation of LLMs' Processing of Academic Text Input
by: Li, Tianyi, et al.
Published: (2025)
WAFFLE: Finetuning Multi-Modal Models for Automated Front-End Development
by: Liang, Shanchao, et al.
Published: (2024)
EE-TTS: Emphatic Expressive TTS with Linguistic Information
by: Zhong, Yi, et al.
Published: (2023)
The Role of Natural Language Processing Tasks in Automatic Literary Character Network Construction
by: Amalvy, Arthur, et al.
Published: (2024)
TMD-TTS: A Unified Tibetan Multi-Dialect Text-to-Speech Framework for Ü-Tsang, Amdo and Kham Speech Dataset Generation
by: Liu, Yutong, et al.
Published: (2025)
ChatCFD: An LLM-Driven Agent for End-to-End CFD Automation with Structured Knowledge and Reasoning
by: Fan, E, et al.
Published: (2025)
Qwen3-TTS Technical Report
by: Hu, Hangrui, et al.
Published: (2026)
TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision
by: Zhai, Yukun, et al.
Published: (2023)
TextBandit: Evaluating Probabilistic Reasoning in LLMs Through Language-Only Decision Tasks
by: Lim, Jimin, et al.
Published: (2025)
WebAgent-R1: Training Web Agents via End-to-End Multi-Turn Reinforcement Learning
by: Wei, Zhepei, et al.
Published: (2025)
Bayesian Multi-Task Transfer Learning for Soft Prompt Tuning
by: Lee, Haeju, et al.
Published: (2024)
Beyond Single-Task: Robust Multi-Task Length Generalization for LLMs
by: Hu, Yi, et al.
Published: (2025)
MahaTTS: A Unified Framework for Multilingual Text-to-Speech Synthesis
by: Singh, Jaskaran, et al.
Published: (2025)
Multi-Task Learning with LLMs for Implicit Sentiment Analysis: Data-level and Task-level Automatic Weight Learning
by: Lai, Wenna, et al.
Published: (2024)
RephraseTTS: Dynamic Length Text based Speech Insertion with Speaker Style Transfer
by: Matiyali, Neeraj, et al.
Published: (2025)
ManchuTTS: Towards High-Quality Manchu Speech Synthesis via Flow Matching and Hierarchical Text Representation
by: Wang, Suhua, et al.
Published: (2025)
AgentTTS: Large Language Model Agent for Test-time Compute-optimal Scaling Strategy in Complex Tasks
by: Wang, Fali, et al.
Published: (2025)
End-to-End Beam Retrieval for Multi-Hop Question Answering
by: Zhang, Jiahao, et al.
Published: (2023)
More Data, Fewer Diacritics: Scaling Arabic TTS
by: Musleh, Ahmed, et al.
Published: (2026)
FormalASR: End-to-End Spoken Chinese to Formal Text
by: Ning, Wanyi, et al.
Published: (2026)
E2E-AFG: An End-to-End Model with Adaptive Filtering for Retrieval-Augmented Generation
by: Jiang, Yun, et al.
Published: (2024)
FMSD-TTS: Few-shot Multi-Speaker Multi-Dialect Text-to-Speech Synthesis for Ü-Tsang, Amdo and Kham Speech Dataset Generation
by: Liu, Yutong, et al.
Published: (2025)
MELD: Multi-Task Equilibrated Learning Detector for AI-Generated Text
by: Li, Chenjun, et al.
Published: (2026)
FAID: Fine-Grained AI-Generated Text Detection Using Multi-Task Auxiliary and Multi-Level Contrastive Learning
by: Ta, Minh Ngoc, et al.
Published: (2025)
Multi-Task Reinforcement Learning for Enhanced Multimodal LLM-as-a-Judge
by: Wu, Junjie, et al.
Published: (2026)
Enhancing End-to-End Multi-Task Dialogue Systems: A Study on Intrinsic Motivation Reinforcement Learning Algorithms for Improved Training and Adaptability
by: Kamuni, Navin, et al.
Published: (2024)