| Main Authors: | Dinkel, Heinrich, Sun, Xingwei, Li, Gang, Mei, Jiahao, Niu, Yadong, Liu, Jizhong, Li, Xiyang, Liao, Yifan, Zhou, Jiahao, Zhang, Junbo, Luan, Jian |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.23765 |
Similar Items
Dasheng AudioGen: A Unified Model for Generating Coherent Audio Scenes from Text
by: Mei, Jiahao, et al.
Published: (2026)
MiDashengLM: Efficient Audio Understanding with General Audio Captions
by: Dinkel, Heinrich, et al.
Published: (2025)
ACAVCaps: Enabling large-scale training for fine-grained and diverse audio understanding
by: Niu, Yadong, et al.
Published: (2026)
GLAP: General contrastive audio-text pretraining across domains and languages
by: Dinkel, Heinrich, et al.
Published: (2025)
Reinforcement Learning Outperforms Supervised Fine-Tuning: A Case Study on Audio Question Answering
by: Li, Gang, et al.
Published: (2025)
MECAT: A Multi-Experts Constructed Benchmark for Fine-Grained Audio Understanding Tasks
by: Niu, Yadong, et al.
Published: (2025)
Efficient Speech Enhancement via Embeddings from Pre-trained Generative Audioencoders
by: Sun, Xingwei, et al.
Published: (2025)
X-ARES: A Comprehensive Framework for Assessing Audio Encoder Performance
by: Zhang, Junbo, et al.
Published: (2025)
The Interspeech 2026 Audio Encoder Capability Challenge for Large Audio Language Models
by: Dinkel, Heinrich, et al.
Published: (2026)
Scaling up masked audio encoder learning for general audio classification
by: Dinkel, Heinrich, et al.
Published: (2024)
Enhancing Automated Audio Captioning via Large Language Models with Optimized Audio Encoding
by: Liu, Jizhong, et al.
Published: (2024)
Bridging Language Gaps in Audio-Text Retrieval
by: Yan, Zhiyong, et al.
Published: (2024)
The ICME 2025 Audio Encoder Capability Challenge
by: Zhang, Junbo, et al.
Published: (2025)
Streaming Audio Transformers for Online Audio Tagging
by: Dinkel, Heinrich, et al.
Published: (2023)
GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts
by: Yu, Jiahao, et al.
Published: (2023)
A unified multimodal understanding and generation model for cross-disciplinary scientific research
by: Yang, Xiaomeng, et al.
Published: (2026)
Gauge flux generations of weakly magnetized Dirac spin liquid in a kagomé lattice
by: Pan, Si-Yu, et al.
Published: (2025)
Have we unified image generation and understanding yet? An empirical study of GPT-4o's image generation ability
by: Li, Ning, et al.
Published: (2025)
Enhancing Few-Shot Stock Trend Prediction with Large Language Models
by: Deng, Yiqi, et al.
Published: (2024)
TEAdapter: Supply abundant guidance for controllable text-to-music generation
by: Zou, Jialing, et al.
Published: (2024)
Efficient 3D Content Reconstruction and Generation
by: Li, Jiahao
Published: (2026)
ScenDi: 3D-to-2D Scene Diffusion Cascades for Urban Generation
by: Guo, Hanlei, et al.
Published: (2026)
Moreover. Are we just not clever enough to understand the mind?
Published: (1999)
Audio Dialogues: Dialogues dataset for audio and music understanding
by: Goel, Arushi, et al.
Published: (2024)
asvspoof2015 in WebDataset Format
by: Yadong, Niu
Published: (2025)
vocalimitationset in WebDataset Format
by: Yadong, Niu
Published: (2025)
Natural language is not enough: Benchmarking multi-modal generative AI for Verilog generation
by: Chang, Kaiyan, et al.
Published: (2024)
Adalbert Stifter
by: Dinkel, Benjamin
Published: (2025)
Finite‐Time Inverse Optimal Control for Low‐Order Stochastic Nonlinear Systems With Time‐Varying Orders
by: Mengmeng Jiang, et al.
Published: (2025)
UniMIC: Token-Based Multimodal Interactive Coding for Human-AI Collaboration
by: Mao, Qi, et al.
Published: (2025)
A unified physics-informed generative operator framework for general inverse problems
by: Bao, Gang, et al.
Published: (2025)
Are LLMs Reliable Rankers? Rank Manipulation via Two-Stage Token Optimization
by: Xing, Tiancheng, et al.
Published: (2025)
Source code for "Urban expansion reconfigures the landscape of a neglected zoonosis by advancing the peri-urban fringe"
by: Li, Xiyang
Published: (2026)
Heterogeneous Federated Fine-Tuning with Parallel One-Rank Adaptation
by: Zhang, Zikai, et al.
Published: (2026)
Direct numerical simulation of out-scale-actuated spanwise wall oscillation in turbulent boundary layers
by: Zhang, Jizhong, et al.
Published: (2026)
A Moldable, Tough Mineral‐Dominated Nanocomposite as a Recyclable Structural Material
by: Yadong Yu, et al.
Published: (2025)
Q2A: Querying Implicit Fully Continuous Feature Pyramid to Align Features for Medical Image Segmentation
by: Yu, Jiahao, et al.
Published: (2024)
A Survey on GUI Agents with Foundation Models Enhanced by Reinforcement Learning
by: Li, Jiahao, et al.
Published: (2025)
One stabilization is not enough for contractible 4-manifolds
by: Kang, Sungkyung
Published: (2022)
idekerlab/cellmaps_coembedding: Add umap generation
by: Joanna, et al.
Published: (2025)