| Main Authors: | Ru, Jinghan, Xie, Yuxin, Zhuang, Xianwei, Yin, Yuguo, Guo, Zhihui, Liu, Zhiming, Ren, Qianli, Zou, Yuexian |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2502.06604 |
Similar Items
VARGPT: Unified Understanding and Generation in a Visual Autoregressive Multimodal Large Language Model
by: Zhuang, Xianwei, et al.
Published: (2025)
VARGPT-v1.1: Improve Visual Autoregressive Large Unified Model via Iterative Instruction Tuning and Reinforcement Learning
by: Zhuang, Xianwei, et al.
Published: (2025)
ATRI: Mitigating Multilingual Audio Text Retrieval Inconsistencies by Reducing Data Distribution Errors
by: Yin, Yuguo, et al.
Published: (2025)
DermoGPT: Open Weights and Open Data for Morphology-Grounded Dermatological Reasoning MLLMs
by: Ru, Jinghan, et al.
Published: (2026)
VASparse: Towards Efficient Visual Hallucination Mitigation via Visual-Aware Token Sparsification
by: Zhuang, Xianwei, et al.
Published: (2025)
SupCLAP: Controlling Optimization Trajectory Drift in Audio-Text Contrastive Learning with Support Vector Regularization
by: Luo, Jiehui, et al.
Published: (2025)
ASK: Adaptive Self-improving Knowledge Framework for Audio Text Retrieval
by: Fu, Siyuan, et al.
Published: (2025)
Not All Tokens and Heads Are Equally Important: Dual-Level Attention Intervention for Hallucination Mitigation
by: Tang, Lexiang, et al.
Published: (2025)
Do we really need the Rademacher complexities?
by: Bartl, Daniel, et al.
Published: (2025)
On the test-time zero-shot generalization of vision-language models: Do we really need prompt learning?
by: Zanella, Maxime, et al.
Published: (2024)
Sequestration by the biological carbon pump: Do we really know what we are talking about?
by: Andre W. Visser
Published: (2025)
Do we really ponder about necessity of intravenous hydration in acute bronchiolitis?
by: Sule Yıldırım
Published: (2016)
Do we really need Self-Attention for Streaming Automatic Speech Recognition?
by: Dkhissi, Youness, et al.
Published: (2026)
Religion and spirituality in counselor education: Do we really need to talk about this?
by: Jesse Fox
Published: (2024)
HeartMuLa: A Family of Open Sourced Music Foundation Models
by: Yang, Dongchao, et al.
Published: (2026)
Improving Audio-Text Retrieval via Hierarchical Cross-Modal Interaction and Auxiliary Captions
by: Xin, Yifei, et al.
Published: (2023)
The LHC has ruled out Supersymmetry -- really?
by: Constantin, L., et al.
Published: (2025)
Data filtering methods for training language models
by: Shevchenko, Egor, et al.
Published: (2026)
AFL-Net: Integrating Audio, Facial, and Lip Modalities with a Two-step Cross-attention for Robust Speaker Diarization in the Wild
by: Yin, Yongkang, et al.
Published: (2023)
Friendship-paradox paradox: Do most people's friends really have more friends than they do?
by: Lee, Sang Hoon
Published: (2025)
Do Multilingual LLMs have specialized language heads?
by: Naufil, Muhammad
Published: (2026)
Advancing General Multimodal Capability of Vision-language Models with Pyramid-descent Visual Position Encoding
by: Chen, Zhanpeng, et al.
Published: (2025)
How much freedom does an effectiveness metric really have?
by: Alistair Moffat, et al.
Published: (2024)
Do prompt positions really matter?
by: Mao, Junyu, et al.
Published: (2023)
Conceptualizing transgender experiences in psychology: Do we have a ‘true’ gender?
by: Emma F. Jackson, et al.
Published: (2024)
Getting the most out of your tokenizer for pre-training and domain adaptation
by: Dagan, Gautier, et al.
Published: (2024)
Universal pre-training by iterated random computation
by: Bloem, Peter
Published: (2025)
COPD: Are we using all the tools we have?
by: A. Araújo
Published: (2016)
STAR: Speech-to-Audio Generation via Representation Learning
by: Xie, Zeyu, et al.
Published: (2025)
Uncertainty-aware sign language video retrieval with probability distribution modeling
by: Wu, Xuan, et al.
Published: (2024)
The mass-to-flux ratio in molecular clouds. What are we really measuring?
by: Tritsis, Aris
Published: (2025)
FakeSound2: A Benchmark for Explainable and Generalizable Deepfake Sound Detection
by: Xie, Zeyu, et al.
Published: (2025)
Do we have to fear tax competition among "new" and "old" European countries?
by: Simon Schnyder
Published: (2006)
Do massive neutrino states really exist?
by: Shelkovkin, Danil D., et al.
Published: (2025)
Linking agricultural conservation to water quality outcomes in the United States at multiple scales: Do we have the information we need?
by: Laura Naslund, et al.
Published: (2025)
Tensor train methods for high-dimensional nonlinear filtering problems with correlated noise
by: Meng, Yuhua, et al.
Published: (2026)
What we have accomplished and what we can achieve
by: A. Morais
Published: (2014)
Can pre-trained language models generate titles for research papers?
by: Rehman, Tohida, et al.
Published: (2024)
Do we have a quantum computer? Expert perspectives on current status and future prospects
by: Doyle, Liam, et al.
Published: (2026)
Do we have to choose between economic or environmental performance? The case of the ceramic industry cluster
by: Teresa Vallet‐Bellmunt, et al.
Published: (2024)