| Main Authors: | Cui, Fangming, Zhang, Yonggang, Wang, Xuan, Tian, Xinmei, Yu, Jun |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2505.03414 |
Similar Items
A Similarity Paradigm Through Textual Regularization Without Forgetting
by: Cui, Fangming, et al.
Published: (2025)
Generalizable Prompt Learning of CLIP: A Brief Overview
by: Cui, Fangming, et al.
Published: (2025)
Advancing Prompt Learning through an External Layer
by: Cui, Fangming, et al.
Published: (2024)
Detecting Generated Images by Fitting Natural Image Distributions
by: Zhang, Yonggang, et al.
Published: (2025)
Epistemic Uncertainty for Generated Image Detection
by: Nie, Jun, et al.
Published: (2024)
TACO: Enhancing Multimodal In-context Learning via Task Mapping-Guided Sequence Configuration
by: Li, Yanshu, et al.
Published: (2025)
Linking Representations with Multimodal Contrastive Learning
by: Arora, Abhishek, et al.
Published: (2023)
MM-GEN: Enhancing Task Performance Through Targeted Multimodal Data Curation
by: Joshi, Siddharth, et al.
Published: (2025)
MobileRAG: Enhancing Mobile Agent with Retrieval-Augmented Generation
by: Loo, Gowen, et al.
Published: (2025)
Low-Rank Adaptation with Task-Relevant Feature Enhancement for Fine-tuning Language Models
by: Li, Changqun, et al.
Published: (2024)
Recurrent Visual Feature Extraction and Stereo Attentions for CT Report Generation
by: Tian, Yuanhe, et al.
Published: (2025)
Computed Tomography Visual Question Answering with Cross-modal Feature Graphing
by: Tian, Yuanhe, et al.
Published: (2025)
MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer
by: Tian, Changyao, et al.
Published: (2024)
Leopard: A Vision Language Model For Text-Rich Multi-Image Tasks
by: Jia, Mengzhao, et al.
Published: (2024)
Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization
by: Zhao, Zhiyuan, et al.
Published: (2023)
Learning Speaker-Invariant Visual Features for Lipreading
by: Li, Yu, et al.
Published: (2025)
CLIP-Adapter: Better Vision-Language Models with Feature Adapters
by: Gao, Peng, et al.
Published: (2021)
Mobile-Agent-E: Self-Evolving Mobile Assistant for Complex Tasks
by: Wang, Zhenhailong, et al.
Published: (2025)
HPE-CogVLM: Advancing Vision Language Models with a Head Pose Grounding Task
by: Tian, Yu, et al.
Published: (2024)
Pi-GPS: Enhancing Geometry Problem Solving by Unleashing the Power of Diagrammatic Information
by: Zhao, Junbo, et al.
Published: (2025)
CoTasks: Chain-of-Thought based Video Instruction Tuning Tasks
by: Wang, Yanan, et al.
Published: (2025)
Interpreting and Enhancing Emotional Circuits in Large Vision-Language Models via Cross-Modal Information Flow
by: Zhang, Chengsheng, et al.
Published: (2026)
Expert Pyramid Tuning: Efficient Parameter Fine-Tuning for Expertise-Driven Task Allocation
by: Zhang, Jia-Chen, et al.
Published: (2026)
Growing a Multi-head Twig via Distillation and Reinforcement Learning to Accelerate Large Vision-Language Models
by: Shao, Zhenwei, et al.
Published: (2025)
Enhancing Sentiment Analysis through Multimodal Fusion: A BERT-DINOv2 Approach
by: Zhao, Taoxu, et al.
Published: (2025)
Octavius: Mitigating Task Interference in MLLMs via LoRA-MoE
by: Chen, Zeren, et al.
Published: (2023)
TASO: Task-Aligned Sparse Optimization for Parameter-Efficient Model Adaptation
by: Miao, Daiye, et al.
Published: (2025)
Superpixel Semantics Representation and Pre-training for Vision-Language Task
by: Zhang, Siyu, et al.
Published: (2023)
Vision-Flan: Scaling Human-Labeled Tasks in Visual Instruction Tuning
by: Xu, Zhiyang, et al.
Published: (2024)
Sparse-to-Dense: A Free Lunch for Lossless Acceleration of Video Understanding in LLMs
by: Zhang, Xuan, et al.
Published: (2025)
EFLNet: Enhancing Feature Learning for Infrared Small Target Detection
by: Yang, Bo, et al.
Published: (2023)
FLEX-CLIP: Feature-Level GEneration Network Enhanced CLIP for X-shot Cross-modal Retrieval
by: Xie, Jingyou, et al.
Published: (2024)
Progressive Feature Fusion Network for Enhancing Image Quality Assessment
by: Wu, Kaiqun, et al.
Published: (2024)
Drawing the Line: Enhancing Trustworthiness of MLLMs Through the Power of Refusal
by: Wang, Yuhao, et al.
Published: (2024)
From Summary to Action: Enhancing Large Language Models for Complex Tasks with Open World APIs
by: Liu, Yulong, et al.
Published: (2024)
The Side Effects of Being Smart: Safety Risks in MLLMs' Multi-Image Reasoning
by: Chen, Renmiao, et al.
Published: (2026)
GETReason: Enhancing Image Context Extraction through Hierarchical Multi-Agent Reasoning
by: Siingh, Shikhhar, et al.
Published: (2025)
Embodied-Reasoner: Synergizing Visual Search, Reasoning, and Action for Embodied Interactive Tasks
by: Zhang, Wenqi, et al.
Published: (2025)
FastPerson: Enhancing Video Learning through Effective Video Summarization that Preserves Linguistic and Visual Contexts
by: Kawamura, Kazuki, et al.
Published: (2024)
Enhancing Chest X-ray Classification through Knowledge Injection in Cross-Modality Learning
by: Yan, Yang, et al.
Published: (2025)