| Main Authors: | Cui, Fangming, Fong, Jan, Zeng, Rongfei, Tian, Xinmei, Yu, Jun |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2502.14376 |
Similar Items
Enhancing Target-unspecific Tasks through a Features Matrix
by: Cui, Fangming, et al.
Published: (2025)
Advancing Prompt Learning through an External Layer
by: Cui, Fangming, et al.
Published: (2024)
Generalizable Prompt Learning of CLIP: A Brief Overview
by: Cui, Fangming, et al.
Published: (2025)
Mask the Target: A Plug-and-Play Regularizer Against LoRA Forgetting
by: Xu, Runze, et al.
Published: (2026)
Fine-tuning MLLMs Without Forgetting Is Easier Than You Think
by: Li, He, et al.
Published: (2026)
Textual Inversion for Efficient Adaptation of Open-Vocabulary Object Detectors Without Forgetting
by: Ruis, Frank, et al.
Published: (2025)
Semantic Textual Similarity Assessment in Chest X-ray Reports Using a Domain-Specific Cosine-Based Metric
by: Picha, Sayeh Gholipour, et al.
Published: (2024)
Linking Representations with Multimodal Contrastive Learning
by: Arora, Abhishek, et al.
Published: (2023)
Enhancing Multimodal Large Language Models Complex Reason via Similarity Computation
by: Zhang, Xiaofeng, et al.
Published: (2024)
CLEAR: Character Unlearning in Textual and Visual Modalities
by: Dontsov, Alexey, et al.
Published: (2024)
Beyond the Textual: Generating Coherent Visual Options for MCQs
by: Wang, Wanqiang, et al.
Published: (2025)
Mismatch Quest: Visual and Textual Feedback for Image-Text Misalignment
by: Gordon, Brian, et al.
Published: (2023)
Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm
by: Tong, Jingqi, et al.
Published: (2025)
On the Risk of Misleading Reports: Diagnosing Textual Biases in Multimodal Clinical AI
by: Restrepo, David, et al.
Published: (2025)
TRACE: Textual Relevance Augmentation and Contextual Encoding for Multimodal Hate Detection
by: Koushik, Girish A., et al.
Published: (2025)
Tell Me What's Next: Textual Foresight for Generic UI Representations
by: Burns, Andrea, et al.
Published: (2024)
Image Textualization: An Automatic Framework for Creating Accurate and Detailed Image Descriptions
by: Pi, Renjie, et al.
Published: (2024)
Hierarchical Textual Knowledge for Enhanced Image Clustering
by: Zhong, Yijie, et al.
Published: (2026)
FSMR: A Feature Swapping Multi-modal Reasoning Approach with Joint Textual and Visual Clues
by: Li, Shuang, et al.
Published: (2024)
Towards Mitigating Hallucinations in Large Vision-Language Models by Refining Textual Embeddings
by: Agrawal, Aakriti, et al.
Published: (2025)
Unveil: Unified Visual-Textual Integration and Distillation for Multi-modal Document Retrieval
by: Sun, Hao, et al.
Published: (2026)
Render-of-Thought: Rendering Textual Chain-of-Thought as Images for Visual Latent Reasoning
by: Wang, Yifan, et al.
Published: (2026)
RoVRM: A Robust Visual Reward Model Optimized via Auxiliary Textual Preference Data
by: Wang, Chenglong, et al.
Published: (2024)
Prompt-OT: An Optimal Transport Regularization Paradigm for Knowledge Preservation in Vision-Language Model Adaptation
by: Chen, Xiwen, et al.
Published: (2025)
TTL: Test-time Textual Learning for OOD Detection with Pretrained Vision-Language Models
by: Ye, Jinlun, et al.
Published: (2026)
Unleashing Spatial Reasoning in Multimodal Large Language Models via Textual Representation Guided Reasoning
by: Hua, Jiacheng, et al.
Published: (2026)
VisText-Mosquito: A Unified Multimodal Dataset for Visual Detection, Segmentation, and Textual Explanation on Mosquito Breeding Sites
by: Islam, Md. Adnanul, et al.
Published: (2025)
Learning To See But Forgetting To Follow: Visual Instruction Tuning Makes LLMs More Prone To Jailbreak Attacks
by: Pantazopoulos, Georgios, et al.
Published: (2024)
VLN-NF: Feasibility-Aware Vision-and-Language Navigation with False-Premise Instructions
by: Su, Hung-Ting, et al.
Published: (2026)
Perception Without Engagement: Dissecting the Causal Discovery Deficit in LMMs
by: Liang, Jiafeng, et al.
Published: (2026)
FlagEval Findings Report: A Preliminary Evaluation of Large Reasoning Models on Automatically Verifiable Textual and Visual Questions
by: Qin, Bowen, et al.
Published: (2025)
Rethinking the Mixture of Vision Encoders Paradigm for Enhanced Visual Understanding in Multimodal LLMs
by: Azadani, Mozhgan Nasr, et al.
Published: (2025)
Beyond Meme Templates: Limitations of Visual Similarity Measures in Meme Matching
by: Hazman, Muzhaffar, et al.
Published: (2025)
CAST: Cross-modal Alignment Similarity Test for Vision Language Models
by: Dagan, Gautier, et al.
Published: (2024)
Think Visually, Reason Textually: Vision-Language Synergy in ARC
by: Zhang, Beichen, et al.
Published: (2025)
Efficient Personalized Text-to-image Generation by Leveraging Textual Subspace
by: Du, Shian, et al.
Published: (2024)
Similarity Guided Multimodal Fusion Transformer for Semantic Location Prediction in Social Media
by: Zhang, Zhizhen, et al.
Published: (2024)
Prompting Forgetting: Unlearning in GANs via Textual Guidance
by: Nagasubramaniam, Piyush, et al.
Published: (2025)
PUMGPT: A Large Vision-Language Model for Product Understanding
by: Xue, Wei, et al.
Published: (2023)
Thinking-while-Generating: Interleaving Textual Reasoning throughout Visual Generation
by: Guo, Ziyu, et al.
Published: (2025)