| Main Author: | Li, Xiu |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2512.23335 |
Similar Items
Bayesian Lottery Ticket Hypothesis
by: Kuhn, Nicholas, et al.
Published: (2026)
The Manifold Hypothesis for Gradient-Based Explanations
by: Bordt, Sebastian, et al.
Published: (2022)
A Comprehensive Information-Decomposition Analysis of Large Vision-Language Models
by: Xiu, Lixin, et al.
Published: (2026)
SAM-E: Leveraging Visual Foundation Model with Sequence Imitation for Embodied Manipulation
by: Zhang, Junjie, et al.
Published: (2024)
EchoPrime: A Multi-Video View-Informed Vision-Language Model for Comprehensive Echocardiography Interpretation
by: Vukadinovic, Milos, et al.
Published: (2024)
Probing Visual Language Priors in VLMs
by: Luo, Tiange, et al.
Published: (2024)
When Should We Prefer State-to-Visual DAgger Over Visual Reinforcement Learning?
by: Mu, Tongzhou, et al.
Published: (2024)
Long-Tailed Object Detection Pre-training: Dynamic Rebalancing Contrastive Learning with Dual Reconstruction
by: Duan, Chen-Long, et al.
Published: (2024)
The Universal Weight Subspace Hypothesis
by: Kaushik, Prakhar, et al.
Published: (2025)
Instruction-Guided Fusion of Multi-Layer Visual Features in Large Vision-Language Models
by: Li, Xu, et al.
Published: (2024)
Mostly Text, Smart Visuals: Asymmetric Text-Visual Pruning for Large Vision-Language Models
by: Li, Sijie, et al.
Published: (2026)
Visual Prompting in Multimodal Large Language Models: A Survey
by: Wu, Junda, et al.
Published: (2024)
Visual-Guided Key-Token Regularization for Multimodal Large Language Model Unlearning
by: Cai, Chengyi, et al.
Published: (2026)
Attribute-based Visual Reprogramming for Vision-Language Models
by: Cai, Chengyi, et al.
Published: (2025)
CircuitProbe: Tracing Visual Temporal Evidence Flow in Video Language Models
by: Zhang, Yiming, et al.
Published: (2025)
Bi-Level Unsupervised Feature Selection
by: Liu, Jingjing, et al.
Published: (2025)
GAC-KAN: An Ultra-Lightweight GNSS Interference Classifier for GenAI-Powered Consumer Edge Devices
by: Zeng, Zhihan, et al.
Published: (2026)
Visual Perturbation and Adaptive Hard Negative Contrastive Learning for Compositional Reasoning in Vision-Language Models
by: Huang, Xin, et al.
Published: (2025)
Embedding-perturbed Exploration Preference Optimization for Flow Models
by: Hu, Sujie, et al.
Published: (2026)
GRPO-TTA: Test-Time Visual Tuning for Vision-Language Models via GRPO-Driven Reinforcement Learning
by: Li, Yujun, et al.
Published: (2026)
Introducing Visual Perception Token into Multimodal Large Language Model
by: Yu, Runpeng, et al.
Published: (2025)
Towards Interpreting Visual Information Processing in Vision-Language Models
by: Neo, Clement, et al.
Published: (2024)
Qwen Look Again: Guiding Vision-Language Reasoning Models to Re-attention Visual Information
by: Chu, Xu, et al.
Published: (2025)
VisualRWKV: Exploring Recurrent Neural Networks for Visual Language Models
by: Hou, Haowen, et al.
Published: (2024)
GHOST: Gaussian Hypothesis Open-Set Technique
by: Rabinowitz, Ryan, et al.
Published: (2025)
LDP: Generalizing to Multilingual Visual Information Extraction by Language Decoupled Pretraining
by: Shen, Huawen, et al.
Published: (2024)
Not Only Text: Exploring Compositionality of Visual Representations in Vision-Language Models
by: Berasi, Davide, et al.
Published: (2025)
How Visual Representations Map to Language Feature Space in Multimodal LLMs
by: Venhoff, Constantin, et al.
Published: (2025)
Self-Evolving Visual Concept Library using Vision-Language Critics
by: Sehgal, Atharva, et al.
Published: (2025)
EasyARC: Evaluating Vision Language Models on True Visual Reasoning
by: Unsal, Mert, et al.
Published: (2025)
STAR-Net: An Interpretable Model-Aided Network for Remote Sensing Image Denoising
by: Liu, Jingjing, et al.
Published: (2025)
Text-to-CAD Generation Through Infusing Visual Feedback in Large Language Models
by: Wang, Ruiyu, et al.
Published: (2025)
Forecasting and Visualizing Air Quality from Sky Images with Vision-Language Models
by: Vahdatpour, Mohammad Saleh, et al.
Published: (2025)
Temporal Visual Semantics-Induced Human Motion Understanding with Large Language Models
by: Xing, Zheng, et al.
Published: (2025)
Assessing the Visual Enumeration Abilities of Specialized Counting Architectures and Vision-Language Models
by: Hou, Kuinan, et al.
Published: (2025)
InfiniteDance: Scalable 3D Dance Generation Towards in-the-wild Generalization
by: Li, Ronghui, et al.
Published: (2026)
Predictive Regularization Against Visual Representation Degradation in Multimodal Large Language Models
by: Wang, Enguang, et al.
Published: (2026)
The Platonic Representation Hypothesis
by: Huh, Minyoung, et al.
Published: (2024)
Unleashing the Power of Vision-Language Models for Long-Tailed Multi-Label Visual Recognition
by: Tang, Wei, et al.
Published: (2025)
TExplain: Explaining Learned Visual Features via Pre-trained (Frozen) Language Models
by: Taghanaki, Saeid Asgari, et al.
Published: (2023)