| Main Authors: | Luo, Tiange, Cao, Ang, Lee, Gunhee, Johnson, Justin, Lee, Honglak |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2501.00569 |
Similar Items
Visual Test-time Scaling for GUI Agent Grounding
by: Luo, Tiange, et al.
Published: (2025)
View Selection for 3D Captioning via Diffusion Ranking
by: Luo, Tiange, et al.
Published: (2024)
Selective LoRA for Visual Tokens and Attention Heads
by: Luo, Tiange, et al.
Published: (2025)
Scalable Video-to-Dataset Generation for Cross-Platform Mobile Agents
by: Jang, Yunseok, et al.
Published: (2025)
Subtask-Aware Visual Reward Learning from Segmented Demonstrations
by: Kim, Changyeon, et al.
Published: (2025)
Breast Cancer VLMs: Clinically Practical Vision-Language Train-Inference Models
by: Zheng, Shunjie-Fabian, et al.
Published: (2025)
PIVOT: Iterative Visual Prompting Elicits Actionable Knowledge for VLMs
by: Nasiriany, Soroush, et al.
Published: (2024)
CycleNet: Rethinking Cycle Consistency in Text-Guided Diffusion for Image Manipulation
by: Xu, Sihan, et al.
Published: (2023)
VRIQ: Benchmarking and Analyzing Visual-Reasoning IQ of VLMs
by: Khezresmaeilzadeh, Tina, et al.
Published: (2026)
VisualOverload: Probing Visual Understanding of VLMs in Really Dense Scenes
by: Gavrikov, Paul, et al.
Published: (2025)
Vision Verification Enhanced Fusion of VLMs for Efficient Visual Reasoning
by: Tekin, Selim Furkan, et al.
Published: (2026)
Toward Inherently Robust VLMs Against Visual Perception Attacks
by: MohajerAnsari, Pedram, et al.
Published: (2025)
Pixels Versus Priors: Controlling Knowledge Priors in Vision-Language Models through Visual Counterfacts
by: Golovanevsky, Michal, et al.
Published: (2025)
Unified Spatio-Temporal Token Scoring for Efficient Video VLMs
by: Zhang, Jianrui, et al.
Published: (2026)
Through the Lens of Contrast: Self-Improving Visual Reasoning in VLMs
by: Pan, Zhiyu, et al.
Published: (2026)
Prismatic VLMs: Investigating the Design Space of Visually-Conditioned Language Models
by: Karamcheti, Siddharth, et al.
Published: (2024)
ViCrit: A Verifiable Reinforcement Learning Proxy Task for Visual Perception in VLMs
by: Wang, Xiyao, et al.
Published: (2025)
VGS-Decoding: Visual Grounding Score Guided Decoding for Hallucination Mitigation in Medical VLMs
by: Kolli, Govinda, et al.
Published: (2026)
Visual Structures Helps Visual Reasoning: Addressing the Binding Problem in VLMs
by: Izadi, Amirmohammad, et al.
Published: (2025)
VLM-SubtleBench: How Far Are VLMs from Human-Level Subtle Comparative Reasoning?
by: Kim, Minkyu, et al.
Published: (2026)
A self-supervised framework for learning whole slide representations
by: Hou, Xinhai, et al.
Published: (2024)
Significantly improving zero-shot X-ray pathology classification via fine-tuning pre-trained image-text encoders
by: Jang, Jongseong, et al.
Published: (2022)
CircuitProbe: Tracing Visual Temporal Evidence Flow in Video Language Models
by: Zhang, Yiming, et al.
Published: (2025)
ReSpec: Relevance and Specificity Grounded Online Filtering for Learning on Video-Text Data Streams
by: Kim, Chris Dongjoo, et al.
Published: (2025)
Chain-of-Visual-Thought: Teaching VLMs to See and Think Better with Continuous Visual Tokens
by: Qin, Yiming, et al.
Published: (2025)
Unified Text-Image-to-Video Generation: A Training-Free Approach to Flexible Visual Conditioning
by: Lai, Bolin, et al.
Published: (2025)
Inference Optimal VLMs Need Fewer Visual Tokens and More Parameters
by: Li, Kevin Y., et al.
Published: (2024)
COVR:Collaborative Optimization of VLMs and RL Agent for Visual-Based Control
by: Xia, Canming, et al.
Published: (2026)
From Understanding to Engagement: Personalized pharmacy Video Clips via Vision Language Models (VLMs)
by: Mishra, Suyash, et al.
Published: (2026)
Selecting Fine-Tuning Examples by Quizzing VLMs
by: Ji, Tenghao, et al.
Published: (2025)
Exploration of VLMs for Driver Monitoring Systems Applications
by: Cañas, Paola Natalia, et al.
Published: (2025)
Generalizing from SIMPLE to HARD Visual Reasoning: Can We Mitigate Modality Imbalance in VLMs?
by: Park, Simon, et al.
Published: (2025)
Scene-Aware Vectorized Memory Multi-Agent Framework with Cross-Modal Differentiated Quantization VLMs for Visually Impaired Assistance
by: Wang, Xiangxiang, et al.
Published: (2025)
Discovering and Mitigating Visual Biases through Keyword Explanation
by: Kim, Younghyun, et al.
Published: (2023)
Rethinking Pruning for Vision-Language Models: Strategies for Effective Sparsity and Performance Restoration
by: He, Shwai, et al.
Published: (2024)
Light Cones For Vision: Simple Causal Priors For Visual Hierarchy
by: Kartik, Manglam, et al.
Published: (2026)
Reinforcing VLMs to Use Tools for Detailed Visual Reasoning Under Resource Constraints
by: Kumar, Sunil, et al.
Published: (2025)
When Does Visual Prompting Outperform Linear Probing for Vision-Language Models? A Likelihood Perspective
by: Tsao, Hsi-Ai, et al.
Published: (2024)
Feudal Networks for Visual Navigation
by: Johnson, Faith, et al.
Published: (2024)
Retrieval Visual Contrastive Decoding to Mitigate Object Hallucinations in Large Vision-Language Models
by: Lee, Jihoon, et al.
Published: (2025)