:: Library Catalog

Bibliographic Details
Main Authors:	Luo, Tiange; Cao, Ang; Lee, Gunhee; Johnson, Justin; Lee, Honglak
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition; Machine Learning
Online Access:	https://arxiv.org/abs/2501.00569

Similar Items

Visual Test-time Scaling for GUI Agent Grounding
by: Luo, Tiange, et al.
Published: (2025)

View Selection for 3D Captioning via Diffusion Ranking
by: Luo, Tiange, et al.
Published: (2024)

Selective LoRA for Visual Tokens and Attention Heads
by: Luo, Tiange, et al.
Published: (2025)

Scalable Video-to-Dataset Generation for Cross-Platform Mobile Agents
by: Jang, Yunseok, et al.
Published: (2025)

Subtask-Aware Visual Reward Learning from Segmented Demonstrations
by: Kim, Changyeon, et al.
Published: (2025)

Breast Cancer VLMs: Clinically Practical Vision-Language Train-Inference Models
by: Zheng, Shunjie-Fabian, et al.
Published: (2025)

PIVOT: Iterative Visual Prompting Elicits Actionable Knowledge for VLMs
by: Nasiriany, Soroush, et al.
Published: (2024)

CycleNet: Rethinking Cycle Consistency in Text-Guided Diffusion for Image Manipulation
by: Xu, Sihan, et al.
Published: (2023)

VRIQ: Benchmarking and Analyzing Visual-Reasoning IQ of VLMs
by: Khezresmaeilzadeh, Tina, et al.
Published: (2026)

VisualOverload: Probing Visual Understanding of VLMs in Really Dense Scenes
by: Gavrikov, Paul, et al.
Published: (2025)

Vision Verification Enhanced Fusion of VLMs for Efficient Visual Reasoning
by: Tekin, Selim Furkan, et al.
Published: (2026)

Toward Inherently Robust VLMs Against Visual Perception Attacks
by: MohajerAnsari, Pedram, et al.
Published: (2025)

Pixels Versus Priors: Controlling Knowledge Priors in Vision-Language Models through Visual Counterfacts
by: Golovanevsky, Michal, et al.
Published: (2025)

Unified Spatio-Temporal Token Scoring for Efficient Video VLMs
by: Zhang, Jianrui, et al.
Published: (2026)

Through the Lens of Contrast: Self-Improving Visual Reasoning in VLMs
by: Pan, Zhiyu, et al.
Published: (2026)

Prismatic VLMs: Investigating the Design Space of Visually-Conditioned Language Models
by: Karamcheti, Siddharth, et al.
Published: (2024)

ViCrit: A Verifiable Reinforcement Learning Proxy Task for Visual Perception in VLMs
by: Wang, Xiyao, et al.
Published: (2025)

VGS-Decoding: Visual Grounding Score Guided Decoding for Hallucination Mitigation in Medical VLMs
by: Kolli, Govinda, et al.
Published: (2026)

Visual Structures Helps Visual Reasoning: Addressing the Binding Problem in VLMs
by: Izadi, Amirmohammad, et al.
Published: (2025)

VLM-SubtleBench: How Far Are VLMs from Human-Level Subtle Comparative Reasoning?
by: Kim, Minkyu, et al.
Published: (2026)

A self-supervised framework for learning whole slide representations
by: Hou, Xinhai, et al.
Published: (2024)

Significantly improving zero-shot X-ray pathology classification via fine-tuning pre-trained image-text encoders
by: Jang, Jongseong, et al.
Published: (2022)

CircuitProbe: Tracing Visual Temporal Evidence Flow in Video Language Models
by: Zhang, Yiming, et al.
Published: (2025)

ReSpec: Relevance and Specificity Grounded Online Filtering for Learning on Video-Text Data Streams
by: Kim, Chris Dongjoo, et al.
Published: (2025)

Chain-of-Visual-Thought: Teaching VLMs to See and Think Better with Continuous Visual Tokens
by: Qin, Yiming, et al.
Published: (2025)

Unified Text-Image-to-Video Generation: A Training-Free Approach to Flexible Visual Conditioning
by: Lai, Bolin, et al.
Published: (2025)

Inference Optimal VLMs Need Fewer Visual Tokens and More Parameters
by: Li, Kevin Y., et al.
Published: (2024)

COVR:Collaborative Optimization of VLMs and RL Agent for Visual-Based Control
by: Xia, Canming, et al.
Published: (2026)

From Understanding to Engagement: Personalized pharmacy Video Clips via Vision Language Models (VLMs)
by: Mishra, Suyash, et al.
Published: (2026)

Selecting Fine-Tuning Examples by Quizzing VLMs
by: Ji, Tenghao, et al.
Published: (2025)

Exploration of VLMs for Driver Monitoring Systems Applications
by: Cañas, Paola Natalia, et al.
Published: (2025)

Generalizing from SIMPLE to HARD Visual Reasoning: Can We Mitigate Modality Imbalance in VLMs?
by: Park, Simon, et al.
Published: (2025)

Scene-Aware Vectorized Memory Multi-Agent Framework with Cross-Modal Differentiated Quantization VLMs for Visually Impaired Assistance
by: Wang, Xiangxiang, et al.
Published: (2025)

Discovering and Mitigating Visual Biases through Keyword Explanation
by: Kim, Younghyun, et al.
Published: (2023)

Rethinking Pruning for Vision-Language Models: Strategies for Effective Sparsity and Performance Restoration
by: He, Shwai, et al.
Published: (2024)

Light Cones For Vision: Simple Causal Priors For Visual Hierarchy
by: Kartik, Manglam, et al.
Published: (2026)

Reinforcing VLMs to Use Tools for Detailed Visual Reasoning Under Resource Constraints
by: Kumar, Sunil, et al.
Published: (2025)

When Does Visual Prompting Outperform Linear Probing for Vision-Language Models? A Likelihood Perspective
by: Tsao, Hsi-Ai, et al.
Published: (2024)

Feudal Networks for Visual Navigation
by: Johnson, Faith, et al.
Published: (2024)

Retrieval Visual Contrastive Decoding to Mitigate Object Hallucinations in Large Vision-Language Models
by: Lee, Jihoon, et al.
Published: (2025)