:: Library Catalog

Bibliographic Details
Main Authors:	Huang, Zeyi; Ojha, Utkarsh; Ji, Yuyang; Lee, Donghyun; Lee, Yong Jae
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2503.13058

Similar Items

Leveraging Large Language Models for Scalable Vector Graphics-Driven Image Understanding
by: Cai, Mu, et al.
Published: (2023)

Towards Universal Fake Image Detectors that Generalize Across Generative Models
by: Ojha, Utkarsh, et al.
Published: (2023)

Aligned Datasets Improve Detection of Latent Diffusion-Generated Images
by: Rajan, Anirudh Sundara, et al.
Published: (2024)

Yo'LLaVA: Your Personalized Language and Vision Assistant
by: Nguyen, Thao, et al.
Published: (2024)

Edit One for All: Interactive Batch Image Editing
by: Nguyen, Thao, et al.
Published: (2024)

VisualToolAgent (VisTA): A Reinforcement Learning Framework for Visual Tool Selection
by: Huang, Zeyi, et al.
Published: (2025)

Building a Mind Palace: Structuring Environment-Grounded Semantic Graphs for Effective Long Video Analysis with LLMs
by: Huang, Zeyi, et al.
Published: (2025)

Do Vision-Language Models Understand Compound Nouns?
by: Kumar, Sonal, et al.
Published: (2024)

FastPoint: Accelerating 3D Point Cloud Model Inference via Sample Point Distance Prediction
by: Lee, Donghyun, et al.
Published: (2025)

Talk in Pieces, See in Whole: Disentangling and Hierarchical Aggregating Representations for Language-based Object Detection
by: An, Sojung, et al.
Published: (2025)

GS-Scale: Unlocking Large-Scale 3D Gaussian Splatting Training via Host Offloading
by: Lee, Donghyun, et al.
Published: (2025)

IMPROVE: Iterative Model Pipeline Refinement and Optimization Leveraging LLM Experts
by: Xue, Eric, et al.
Published: (2025)

PLATYPUS: Progressive Local Surface Estimator for Arbitrary-Scale Point Cloud Upsampling
by: Kim, Donghyun, et al.
Published: (2024)

MATE: Meet At The Embedding -- Connecting Images with Long Texts
by: Jang, Young Kyun, et al.
Published: (2024)

TraceVision: Trajectory-Aware Vision-Language Model for Human-Like Spatial Understanding
by: Yang, Fan, et al.
Published: (2026)

Do Your Best and Get Enough Rest for Continual Learning
by: Kang, Hankyul, et al.
Published: (2025)

Language-Guided Invariance Probing of Vision-Language Models
by: Lee, Jae Joong
Published: (2025)

Do Vision Transformers See Like Humans? Evaluating their Perceptual Alignment
by: Hernández-Cámara, Pablo, et al.
Published: (2025)

Your Embedding Model is SMARTer Than You Think
by: Zhang, Jianrui, et al.
Published: (2026)

MuRF: Unlocking the Multi-Scale Potential of Vision Foundation Models
by: Zou, Bocheng, et al.
Published: (2026)

Do Multimodal Large Language Models Understand Welding?
by: Khvatskii, Grigorii, et al.
Published: (2025)

Active Prompt Learning in Vision Language Models
by: Bang, Jihwan, et al.
Published: (2023)

VisDoT: Enhancing Visual Reasoning through Human-Like Interpretation Grounding and Decomposition of Thought
by: Lee, Eunsoo, et al.
Published: (2026)

Socratic Chart: Cooperating Multiple Agents for Robust SVG Chart Understanding
by: Ji, Yuyang, et al.
Published: (2025)

uCLIP: Parameter-Efficient Multilingual Extension of Vision-Language Models with Unpaired Data
by: Chung, Dahyun, et al.
Published: (2025)

DrVD-Bench: Do Vision-Language Models Reason Like Human Doctors in Medical Image Diagnosis?
by: Zhou, Tianhong, et al.
Published: (2025)

Do Vision Models Encode Object-Level Semantic Relatedness? A Cognitive Psychology-Inspired Benchmark
by: Lee, Hansang, et al.
Published: (2017)

FALCON: Frequency Adjoint Link with CONtinuous Density Mask for Fast Single Image Dehazing
by: Kim, Donghyun, et al.
Published: (2024)

Do Vision Language Models Understand Human Engagement in Games?
by: Wang, Ziyi, et al.
Published: (2026)

Low-Resolution Editing is All You Need for High-Resolution Editing
by: Lee, Junsung, et al.
Published: (2025)

Vision-Language Models Do Not Understand Negation
by: Alhamoud, Kumail, et al.
Published: (2025)

See, Hear, and Understand: Benchmarking Audiovisual Human Speech Understanding in Multimodal Large Language Models
by: Nguyen, Le Thien Phuc, et al.
Published: (2025)

Can Large Vision Language Models Read Maps Like a Human?
by: Xing, Shuo, et al.
Published: (2025)

VGBench: Evaluating Large Language Models on Vector Graphics Understanding and Generation
by: Zou, Bocheng, et al.
Published: (2024)

Advancing Vision-based Human Action Recognition: Exploring Vision-Language CLIP Model for Generalisation in Domain-Independent Tasks
by: Shandilya, Utkarsh, et al.
Published: (2025)

SwitchLight: Co-design of Physics-driven Architecture and Pre-training Framework for Human Portrait Relighting
by: Kim, Hoon, et al.
Published: (2024)

Stay-Positive: A Case for Ignoring Real Image Features in Fake Image Detection
by: Rajan, Anirudh Sundara, et al.
Published: (2025)

Do Vision-Language Models Understand Visual Persuasiveness?
by: Park, Gyuwon
Published: (2025)

AWT: Transferring Vision-Language Models via Augmentation, Weighting, and Transportation
by: Zhu, Yuhan, et al.
Published: (2024)

Toward Interactive Regional Understanding in Vision-Large Language Models
by: Lee, Jungbeom, et al.
Published: (2024)