:: Library Catalog

Bibliographic Details
Main Authors:	Garcia, Fernando Gabriela; Burns, Spencer; Shaw, Ryan; Young, Hunter
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2507.04151

Similar Items

LVLM-Composer's Explicit Planning for Image Generation
by: Ramsey, Spencer, et al.
Published: (2025)

Fooling the LVLM Judges: Visual Biases in LVLM-Based Evaluation
by: Hwang, Yerin, et al.
Published: (2025)

ContextGuard-LVLM: Enhancing News Veracity through Fine-grained Cross-modal Contextual Consistency Verification
by: Ma, Sihan, et al.
Published: (2025)

LVLM-Aware Multimodal Retrieval for RAG-Based Medical Diagnosis with General-Purpose Models
by: Mazor, Nir, et al.
Published: (2025)

Kestrel: Grounding Self-Refinement for LVLM Hallucination Mitigation
by: Mao, Jiawei, et al.
Published: (2026)

LVLM-Interpret: An Interpretability Tool for Large Vision-Language Models
by: Stan, Gabriela Ben Melech, et al.
Published: (2024)

POPEN: Preference-Based Optimization and Ensemble for LVLM-Based Reasoning Segmentation
by: Zhu, Lanyun, et al.
Published: (2025)

LAVID: An Agentic LVLM Framework for Diffusion-Generated Video Detection
by: Liu, Qingyuan, et al.
Published: (2025)

A Generic Self-Supervised Framework of Learning Invariant Discriminative Features
by: Ntelemis, Foivos, et al.
Published: (2022)

An LLM-LVLM Driven Agent for Iterative and Fine-Grained Image Editing
by: Liang, Zihan, et al.
Published: (2025)

Beyond Generation: Unlocking Universal Editing via Self-Supervised Fine-Tuning
by: Chen, Harold Haodong, et al.
Published: (2024)

ImageInWords: Unlocking Hyper-Detailed Image Descriptions
by: Garg, Roopal, et al.
Published: (2024)

Unlocking the Capabilities of Masked Generative Models for Image Synthesis via Self-Guidance
by: Hur, Jiwan, et al.
Published: (2024)

VAUQ: Vision-Aware Uncertainty Quantification for LVLM Self-Evaluation
by: Park, Seongheon, et al.
Published: (2026)

Language-Unlocked ViT (LUViT): Empowering Self-Supervised Vision Transformers with LLMs
by: Kuzucu, Selim, et al.
Published: (2025)

PRISM: Programmatic Reasoning with Image Sequence Manipulation for LVLM Jailbreaking
by: Zou, Quanchen, et al.
Published: (2025)

AdaIAT: Adaptively Increasing Attention to Generated Text to Alleviate Hallucinations in LVLM
by: Zhong, Li'an, et al.
Published: (2026)

USP: Unified Self-Supervised Pretraining for Image Generation and Understanding
by: Chu, Xiangxiang, et al.
Published: (2025)

Canvas-to-Image: Compositional Image Generation with Multimodal Controls
by: Dalva, Yusuf, et al.
Published: (2025)

Attention-Based Chaotic Self-Supervision for Medical Image Classification
by: Florindo, Joao Batista, et al.
Published: (2026)

V-JEPA 2.1: Unlocking Dense Features in Video Self-Supervised Learning
by: Mur-Labadia, Lorenzo, et al.
Published: (2026)

ID-Selection: Importance-Diversity Based Visual Token Selection for Efficient LVLM Inference
by: Huang, Zhaohong, et al.
Published: (2026)

Aligned but Stereotypical? The Hidden Influence of System Prompts on Social Bias in LVLM-Based Text-to-Image Models
by: Park, NaHyeon, et al.
Published: (2025)

Chain-of-Image Generation: Toward Monitorable and Controllable Image Generation
by: Kim, Young Kyung, et al.
Published: (2025)

SignRep: Enhancing Self-Supervised Sign Representations
by: Wong, Ryan, et al.
Published: (2025)

A Closer Look at Benchmarking Self-Supervised Pre-training with Image Classification
by: Marks, Markus, et al.
Published: (2024)

GenSelfDiff-HIS: Generative Self-Supervision Using Diffusion for Histopathological Image Segmentation
by: Purma, Vishnuvardhan, et al.
Published: (2023)

Multi-Level LVLM Guidance for Untrimmed Video Action Recognition
by: Peng, Liyang, et al.
Published: (2025)

EmoFeedback$^2$: Reinforcement of Continuous Emotional Image Generation via LVLM-based Reward and Textual Feedback
by: Jia, Jingyang, et al.
Published: (2025)

Self-Supervised AI-Generated Image Detection: A Camera Metadata Perspective
by: Zhong, Nan, et al.
Published: (2025)

SketchTriplet: Self-Supervised Scenarized Sketch-Text-Image Triplet Generation
by: Wu, Zhenbei, et al.
Published: (2024)

Measuring and Controlling the Spectral Bias for Self-Supervised Image Denoising
by: Zhang, Wang, et al.
Published: (2025)

Before Forgetting, Learn to Remember: Revisiting Foundational Learning Failures in LVLM Unlearning Benchmarks
by: Kwon, JuneHyoung, et al.
Published: (2026)

Unlocking Generalization in Polyp Segmentation with DINO Self-Attention "keys"
by: Monteiro, Carla, et al.
Published: (2025)

Self-Supervised Learning of Plant Image Representations
by: Moummad, Ilyass, et al.
Published: (2026)

Unlocking the Potential of Unlabeled Data in Semi-Supervised Domain Generalization
by: Lee, Dongkwan, et al.
Published: (2025)

Mask What Matters: Controllable Text-Guided Masking for Self-Supervised Medical Image Analysis
by: Wang, Ruilang, et al.
Published: (2025)

AnyAnomaly: Zero-Shot Customizable Video Anomaly Detection with LVLM
by: Ahn, Sunghyun, et al.
Published: (2025)

MedM-VL: What Makes a Good Medical LVLM?
by: Shi, Yiming, et al.
Published: (2025)

Fighting Hallucinations with Counterfactuals: Diffusion-Guided Perturbations for LVLM Hallucination Suppression
by: Dastmalchi, Hamidreza, et al.
Published: (2026)