| Main Authors: | Garcia, Fernando Gabriela, Burns, Spencer, Shaw, Ryan, Young, Hunter |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2507.04151 |
Similar Items
LVLM-Composer's Explicit Planning for Image Generation
by: Ramsey, Spencer, et al.
Published: (2025)
Fooling the LVLM Judges: Visual Biases in LVLM-Based Evaluation
by: Hwang, Yerin, et al.
Published: (2025)
ContextGuard-LVLM: Enhancing News Veracity through Fine-grained Cross-modal Contextual Consistency Verification
by: Ma, Sihan, et al.
Published: (2025)
LVLM-Aware Multimodal Retrieval for RAG-Based Medical Diagnosis with General-Purpose Models
by: Mazor, Nir, et al.
Published: (2025)
Kestrel: Grounding Self-Refinement for LVLM Hallucination Mitigation
by: Mao, Jiawei, et al.
Published: (2026)
LVLM-Interpret: An Interpretability Tool for Large Vision-Language Models
by: Stan, Gabriela Ben Melech, et al.
Published: (2024)
POPEN: Preference-Based Optimization and Ensemble for LVLM-Based Reasoning Segmentation
by: Zhu, Lanyun, et al.
Published: (2025)
LAVID: An Agentic LVLM Framework for Diffusion-Generated Video Detection
by: Liu, Qingyuan, et al.
Published: (2025)
A Generic Self-Supervised Framework of Learning Invariant Discriminative Features
by: Ntelemis, Foivos, et al.
Published: (2022)
An LLM-LVLM Driven Agent for Iterative and Fine-Grained Image Editing
by: Liang, Zihan, et al.
Published: (2025)
Beyond Generation: Unlocking Universal Editing via Self-Supervised Fine-Tuning
by: Chen, Harold Haodong, et al.
Published: (2024)
ImageInWords: Unlocking Hyper-Detailed Image Descriptions
by: Garg, Roopal, et al.
Published: (2024)
Unlocking the Capabilities of Masked Generative Models for Image Synthesis via Self-Guidance
by: Hur, Jiwan, et al.
Published: (2024)
VAUQ: Vision-Aware Uncertainty Quantification for LVLM Self-Evaluation
by: Park, Seongheon, et al.
Published: (2026)
Language-Unlocked ViT (LUViT): Empowering Self-Supervised Vision Transformers with LLMs
by: Kuzucu, Selim, et al.
Published: (2025)
PRISM: Programmatic Reasoning with Image Sequence Manipulation for LVLM Jailbreaking
by: Zou, Quanchen, et al.
Published: (2025)
AdaIAT: Adaptively Increasing Attention to Generated Text to Alleviate Hallucinations in LVLM
by: Zhong, Li'an, et al.
Published: (2026)
USP: Unified Self-Supervised Pretraining for Image Generation and Understanding
by: Chu, Xiangxiang, et al.
Published: (2025)
Canvas-to-Image: Compositional Image Generation with Multimodal Controls
by: Dalva, Yusuf, et al.
Published: (2025)
Attention-Based Chaotic Self-Supervision for Medical Image Classification
by: Florindo, Joao Batista, et al.
Published: (2026)
V-JEPA 2.1: Unlocking Dense Features in Video Self-Supervised Learning
by: Mur-Labadia, Lorenzo, et al.
Published: (2026)
ID-Selection: Importance-Diversity Based Visual Token Selection for Efficient LVLM Inference
by: Huang, Zhaohong, et al.
Published: (2026)
Aligned but Stereotypical? The Hidden Influence of System Prompts on Social Bias in LVLM-Based Text-to-Image Models
by: Park, NaHyeon, et al.
Published: (2025)
Chain-of-Image Generation: Toward Monitorable and Controllable Image Generation
by: Kim, Young Kyung, et al.
Published: (2025)
SignRep: Enhancing Self-Supervised Sign Representations
by: Wong, Ryan, et al.
Published: (2025)
A Closer Look at Benchmarking Self-Supervised Pre-training with Image Classification
by: Marks, Markus, et al.
Published: (2024)
GenSelfDiff-HIS: Generative Self-Supervision Using Diffusion for Histopathological Image Segmentation
by: Purma, Vishnuvardhan, et al.
Published: (2023)
Multi-Level LVLM Guidance for Untrimmed Video Action Recognition
by: Peng, Liyang, et al.
Published: (2025)
EmoFeedback$^2$: Reinforcement of Continuous Emotional Image Generation via LVLM-based Reward and Textual Feedback
by: Jia, Jingyang, et al.
Published: (2025)
Self-Supervised AI-Generated Image Detection: A Camera Metadata Perspective
by: Zhong, Nan, et al.
Published: (2025)
SketchTriplet: Self-Supervised Scenarized Sketch-Text-Image Triplet Generation
by: Wu, Zhenbei, et al.
Published: (2024)
Measuring and Controlling the Spectral Bias for Self-Supervised Image Denoising
by: Zhang, Wang, et al.
Published: (2025)
Before Forgetting, Learn to Remember: Revisiting Foundational Learning Failures in LVLM Unlearning Benchmarks
by: Kwon, JuneHyoung, et al.
Published: (2026)
Unlocking Generalization in Polyp Segmentation with DINO Self-Attention "keys"
by: Monteiro, Carla, et al.
Published: (2025)
Self-Supervised Learning of Plant Image Representations
by: Moummad, Ilyass, et al.
Published: (2026)
Unlocking the Potential of Unlabeled Data in Semi-Supervised Domain Generalization
by: Lee, Dongkwan, et al.
Published: (2025)
Mask What Matters: Controllable Text-Guided Masking for Self-Supervised Medical Image Analysis
by: Wang, Ruilang, et al.
Published: (2025)
AnyAnomaly: Zero-Shot Customizable Video Anomaly Detection with LVLM
by: Ahn, Sunghyun, et al.
Published: (2025)
MedM-VL: What Makes a Good Medical LVLM?
by: Shi, Yiming, et al.
Published: (2025)
Fighting Hallucinations with Counterfactuals: Diffusion-Guided Perturbations for LVLM Hallucination Suppression
by: Dastmalchi, Hamidreza, et al.
Published: (2026)