| Main Authors: | Balakrishnan, Ravikumar, Mendapara, Sanket, Garg, Ankit |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2604.12371 |
Similar Items
One Perturbation, Two Failure Modes: Probing VLM Safety via Embedding-Guided Typographic Perturbations
by: Balakrishnan, Ravikumar, et al.
Published: (2026)
Read or Ignore? A Unified Benchmark for Typographic-Attack Robustness and Text Recognition in Vision-Language Models
by: Waseda, Futa, et al.
Published: (2025)
Reading Between the Pixels: An Inscriptive Jailbreak Attack on Text-to-Image Models
by: Ying, Zonghao, et al.
Published: (2026)
VISOR++: Universal Visual Inputs based Steering for Large Vision Language Models
by: Balakrishnan, Ravikumar, et al.
Published: (2025)
VISOR: Visual Input-based Steering for Output Redirection in Vision-Language Models
by: Phute, Mansi, et al.
Published: (2025)
Unveiling Typographic Deceptions: Insights of the Typographic Vulnerability in Large Vision-Language Model
by: Cheng, Hao, et al.
Published: (2024)
Beyond Pixels: Semantic-aware Typographic Attack for Geo-Privacy Protection
by: Zhu, Jiayi, et al.
Published: (2025)
Not Just Text: Uncovering Vision Modality Typographic Threats in Image Generation Models
by: Cheng, Hao, et al.
Published: (2024)
Typographic Text Generation with Off-the-Shelf Diffusion Model
by: Peong, KhayTze, et al.
Published: (2024)
DINOv2 Meets Text: A Unified Framework for Image- and Pixel-Level Vision-Language Alignment
by: Jose, Cijo, et al.
Published: (2024)
Automatic Text Box Placement for Supporting Typographic Design
by: Muraoka, Jun, et al.
Published: (2025)
Reading Images Like Texts: Sequential Image Understanding in Vision-Language Models
by: Li, Yueyan, et al.
Published: (2025)
Vision-LLMs Can Fool Themselves with Self-Generated Typographic Attacks
by: Qraitem, Maan, et al.
Published: (2024)
Fit Pixels, Get Labels: Meta-learned Implicit Networks for Image Segmentation
by: Vyas, Kushal, et al.
Published: (2025)
In the Era of Prompt Learning with Vision-Language Models
by: Jha, Ankit
Published: (2024)
Seeing Through Words, Speaking Through Pixels: Deep Representational Alignment Between Vision and Language Models
by: He, Zoe Wanying, et al.
Published: (2025)
Hierarchical Vision-Language Alignment for Text-to-Image Generation via Diffusion Models
by: Johnson, Emily, et al.
Published: (2025)
Dyslexify: A Mechanistic Defense Against Typographic Attacks in CLIP
by: Hufe, Lorenz, et al.
Published: (2025)
SineProject: Machine Unlearning for Stable Vision Language Alignment
by: Garg, Arpit, et al.
Published: (2025)
A Systematic Study of Cross-Modal Typographic Attacks on Audio-Visual Reasoning
by: Chen, Tianle, et al.
Published: (2026)
OG-VLA: Orthographic Image Generation for 3D-Aware Vision-Language Action Model
by: Singh, Ishika, et al.
Published: (2025)
SGHA-Attack: Semantic-Guided Hierarchical Alignment for Transferable Targeted Attacks on Vision-Language Models
by: Wang, Haobo, et al.
Published: (2026)
SceneTAP: Scene-Coherent Typographic Adversarial Planner against Vision-Language Models in Real-World Environments
by: Cao, Yue, et al.
Published: (2024)
Mirage: Unveiling Hidden Artifacts in Synthetic Images with Large Vision-Language Models
by: Sharma, Pranav, et al.
Published: (2025)
Revisiting Vision Language Foundations for No-Reference Image Quality Assessment
by: Yadav, Ankit, et al.
Published: (2025)
PixelVLA: Advancing Pixel-level Understanding in Vision-Language-Action Model
by: Liang, Wenqi, et al.
Published: (2025)
PiTe: Pixel-Temporal Alignment for Large Video-Language Model
by: Liu, Yang, et al.
Published: (2024)
Language-Image Alignment with Fixed Text Encoders
by: Yang, Jingfeng, et al.
Published: (2025)
PDA: Text-Augmented Defense Framework for Robust Vision-Language Models against Adversarial Image Attacks
by: Xu, Jingning, et al.
Published: (2026)
Asymmetric Visual Semantic Embedding Framework for Efficient Vision-Language Alignment
by: Liu, Yang, et al.
Published: (2025)
Image Recognition with Vision and Language Embeddings of VLMs
by: Volkov, Illia, et al.
Published: (2025)
Pixel Is Not a Barrier: An Effective Evasion Attack for Pixel-Domain Diffusion Models
by: Shih, Chun-Yen, et al.
Published: (2024)
SPOOF: Simple Pixel Operations for Out-of-Distribution Fooling
by: Gupta, Ankit, et al.
Published: (2025)
FGAseg: Fine-Grained Pixel-Text Alignment for Open-Vocabulary Semantic Segmentation
by: Li, Bingyu, et al.
Published: (2025)
Embedding Textual Information in Images Using Quinary Pixel Combinations
by: Kandala, A V Uday Kiran
Published: (2026)
Detecting Text Manipulation in Images using Vision Language Models
by: Vidit, Vidit, et al.
Published: (2025)
Tuna-2: Pixel Embeddings Beat Vision Encoders for Multimodal Understanding and Generation
by: Liu, Zhiheng, et al.
Published: (2026)
Goal2Pixel: Grounding Goals to Pixels for Vision-Language Navigation
by: Bao, Muyi, et al.
Published: (2026)
Exploring Typographic Visual Prompts Injection Threats in Cross-Modality Generation Models
by: Cheng, Hao, et al.
Published: (2025)
Reading Between the Lanes: Text VideoQA on the Road
by: Tom, George, et al.
Published: (2023)