Library Catalog

Bibliographic Details
Main Authors:	Balakrishnan, Ravikumar; Mendapara, Sanket; Garg, Ankit
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2604.12371

Similar Items

One Perturbation, Two Failure Modes: Probing VLM Safety via Embedding-Guided Typographic Perturbations
by: Balakrishnan, Ravikumar, et al.
Published: (2026)

Read or Ignore? A Unified Benchmark for Typographic-Attack Robustness and Text Recognition in Vision-Language Models
by: Waseda, Futa, et al.
Published: (2025)

Reading Between the Pixels: An Inscriptive Jailbreak Attack on Text-to-Image Models
by: Ying, Zonghao, et al.
Published: (2026)

VISOR++: Universal Visual Inputs based Steering for Large Vision Language Models
by: Balakrishnan, Ravikumar, et al.
Published: (2025)

VISOR: Visual Input-based Steering for Output Redirection in Vision-Language Models
by: Phute, Mansi, et al.
Published: (2025)

Unveiling Typographic Deceptions: Insights of the Typographic Vulnerability in Large Vision-Language Model
by: Cheng, Hao, et al.
Published: (2024)

Beyond Pixels: Semantic-aware Typographic Attack for Geo-Privacy Protection
by: Zhu, Jiayi, et al.
Published: (2025)

Not Just Text: Uncovering Vision Modality Typographic Threats in Image Generation Models
by: Cheng, Hao, et al.
Published: (2024)

Typographic Text Generation with Off-the-Shelf Diffusion Model
by: Peong, KhayTze, et al.
Published: (2024)

DINOv2 Meets Text: A Unified Framework for Image- and Pixel-Level Vision-Language Alignment
by: Jose, Cijo, et al.
Published: (2024)

Automatic Text Box Placement for Supporting Typographic Design
by: Muraoka, Jun, et al.
Published: (2025)

Reading Images Like Texts: Sequential Image Understanding in Vision-Language Models
by: Li, Yueyan, et al.
Published: (2025)

Vision-LLMs Can Fool Themselves with Self-Generated Typographic Attacks
by: Qraitem, Maan, et al.
Published: (2024)

Fit Pixels, Get Labels: Meta-learned Implicit Networks for Image Segmentation
by: Vyas, Kushal, et al.
Published: (2025)

In the Era of Prompt Learning with Vision-Language Models
by: Jha, Ankit
Published: (2024)

Seeing Through Words, Speaking Through Pixels: Deep Representational Alignment Between Vision and Language Models
by: He, Zoe Wanying, et al.
Published: (2025)

Hierarchical Vision-Language Alignment for Text-to-Image Generation via Diffusion Models
by: Johnson, Emily, et al.
Published: (2025)

Dyslexify: A Mechanistic Defense Against Typographic Attacks in CLIP
by: Hufe, Lorenz, et al.
Published: (2025)

SineProject: Machine Unlearning for Stable Vision Language Alignment
by: Garg, Arpit, et al.
Published: (2025)

A Systematic Study of Cross-Modal Typographic Attacks on Audio-Visual Reasoning
by: Chen, Tianle, et al.
Published: (2026)

OG-VLA: Orthographic Image Generation for 3D-Aware Vision-Language Action Model
by: Singh, Ishika, et al.
Published: (2025)

SGHA-Attack: Semantic-Guided Hierarchical Alignment for Transferable Targeted Attacks on Vision-Language Models
by: Wang, Haobo, et al.
Published: (2026)

SceneTAP: Scene-Coherent Typographic Adversarial Planner against Vision-Language Models in Real-World Environments
by: Cao, Yue, et al.
Published: (2024)

Mirage: Unveiling Hidden Artifacts in Synthetic Images with Large Vision-Language Models
by: Sharma, Pranav, et al.
Published: (2025)

Revisiting Vision Language Foundations for No-Reference Image Quality Assessment
by: Yadav, Ankit, et al.
Published: (2025)

PixelVLA: Advancing Pixel-level Understanding in Vision-Language-Action Model
by: Liang, Wenqi, et al.
Published: (2025)

PiTe: Pixel-Temporal Alignment for Large Video-Language Model
by: Liu, Yang, et al.
Published: (2024)

Language-Image Alignment with Fixed Text Encoders
by: Yang, Jingfeng, et al.
Published: (2025)

PDA: Text-Augmented Defense Framework for Robust Vision-Language Models against Adversarial Image Attacks
by: Xu, Jingning, et al.
Published: (2026)

Asymmetric Visual Semantic Embedding Framework for Efficient Vision-Language Alignment
by: Liu, Yang, et al.
Published: (2025)

Image Recognition with Vision and Language Embeddings of VLMs
by: Volkov, Illia, et al.
Published: (2025)

Pixel Is Not a Barrier: An Effective Evasion Attack for Pixel-Domain Diffusion Models
by: Shih, Chun-Yen, et al.
Published: (2024)

SPOOF: Simple Pixel Operations for Out-of-Distribution Fooling
by: Gupta, Ankit, et al.
Published: (2025)

FGAseg: Fine-Grained Pixel-Text Alignment for Open-Vocabulary Semantic Segmentation
by: Li, Bingyu, et al.
Published: (2025)

Embedding Textual Information in Images Using Quinary Pixel Combinations
by: Kandala, A V Uday Kiran
Published: (2026)

Detecting Text Manipulation in Images using Vision Language Models
by: Vidit, Vidit, et al.
Published: (2025)

Tuna-2: Pixel Embeddings Beat Vision Encoders for Multimodal Understanding and Generation
by: Liu, Zhiheng, et al.
Published: (2026)

Goal2Pixel: Grounding Goals to Pixels for Vision-Language Navigation
by: Bao, Muyi, et al.
Published: (2026)

Exploring Typographic Visual Prompts Injection Threats in Cross-Modality Generation Models
by: Cheng, Hao, et al.
Published: (2025)

Reading Between the Lanes: Text VideoQA on the Road
by: Tom, George, et al.
Published: (2023)