| Main Authors: | Heo, Seongsoo, Choi, Dong-Wan |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2509.23235 |
Similar Items
When Robots Obey the Patch: Universal Transferable Patch Attacks on Vision-Language-Action Models
by: Lu, Hui, et al.
Published: (2025)
Improved Ear Verification with Vision Transformers and Overlapping Patches
by: Arun, Deeksha, et al.
Published: (2025)
Sparse Model Inversion: Efficient Inversion of Vision Transformers for Data-Free Applications
by: Hu, Zixuan, et al.
Published: (2025)
Avoid Wasted Annotation Costs in Open-set Active Learning with Pre-trained Vision-Language Model
by: Heo, Jaehyuk, et al.
Published: (2024)
LOOPE: Learnable Optimal Patch Order in Positional Embeddings for Vision Transformers
by: Chowdhury, Md Abtahi Majeed, et al.
Published: (2025)
Accelerating Vision Transformers with Adaptive Patch Sizes
by: Choudhury, Rohan, et al.
Published: (2025)
Frequency-Aware Token Reduction for Efficient Vision Transformer
by: Lee, Dong-Jae, et al.
Published: (2025)
Patch is Enough: Naturalistic Adversarial Patch against Vision-Language Pre-training Models
by: Kong, Dehong, et al.
Published: (2024)
Understanding Transformer-based Vision Models through Inversion
by: Rathjens, Jan, et al.
Published: (2024)
Towards Difficulty-Agnostic Efficient Transfer Learning for Vision-Language Models
by: Yang, Yongjin, et al.
Published: (2023)
FastTrackTr: Towards Fast Multi-Object Tracking with Transformers
by: Liao, Pan, et al.
Published: (2024)
SpecVLM: Fast Speculative Decoding in Vision-Language Models
by: Huang, Haiduo, et al.
Published: (2025)
Famba-V: Fast Vision Mamba with Cross-Layer Token Fusion
by: Shen, Hui, et al.
Published: (2024)
REOrdering Patches Improves Vision Models
by: Kutscher, Declan, et al.
Published: (2025)
Downstream Transfer Attack: Adversarial Attacks on Downstream Models with Pre-trained Vision Transformers
by: Zheng, Weijie, et al.
Published: (2024)
Cross-Model Transferability of Adversarial Patches in Real-time Segmentation for Autonomous Driving
by: Shekhar, Prashant, et al.
Published: (2025)
Safeguard Text-to-Image Diffusion Models with Human Feedback Inversion
by: Kim, Sanghyun, et al.
Published: (2024)
Recall-Oriented Continual Learning with Generative Adversarial Meta-Model
by: Kang, Haneol, et al.
Published: (2024)
Memory-Efficient Fine-Tuning Diffusion Transformers via Dynamic Patch Sampling and Block Skipping
by: Park, Sunghyun, et al.
Published: (2026)
Towards Visual Text Design Transfer Across Languages
by: Choi, Yejin, et al.
Published: (2024)
Beyond Dominant Patches: Spatial Credit Redistribution For Grounded Vision-Language Models
by: Samin, Niamul Hassan, et al.
Published: (2026)
Compressing Vision Transformers in Geospatial Transfer Learning with Manifold-Constrained Optimization
by: Snyder, Thomas, et al.
Published: (2026)
Digital-to-Physical Transfer of Adversarial Patches for Aerial Vehicle Detection
by: Woo, Jung Heum, et al.
Published: (2026)
MANGO: A Global Single-Date Paired Dataset for Mangrove Segmentation
by: Heo, Junhyuk, et al.
Published: (2026)
Rethinking Electro-Optical Vision Foundation Models for Remote Sensing Retrieval: A Controlled Comparison with Generalist VFM
by: Park, Hyobin, et al.
Published: (2026)
Attention-Guided Patch-Wise Sparse Adversarial Attacks on Vision-Language-Action Models
by: Zhang, Naifu, et al.
Published: (2025)
Toward Universal and Transferable Jailbreak Attacks on Vision-Language Models
by: Cui, Kaiyuan, et al.
Published: (2026)
RITUAL: Random Image Transformations as a Universal Anti-hallucination Lever in Large Vision Language Models
by: Woo, Sangmin, et al.
Published: (2024)
DDiT: Dynamic Patch Scheduling for Efficient Diffusion Transformers
by: Kim, Dahye, et al.
Published: (2026)
A Contrastive Learning Scheme with Transformer Innate Patches
by: Jyhne, Sander Riisøen, et al.
Published: (2023)
VLM4D: Towards Spatiotemporal Awareness in Vision Language Models
by: Zhou, Shijie, et al.
Published: (2025)
Understanding the Transfer Limits of Vision Foundation Models
by: Huang, Shiqi, et al.
Published: (2026)
Towards Robust Vision Transformer via Masked Adaptive Ensemble
by: Lin, Fudong, et al.
Published: (2024)
CapRecover: A Cross-Modality Feature Inversion Attack Framework on Vision Language Models
by: Xiu, Kedong, et al.
Published: (2025)
PatchTraj: Unified Time-Frequency Representation Learning via Dynamic Patches for Trajectory Prediction
by: Liu, Yanghong, et al.
Published: (2025)
Temporal Inversion for Learning Interval Change in Chest X-Rays
by: Ko, Hanbin, et al.
Published: (2026)
Zoom-shot: Fast and Efficient Unsupervised Zero-Shot Transfer of CLIP to Vision Encoders with Multimodal Loss
by: Shipard, Jordan, et al.
Published: (2024)
SiNGER: A Clearer Voice Distills Vision Transformers Further
by: Yu, Geunhyeok, et al.
Published: (2025)
Editable Noise Map Inversion: Encoding Target-image into Noise For High-Fidelity Image Manipulation
by: Kang, Mingyu, et al.
Published: (2025)
Disentangling Visual Transformers: Patch-level Interpretability for Image Classification
by: Jeanneret, Guillaume, et al.
Published: (2025)