Library Catalog

Bibliographic Details
Main Authors:	Han, Cheng; Wang, Qifan; Cui, Yiming; Wang, Wenguan; Huang, Lifu; Qi, Siyuan; Liu, Dongfang
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2401.12902

Similar Items

Visual Fourier Prompt Tuning
by: Zeng, Runjia, et al.
Published: (2024)

SSGA-Net: Stepwise Spatial Global-local Aggregation Networks for Autonomous Driving
by: Cui, Yiming, et al.
Published: (2024)

Multimodal Instruction Tuning with Conditional Mixture of LoRA
by: Shen, Ying, et al.
Published: (2024)

AR-RAG: Autoregressive Retrieval Augmentation for Image Generation
by: Qi, Jingyuan, et al.
Published: (2025)

Vision-Flan: Scaling Human-Labeled Tasks in Visual Instruction Tuning
by: Xu, Zhiyang, et al.
Published: (2024)

AMD: Automatic Multi-step Distillation of Large-scale Vision Models
by: Han, Cheng, et al.
Published: (2024)

Self-supervised Adversarial Training of Monocular Depth Estimation against Physical-World Attacks
by: Cheng, Zhiyuan, et al.
Published: (2024)

ProMotion: Prototypes As Motion Learners
by: Lu, Yawen, et al.
Published: (2024)

Image Translation as Diffusion Visual Programmers
by: Han, Cheng, et al.
Published: (2024)

SDVPT: Semantic-Driven Visual Prompt Tuning for Open-World Object Counting
by: Zhao, Yiming, et al.
Published: (2025)

Unbiased Object Detection Beyond Frequency with Visually Prompted Image Synthesis
by: Cai, Xinhao, et al.
Published: (2025)

A-SelecT: Automatic Timestep Selection for Diffusion Transformer Representation Learning
by: Liu, Changyu, et al.
Published: (2026)

Modality-Specialized Synergizers for Interleaved Vision-Language Generalists
by: Xu, Zhiyang, et al.
Published: (2024)

Benchmarking Unified Face Attack Detection via Hierarchical Prompt Tuning
by: Liu, Ajian, et al.
Published: (2025)

Neural Clustering based Visual Representation Learning
by: Chen, Guikun, et al.
Published: (2024)

Revisiting the Power of Prompt for Visual Tuning
by: Wang, Yuzhu, et al.
Published: (2024)

Exploring Interpretability for Visual Prompt Tuning with Cross-layer Concepts
by: Wang, Yubin, et al.
Published: (2025)

Attention to the Burstiness in Visual Prompt Tuning!
by: Wang, Yuzhu, et al.
Published: (2025)

Visual Variational Autoencoder Prompt Tuning
by: Xiao, Xi, et al.
Published: (2025)

SinkTrack: Attention Sink based Context Anchoring for Large Language Models
by: Liu, Xu, et al.
Published: (2026)

3D Gaussian Map with Open-Set Semantic Grouping for Vision-Language Navigation
by: Gao, Jianzhe, et al.
Published: (2026)

Volumetric Environment Representation for Vision-Language Navigation
by: Liu, Rui, et al.
Published: (2024)

Vision-Language Navigation with Energy-Based Policy
by: Liu, Rui, et al.
Published: (2024)

CVPT: Cross Visual Prompt Tuning
by: Huang, Lingyun, et al.
Published: (2024)

Visual Spatial Tuning
by: Yang, Rui, et al.
Published: (2025)

Inference Compute-Optimal Video Vision Language Models
by: Wang, Peiqi, et al.
Published: (2025)

PromptSafe: Gated Prompt Tuning for Safe Text-to-Image Generation
by: Jing, Zonglei, et al.
Published: (2025)

Visual Prompt Tuning in Null Space for Continual Learning
by: Lu, Yue, et al.
Published: (2024)

Learning Unknown Spoof Prompts for Generalized Face Anti-Spoofing Using Only Real Face Images
by: Jiang, Fangling, et al.
Published: (2025)

FaceChain-ImagineID: Freely Crafting High-Fidelity Diverse Talking Faces from Disentangled Audio
by: Xu, Chao, et al.
Published: (2024)

Visual Knowledge in the Big Model Era: Retrospect and Prospect
by: Wang, Wenguan, et al.
Published: (2024)

Radiance Field Learners As UAV First-Person Viewers
by: Yan, Liqi, et al.
Published: (2024)

IDRetracor: Towards Visual Forensics Against Malicious Face Swapping
by: Cheng, Jikang, et al.
Published: (2024)

Grounded-VideoLLM: Sharpening Fine-grained Temporal Grounding in Video Large Language Models
by: Wang, Haibo, et al.
Published: (2024)

Visual Instance-aware Prompt Tuning
by: Xiao, Xi, et al.
Published: (2025)

Human-Object Interaction Detection Collaborated with Large Relation-driven Diffusion Models
by: Li, Liulei, et al.
Published: (2024)

Learning Human-Object Interaction as Groups
by: Hong, Jiajun, et al.
Published: (2025)

Improving Visual Prompt Tuning by Gaussian Neighborhood Minimization for Long-Tailed Visual Recognition
by: Li, Mengke, et al.
Published: (2024)

Navigation Instruction Generation with BEV Perception and Large Language Models
by: Fan, Sheng, et al.
Published: (2024)

Learning to See the Elephant in the Room: Self-Supervised Context Reasoning in Humans and AI
by: Liu, Xiao, et al.
Published: (2022)