| Main Authors: | Han, Cheng, Wang, Qifan, Cui, Yiming, Wang, Wenguan, Huang, Lifu, Qi, Siyuan, Liu, Dongfang |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2401.12902 |
Similar Items
Visual Fourier Prompt Tuning
by: Zeng, Runjia, et al.
Published: (2024)
SSGA-Net: Stepwise Spatial Global-local Aggregation Networks for for Autonomous Driving
by: Cui, Yiming, et al.
Published: (2024)
Multimodal Instruction Tuning with Conditional Mixture of LoRA
by: Shen, Ying, et al.
Published: (2024)
AR-RAG: Autoregressive Retrieval Augmentation for Image Generation
by: Qi, Jingyuan, et al.
Published: (2025)
Vision-Flan: Scaling Human-Labeled Tasks in Visual Instruction Tuning
by: Xu, Zhiyang, et al.
Published: (2024)
AMD: Automatic Multi-step Distillation of Large-scale Vision Models
by: Han, Cheng, et al.
Published: (2024)
Self-supervised Adversarial Training of Monocular Depth Estimation against Physical-World Attacks
by: Cheng, Zhiyuan, et al.
Published: (2024)
ProMotion: Prototypes As Motion Learners
by: Lu, Yawen, et al.
Published: (2024)
Image Translation as Diffusion Visual Programmers
by: Han, Cheng, et al.
Published: (2024)
SDVPT: Semantic-Driven Visual Prompt Tuning for Open-World Object Counting
by: Zhao, Yiming, et al.
Published: (2025)
Unbiased Object Detection Beyond Frequency with Visually Prompted Image Synthesis
by: Cai, Xinhao, et al.
Published: (2025)
A-SelecT: Automatic Timestep Selection for Diffusion Transformer Representation Learning
by: Liu, Changyu, et al.
Published: (2026)
Modality-Specialized Synergizers for Interleaved Vision-Language Generalists
by: Xu, Zhiyang, et al.
Published: (2024)
Benchmarking Unified Face Attack Detection via Hierarchical Prompt Tuning
by: Liu, Ajian, et al.
Published: (2025)
Neural Clustering based Visual Representation Learning
by: Chen, Guikun, et al.
Published: (2024)
Revisiting the Power of Prompt for Visual Tuning
by: Wang, Yuzhu, et al.
Published: (2024)
Exploring Interpretability for Visual Prompt Tuning with Cross-layer Concepts
by: Wang, Yubin, et al.
Published: (2025)
Attention to the Burstiness in Visual Prompt Tuning!
by: Wang, Yuzhu, et al.
Published: (2025)
Visual Variational Autoencoder Prompt Tuning
by: Xiao, Xi, et al.
Published: (2025)
SinkTrack: Attention Sink based Context Anchoring for Large Language Models
by: Liu, Xu, et al.
Published: (2026)
3D Gaussian Map with Open-Set Semantic Grouping for Vision-Language Navigation
by: Gao, Jianzhe, et al.
Published: (2026)
Volumetric Environment Representation for Vision-Language Navigation
by: Liu, Rui, et al.
Published: (2024)
Vision-Language Navigation with Energy-Based Policy
by: Liu, Rui, et al.
Published: (2024)
CVPT: Cross Visual Prompt Tuning
by: Huang, Lingyun, et al.
Published: (2024)
Visual Spatial Tuning
by: Yang, Rui, et al.
Published: (2025)
Inference Compute-Optimal Video Vision Language Models
by: Wang, Peiqi, et al.
Published: (2025)
PromptSafe: Gated Prompt Tuning for Safe Text-to-Image Generation
by: Jing, Zonglei, et al.
Published: (2025)
Visual Prompt Tuning in Null Space for Continual Learning
by: Lu, Yue, et al.
Published: (2024)
Learning Unknown Spoof Prompts for Generalized Face Anti-Spoofing Using Only Real Face Images
by: Jiang, Fangling, et al.
Published: (2025)
FaceChain-ImagineID: Freely Crafting High-Fidelity Diverse Talking Faces from Disentangled Audio
by: Xu, Chao, et al.
Published: (2024)
Visual Knowledge in the Big Model Era: Retrospect and Prospect
by: Wang, Wenguan, et al.
Published: (2024)
Radiance Field Learners As UAV First-Person Viewers
by: Yan, Liqi, et al.
Published: (2024)
IDRetracor: Towards Visual Forensics Against Malicious Face Swapping
by: Cheng, Jikang, et al.
Published: (2024)
Grounded-VideoLLM: Sharpening Fine-grained Temporal Grounding in Video Large Language Models
by: Wang, Haibo, et al.
Published: (2024)
Visual Instance-aware Prompt Tuning
by: Xiao, Xi, et al.
Published: (2025)
Human-Object Interaction Detection Collaborated with Large Relation-driven Diffusion Models
by: Li, Liulei, et al.
Published: (2024)
Learning Human-Object Interaction as Groups
by: Hong, Jiajun, et al.
Published: (2025)
Improving Visual Prompt Tuning by Gaussian Neighborhood Minimization for Long-Tailed Visual Recognition
by: Li, Mengke, et al.
Published: (2024)
Navigation Instruction Generation with BEV Perception and Large Language Models
by: Fan, Sheng, et al.
Published: (2024)
Learning to See the Elephant in the Room: Self-Supervised Context Reasoning in Humans and AI
by: Liu, Xiao, et al.
Published: (2022)