| Main Authors: | Zhu, Guangyang, Zhang, Jianfeng, Feng, Yuanzhi, Lan, Hai |
|---|---|
| Format: | Preprint |
| Published: | 2022 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2201.01410 |
Similar Items
Mitigating Object Hallucinations in Vision-Language Models through Region-Aware Attention Recalibration
by: Xu, Yuanzhi, et al.
Published: (2026)
Revisiting Disentanglement in Downstream Tasks: A Study on Its Necessity for Abstract Visual Reasoning
by: Nai, Ruiqian, et al.
Published: (2024)
HEP-NAS: Towards Efficient Few-shot Neural Architecture Search via Hierarchical Edge Partitioning
by: Li, Jianfeng, et al.
Published: (2024)
AnchorFormer: Differentiable Anchor Attention for Efficient Vision Transformer
by: Shan, Jiquan, et al.
Published: (2025)
Fairness-aware Vision Transformer via Debiased Self-Attention
by: Qiang, Yao, et al.
Published: (2023)
Attention Guided Alignment in Efficient Vision-Language Models
by: Mahajan, Shweta, et al.
Published: (2025)
Role of Locality and Weight Sharing in Image-Based Tasks: A Sample Complexity Separation between CNNs, LCNs, and FCNs
by: Lahoti, Aakash, et al.
Published: (2024)
M2H: Multi-Task Learning with Efficient Window-Based Cross-Task Attention for Monocular Spatial Perception
by: Udugama, U. V. B. L, et al.
Published: (2025)
Seg-LSTM: Performance of xLSTM for Semantic Segmentation of Remotely Sensed Images
by: Zhu, Qinfeng, et al.
Published: (2024)
SGW-based Multi-Task Learning in Vision Tasks
by: Zhang, Ruiyuan, et al.
Published: (2024)
Attention Transfer Is Not Universally Effective for Vision Transformers
by: Qin, Huaiyuan, et al.
Published: (2026)
COMCAT: Towards Efficient Compression and Customization of Attention-Based Vision Models
by: Xiao, Jinqi, et al.
Published: (2023)
STAR: Stage-Wise Attention-Guided Token Reduction for Efficient Large Vision-Language Models Inference
by: Guo, Yichen, et al.
Published: (2025)
Discrete Cosine Transform Based Decorrelated Attention for Vision Transformers
by: Pan, Hongyi, et al.
Published: (2024)
Efficient and Long-Tailed Generalization for Pre-trained Vision-Language Model
by: Shi, Jiang-Xin, et al.
Published: (2024)
Enhancing Cognition and Explainability of Multimodal Foundation Models with Self-Synthesized Data
by: Shi, Yucheng, et al.
Published: (2025)
Convolutional Rectangular Attention Module
by: Nguyen, Hai-Vy, et al.
Published: (2025)
SmartFRZ: An Efficient Training Framework using Attention-Based Layer Freezing
by: Li, Sheng, et al.
Published: (2024)
Elastic Attention Cores for Scalable Vision Transformers
by: Song, Alan Z., et al.
Published: (2026)
Data Attribution for Text-to-Image Models by Unlearning Synthesized Images
by: Wang, Sheng-Yu, et al.
Published: (2024)
SupMAE: Supervised Masked Autoencoders Are Efficient Vision Learners
by: Liang, Feng, et al.
Published: (2022)
Soft-Di[M]O: Improving One-Step Discrete Image Generation with Soft Embeddings
by: Zhu, Yuanzhi, et al.
Published: (2025)
Di$\mathtt{[M]}$O: Distilling Masked Diffusion Models into One-step Generator
by: Zhu, Yuanzhi, et al.
Published: (2025)
Diffusion Reinforcement Learning via Centered Reward Distillation
by: Zhu, Yuanzhi, et al.
Published: (2026)
One-step Diffusion Models with Bregman Density Ratio Matching
by: Zhu, Yuanzhi, et al.
Published: (2025)
Ultrasound Vision-Language Alignment via Contrastive Learning
by: Lyu, Zhuoyang, et al.
Published: (2026)
Knowledge Transfer from Vision Foundation Models for Efficient Training of Small Task-specific Models
by: Vemulapalli, Raviteja, et al.
Published: (2023)
Tensor Decomposition Based Attention Module for Spiking Neural Networks
by: Deng, Haoyu, et al.
Published: (2023)
Self-Supervised Weight Templates for Scalable Vision Model Initialization
by: Xie, Yucheng, et al.
Published: (2026)
RadZero: Similarity-Based Cross-Attention for Explainable Vision-Language Alignment in Chest X-ray with Zero-Shot Multi-Task Capability
by: Park, Jonggwon, et al.
Published: (2025)
Memory Efficient Neural Processes via Constant Memory Attention Block
by: Feng, Leo, et al.
Published: (2023)
Hierarchical Self Attention Based Autoencoder for Open-Set Human Activity Recognition
by: Tonmoy, M Tanjid Hasan, et al.
Published: (2021)
Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
by: Zhu, Lianghui, et al.
Published: (2024)
Hierarchical Invariance for Robust and Interpretable Vision Tasks at Larger Scales
by: Qi, Shuren, et al.
Published: (2024)
Online Self-Calibration Against Hallucination in Vision-Language Models
by: Chen, Minghui, et al.
Published: (2026)
SpargeAttention2: Trainable Sparse Attention via Hybrid Top-k+Top-p Masking and Distillation Fine-Tuning
by: Zhang, Jintao, et al.
Published: (2026)
Understanding Task Transfer in Vision-Language Models
by: Sachdeva, Bhuvan, et al.
Published: (2025)
IBMA: An Imputation-Based Mixup Augmentation Using Self-Supervised Learning for Time Series Data
by: Nguyen, Dang Nha, et al.
Published: (2025)
MABViT -- Modified Attention Block Enhances Vision Transformers
by: Ramesh, Mahesh, et al.
Published: (2023)
Evaluating the Impact of Point Cloud Colorization on Semantic Segmentation Accuracy
by: Zhu, Qinfeng, et al.
Published: (2024)