| Main Authors: | Xu, Chenhao, Li, Chang-Tsun, Lim, Chee Peng, Creighton, Douglas |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2404.05196 |
Similar Items
Block-based Symmetric Pruning and Fusion for Efficient Vision Transformers
by: Hsieh, Yi-Kuan, et al.
Published: (2025)
Cross-Age Contrastive Learning for Age-Invariant Face Recognition
by: Wang, Haoyi, et al.
Published: (2023)
SCHEME: Scalable Channel Mixer for Vision Transformers
by: Sridhar, Deepak, et al.
Published: (2023)
From Age Estimation to Age-Invariant Face Recognition: Generalized Age Feature Extraction Using Order-Enhanced Contrastive Learning
by: Wang, Haoyi, et al.
Published: (2025)
Selective State Space Memory for Large Vision-Language Models
by: Ng, Chee, et al.
Published: (2024)
SOLO: A Single Transformer for Scalable Vision-Language Modeling
by: Chen, Yangyi, et al.
Published: (2024)
Multi-Tailed Vision Transformer for Efficient Inference
by: Wang, Yunke, et al.
Published: (2022)
Revisiting Shape from Polarization in the Era of Vision Foundation Models
by: Li, Chenhao, et al.
Published: (2026)
The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer
by: Lei, Weixian, et al.
Published: (2025)
TransParking: A Dual-Decoder Transformer Framework with Soft Localization for End-to-End Automatic Parking
by: Du, Hangyu, et al.
Published: (2025)
Partial Ring Scan: Revisiting Scan Order in Vision State Space Models
by: Hsieh, Yi-Kuan, et al.
Published: (2026)
Elastic Attention Cores for Scalable Vision Transformers
by: Song, Alan Z., et al.
Published: (2026)
X-Ray-CoT: Interpretable Chest X-ray Diagnosis with Vision-Language Models via Chain-of-Thought Reasoning
by: Ng, Chee, et al.
Published: (2025)
VLMT: Vision-Language Multimodal Transformer for Multimodal Multi-hop Question Answering
by: Lim, Qi Zhi, et al.
Published: (2025)
VistaFormer: Scalable Vision Transformers for Satellite Image Time Series Segmentation
by: MacDonald, Ezra, et al.
Published: (2024)
Iterated Learning Improves Compositionality in Large Vision-Language Models
by: Zheng, Chenhao, et al.
Published: (2024)
Residual Attention Single-Head Vision Transformer Network for Rolling Bearing Fault Diagnosis in Noisy Environments
by: Lai, Songjiang, et al.
Published: (2024)
RePaViT: Scalable Vision Transformer Acceleration via Structural Reparameterization on Feedforward Network Layers
by: Xu, Xuwei, et al.
Published: (2025)
Improving Adversarial Transferability on Vision Transformers via Forward Propagation Refinement
by: Ren, Yuchen, et al.
Published: (2025)
GeminiFusion: Efficient Pixel-wise Multimodal Fusion for Vision Transformer
by: Jia, Ding, et al.
Published: (2024)
Neighbor-Aware Token Reduction via Hilbert Curve for Vision Transformers
by: Li, Yunge, et al.
Published: (2025)
Matryoshka Query Transformer for Large Vision-Language Models
by: Hu, Wenbo, et al.
Published: (2024)
A drone detector with modified backbone and multiple pyramid featuremaps enhancement structure (MDDPE)
by: Wu, Chenhao
Published: (2024)
OmniCD: A Foundational Framework for Remote Sensing Image Change Detection Guided by Multimodal Semantics
by: Sun, Chenhao
Published: (2026)
Stratified Knowledge-Density Super-Network for Scalable Vision Transformers
by: Li, Longhua, et al.
Published: (2025)
Trajectory-Diversity-Driven Robust Vision-and-Language Navigation
by: Li, Jiangyang, et al.
Published: (2026)
Image Recognition with Online Lightweight Vision Transformer: A Survey
by: Zhang, Zherui, et al.
Published: (2025)
Distillation Dynamics: Towards Understanding Feature-Based Distillation in Vision Transformers
by: Tian, Huiyuan, et al.
Published: (2025)
Representative Attention For Vision Transformers
by: Li, Yuntong, et al.
Published: (2026)
Intra-task Mutual Attention based Vision Transformer for Few-Shot Learning
by: Jiang, Weihao, et al.
Published: (2024)
ROI-Aware Multiscale Cross-Attention Vision Transformer for Pest Image Identification
by: Kim, Ga-Eun, et al.
Published: (2023)
RS-OOD: A Vision-Language Augmented Framework for Out-of-Distribution Detection in Remote Sensing
by: Wang, Chenhao, et al.
Published: (2025)
Diversity Covariance-Aware Prompt Learning for Vision-Language Models
by: Dong, Songlin, et al.
Published: (2025)
Shuffle Vision Transformer: Lightweight, Fast and Efficient Recognition of Driver Facial Expression
by: Saadi, Ibtissam, et al.
Published: (2024)
Feature Coding for Scalable Machine Vision
by: Eimon, Md Eimran Hossain, et al.
Published: (2025)
FlashVGGT: Efficient and Scalable Visual Geometry Transformers with Compressed Descriptor Attention
by: Wang, Zipeng, et al.
Published: (2025)
Denoising Vision Transformers
by: Yang, Jiawei, et al.
Published: (2024)
Structured Initialization for Vision Transformers
by: Zheng, Jianqiao, et al.
Published: (2025)
Interpretability-Aware Vision Transformer
by: Qiang, Yao, et al.
Published: (2023)
Mask-TS Net: Mask Temperature Scaling Uncertainty Calibration for Polyp Segmentation
by: Zhang, Yudian, et al.
Published: (2024)