| Main Authors: | Meng, Ke; Chen, Kai |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2406.04820 |
Similar Items
Automated Plant Disease and Pest Detection System Using Hybrid Lightweight CNN-MobileViT Models for Diagnosis of Indigenous Crops
by: Gebremedhin, Tekleab G., et al.
Published: (2025)
Efficient Few-Shot Learning for Edge AI via Knowledge Distillation on MobileViT
by: Tsuyuki, Shuhei, et al.
Published: (2026)
MobilePlantViT: A Mobile-friendly Hybrid ViT for Generalized Plant Disease Image Classification
by: Tonmoy, Moshiur Rahman, et al.
Published: (2025)
GoViG: Goal-Conditioned Visual Navigation Instruction Generation via Multimodal Reasoning
by: Wu, Fengyi, et al.
Published: (2025)
ViRED: Prediction of Visual Relations in Engineering Drawings
by: Gu, Chao, et al.
Published: (2024)
ViPO: Visual Preference Optimization at Scale
by: Li, Ming, et al.
Published: (2026)
UnitedVLN: Generalizable Gaussian Splatting for Continuous Vision-Language Navigation
by: Dai, Guangzhao, et al.
Published: (2024)
ViSA-Enhanced Aerial VLN: A Visual-Spatial Reasoning Enhanced Framework for Aerial Vision-Language Navigation
by: Tong, Haoyu, et al.
Published: (2026)
LaViT: Aligning Latent Visual Thoughts for Multi-modal Reasoning
by: Wu, Linquan, et al.
Published: (2026)
MMeViT: Multi-Modal ensemble ViT for Post-Stroke Rehabilitation Action Recognition
by: Kim, Ye-eun, et al.
Published: (2025)
Self-Attention through Kernel-Eigen Pair Sparse Variational Gaussian Processes
by: Chen, Yingyi, et al.
Published: (2024)
VaViM and VaVAM: Autonomous Driving through Video Generative Modeling
by: Bartoccioni, Florent, et al.
Published: (2025)
Quasar-ViT: Hardware-Oriented Quantization-Aware Architecture Search for Vision Transformers
by: Li, Zhengang, et al.
Published: (2024)
Purrturbed but Stable: Human-Cat Invariant Representations Across CNNs, ViTs and Self-Supervised ViTs
by: Shah, Arya, et al.
Published: (2025)
ViT-Lens: Towards Omni-modal Representations
by: Lei, Weixian, et al.
Published: (2023)
MC-ViViT: Multi-branch Classifier-ViViT to detect Mild Cognitive Impairment in older adults using facial videos
by: Sun, Jian, et al.
Published: (2023)
Feature-EndoGaussian: Feature Distilled Gaussian Splatting in Surgical Deformable Scene Reconstruction
by: Li, Kai, et al.
Published: (2025)
ViSAudio: End-to-End Video-Driven Binaural Spatial Audio Generation
by: Zhang, Mengchen, et al.
Published: (2025)
HydraViT: Stacking Heads for a Scalable ViT
by: Haberer, Janek, et al.
Published: (2024)
GTP-ViT: Efficient Vision Transformers via Graph-based Token Propagation
by: Xu, Xuwei, et al.
Published: (2023)
CLiViS: Unleashing Cognitive Map through Linguistic-Visual Synergy for Embodied Visual Reasoning
by: Li, Kailing, et al.
Published: (2025)
Gaussian Grouping: Segment and Edit Anything in 3D Scenes
by: Ye, Mingqiao, et al.
Published: (2023)
MobileDenseAttn: A Dual-Stream Architecture for Accurate and Interpretable Brain Tumor Detection
by: Banik, Shudipta, et al.
Published: (2025)
SFMViT: SlowFast Meet ViT in Chaotic World
by: Lin, Jiaying, et al.
Published: (2024)
7DGS: Unified Spatial-Temporal-Angular Gaussian Splatting
by: Gao, Zhongpai, et al.
Published: (2025)
GI-NAS: Boosting Gradient Inversion Attacks Through Adaptive Neural Architecture Search
by: Yu, Wenbo, et al.
Published: (2024)
OmniPatch: A Universal Adversarial Patch for ViT-CNN Cross-Architecture Transfer in Semantic Segmentation
by: Aggarwal, Aarush, et al.
Published: (2026)
6DGS: Enhanced Direction-Aware Gaussian Splatting for Volumetric Rendering
by: Gao, Zhongpai, et al.
Published: (2024)
Enhancing Layer Attention Efficiency through Pruning Redundant Retrievals
by: Li, Hanze, et al.
Published: (2025)
Walk and Read Less: Improving the Efficiency of Vision-and-Language Navigation via Tuning-Free Multimodal Token Pruning
by: Qin, Wenda, et al.
Published: (2025)
Global Intervention and Distillation for Federated Out-of-Distribution Generalization
by: Qi, Zhuang, et al.
Published: (2025)
EDIT: Enhancing Vision Transformers by Mitigating Attention Sink through an Encoder-Decoder Architecture
by: Feng, Wenfeng, et al.
Published: (2025)
ViLCo-Bench: VIdeo Language COntinual learning Benchmark
by: Tang, Tianqi, et al.
Published: (2024)
LoViT: Long Video Transformer for Surgical Phase Recognition
by: Liu, Yang, et al.
Published: (2023)
ViLLa: A Neuro-Symbolic approach for Animal Monitoring
by: Koduri, Harsha
Published: (2025)
Sub-token ViT Embedding via Stochastic Resonance Transformers
by: Lao, Dong, et al.
Published: (2023)
ViCrop-Det: Spatial Attention Entropy Guided Cropping for Training-Free Small-Object Detection
by: Wang, Hui, et al.
Published: (2026)
P2DNav: Panorama-to-Downview Reasoning for Zero-shot Vision-and-Language Navigation
by: Sheng, Kai, et al.
Published: (2026)
NavOne: One-Step Global Planning for Vision-Language Navigation on Top-Down Maps
by: Zhan, Dijia, et al.
Published: (2026)
NavComposer: Composing Language Instructions for Navigation Trajectories through Action-Scene-Object Modularization
by: He, Zongtao, et al.
Published: (2025)