Library Catalog

Bibliographic Details
Main Authors:	Meng, Ke; Chen, Kai
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2406.04820

Similar Items

Automated Plant Disease and Pest Detection System Using Hybrid Lightweight CNN-MobileViT Models for Diagnosis of Indigenous Crops
by: Gebremedhin, Tekleab G., et al.
Published: (2025)

Efficient Few-Shot Learning for Edge AI via Knowledge Distillation on MobileViT
by: Tsuyuki, Shuhei, et al.
Published: (2026)

MobilePlantViT: A Mobile-friendly Hybrid ViT for Generalized Plant Disease Image Classification
by: Tonmoy, Moshiur Rahman, et al.
Published: (2025)

GoViG: Goal-Conditioned Visual Navigation Instruction Generation via Multimodal Reasoning
by: Wu, Fengyi, et al.
Published: (2025)

ViRED: Prediction of Visual Relations in Engineering Drawings
by: Gu, Chao, et al.
Published: (2024)

ViPO: Visual Preference Optimization at Scale
by: Li, Ming, et al.
Published: (2026)

UnitedVLN: Generalizable Gaussian Splatting for Continuous Vision-Language Navigation
by: Dai, Guangzhao, et al.
Published: (2024)

ViSA-Enhanced Aerial VLN: A Visual-Spatial Reasoning Enhanced Framework for Aerial Vision-Language Navigation
by: Tong, Haoyu, et al.
Published: (2026)

LaViT: Aligning Latent Visual Thoughts for Multi-modal Reasoning
by: Wu, Linquan, et al.
Published: (2026)

MMeViT: Multi-Modal ensemble ViT for Post-Stroke Rehabilitation Action Recognition
by: Kim, Ye-eun, et al.
Published: (2025)

Self-Attention through Kernel-Eigen Pair Sparse Variational Gaussian Processes
by: Chen, Yingyi, et al.
Published: (2024)

VaViM and VaVAM: Autonomous Driving through Video Generative Modeling
by: Bartoccioni, Florent, et al.
Published: (2025)

Quasar-ViT: Hardware-Oriented Quantization-Aware Architecture Search for Vision Transformers
by: Li, Zhengang, et al.
Published: (2024)

Purrturbed but Stable: Human-Cat Invariant Representations Across CNNs, ViTs and Self-Supervised ViTs
by: Shah, Arya, et al.
Published: (2025)

ViT-Lens: Towards Omni-modal Representations
by: Lei, Weixian, et al.
Published: (2023)

MC-ViViT: Multi-branch Classifier-ViViT to detect Mild Cognitive Impairment in older adults using facial videos
by: Sun, Jian, et al.
Published: (2023)

Feature-EndoGaussian: Feature Distilled Gaussian Splatting in Surgical Deformable Scene Reconstruction
by: Li, Kai, et al.
Published: (2025)

ViSAudio: End-to-End Video-Driven Binaural Spatial Audio Generation
by: Zhang, Mengchen, et al.
Published: (2025)

HydraViT: Stacking Heads for a Scalable ViT
by: Haberer, Janek, et al.
Published: (2024)

GTP-ViT: Efficient Vision Transformers via Graph-based Token Propagation
by: Xu, Xuwei, et al.
Published: (2023)

CLiViS: Unleashing Cognitive Map through Linguistic-Visual Synergy for Embodied Visual Reasoning
by: Li, Kailing, et al.
Published: (2025)

Gaussian Grouping: Segment and Edit Anything in 3D Scenes
by: Ye, Mingqiao, et al.
Published: (2023)

MobileDenseAttn: A Dual-Stream Architecture for Accurate and Interpretable Brain Tumor Detection
by: Banik, Shudipta, et al.
Published: (2025)

SFMViT: SlowFast Meet ViT in Chaotic World
by: Lin, Jiaying, et al.
Published: (2024)

7DGS: Unified Spatial-Temporal-Angular Gaussian Splatting
by: Gao, Zhongpai, et al.
Published: (2025)

GI-NAS: Boosting Gradient Inversion Attacks Through Adaptive Neural Architecture Search
by: Yu, Wenbo, et al.
Published: (2024)

OmniPatch: A Universal Adversarial Patch for ViT-CNN Cross-Architecture Transfer in Semantic Segmentation
by: Aggarwal, Aarush, et al.
Published: (2026)

6DGS: Enhanced Direction-Aware Gaussian Splatting for Volumetric Rendering
by: Gao, Zhongpai, et al.
Published: (2024)

Enhancing Layer Attention Efficiency through Pruning Redundant Retrievals
by: Li, Hanze, et al.
Published: (2025)

Walk and Read Less: Improving the Efficiency of Vision-and-Language Navigation via Tuning-Free Multimodal Token Pruning
by: Qin, Wenda, et al.
Published: (2025)

Global Intervention and Distillation for Federated Out-of-Distribution Generalization
by: Qi, Zhuang, et al.
Published: (2025)

EDIT: Enhancing Vision Transformers by Mitigating Attention Sink through an Encoder-Decoder Architecture
by: Feng, Wenfeng, et al.
Published: (2025)

ViLCo-Bench: VIdeo Language COntinual learning Benchmark
by: Tang, Tianqi, et al.
Published: (2024)

LoViT: Long Video Transformer for Surgical Phase Recognition
by: Liu, Yang, et al.
Published: (2023)

ViLLa: A Neuro-Symbolic approach for Animal Monitoring
by: Koduri, Harsha
Published: (2025)

Sub-token ViT Embedding via Stochastic Resonance Transformers
by: Lao, Dong, et al.
Published: (2023)

ViCrop-Det: Spatial Attention Entropy Guided Cropping for Training-Free Small-Object Detection
by: Wang, Hui, et al.
Published: (2026)

P2DNav: Panorama-to-Downview Reasoning for Zero-shot Vision-and-Language Navigation
by: Sheng, Kai, et al.
Published: (2026)

NavOne: One-Step Global Planning for Vision-Language Navigation on Top-Down Maps
by: Zhan, Dijia, et al.
Published: (2026)

NavComposer: Composing Language Instructions for Navigation Trajectories through Action-Scene-Object Modularization
by: He, Zongtao, et al.
Published: (2025)