| Main Authors: | Gao, Xiangyu, Dai, Yu, Qiu, Benliu, Wang, Lanxiao, Qiu, Heqian, Li, Hongliang |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2501.16981 |
Similar Items
MCF-VC: Mitigate Catastrophic Forgetting in Class-Incremental Learning for Multimodal Video Captioning
by: Xiong, Huiyu, et al.
Published: (2024)
SAVA-X: Ego-to-Exo Imitation Error Detection via Scene-Adaptive View Alignment and Bidirectional Cross View Fusion
by: Li, Xiang, et al.
Published: (2026)
Attention-disentangled Uniform Orthogonal Feature Space Optimization for Few-shot Object Detection
by: Zhao, Taijin, et al.
Published: (2025)
GRSDet: Learning to Generate Local Reverse Samples for Few-shot Object Detection
by: Mei, Hefei, et al.
Published: (2023)
Region Prompt Tuning: Fine-grained Scene Text Detection Utilizing Region Text Prompt
by: Lin, Xingtao, et al.
Published: (2024)
EgoMe: A New Dataset and Challenge for Following Me via Egocentric View in Real World
by: Qiu, Heqian, et al.
Published: (2025)
Cognition Transferring and Decoupling for Text-supervised Egocentric Semantic Segmentation
by: Shi, Zhaofeng, et al.
Published: (2024)
Test-time Ego-Exo-centric Adaptation for Action Anticipation via Multi-Label Prototype Growing and Dual-Clue Consistency
by: Shi, Zhaofeng, et al.
Published: (2026)
Challenges and Trends in Egocentric Vision: A Survey
by: Li, Xiang, et al.
Published: (2025)
A Hybrid Framework Bridging CNN and ViT based on Theory of Evidence for Diabetic Retinopathy Grading
by: Qiu, Junlai, et al.
Published: (2025)
RepViT: Revisiting Mobile CNN From ViT Perspective
by: Wang, Ao, et al.
Published: (2023)
Scene-Graph ViT: End-to-End Open-Vocabulary Visual Relationship Detection
by: Salzmann, Tim, et al.
Published: (2024)
ViT$^3$: Unlocking Test-Time Training in Vision
by: Han, Dongchen, et al.
Published: (2025)
OSR-ViT: A Simple and Modular Framework for Open-Set Object Detection and Discovery
by: Inkawhich, Matthew, et al.
Published: (2024)
Slightly Shift New Classes to Remember Old Classes for Video Class-Incremental Learning
by: Jiao, Jian, et al.
Published: (2024)
VIVID-Med: LLM-Supervised Structured Pretraining for Deployable Medical ViTs
by: Wang, Xiyao, et al.
Published: (2026)
UniRefiner: Teaching Pre-trained ViTs to Self-Dispose Dross via Contrastive Register
by: Qiu, Congpei, et al.
Published: (2026)
Adaptive Aspect Ratios with Patch-Mixup-ViT-based Vehicle ReID
by: Qiu, Mei, et al.
Published: (2024)
I&S-ViT: An Inclusive & Stable Method for Pushing the Limit of Post-Training ViTs Quantization
by: Zhong, Yunshan, et al.
Published: (2023)
ViT-Lens: Towards Omni-modal Representations
by: Lei, Weixian, et al.
Published: (2023)
Let ViT Speak: Generative Language-Image Pre-training
by: Fang, Yan, et al.
Published: (2026)
Taming Self-Training for Open-Vocabulary Object Detection
by: Zhao, Shiyu, et al.
Published: (2023)
Combined CNN and ViT features off-the-shelf: Another astounding baseline for recognition
by: Alonso-Fernandez, Fernando, et al.
Published: (2024)
CNN and ViT Efficiency Study on Tiny ImageNet and DermaMNIST Datasets
by: Amangeldi, Aidar, et al.
Published: (2025)
MPTQ-ViT: Mixed-Precision Post-Training Quantization for Vision Transformer
by: Tai, Yu-Shan, et al.
Published: (2024)
CNN-ViT Hybrid for Pneumonia Detection: Theory and Empiric on Limited Data without Pretraining
by: Basnet, Prashant Singh, et al.
Published: (2025)
Hybrid CNN-ViT Framework for Motion-Blurred Scene Text Restoration
by: Rashid, Umar, et al.
Published: (2025)
Open-Vocabulary Object Detection with Meta Prompt Representation and Instance Contrastive Optimization
by: Wang, Zhao, et al.
Published: (2024)
Knowledge Distillation in YOLOX-ViT for Side-Scan Sonar Object Detection
by: Aubard, Martin, et al.
Published: (2024)
VAT: Vision Action Transformer by Unlocking Full Representation of ViT
by: Li, Wenhao, et al.
Published: (2025)
Exploring Open-Vocabulary Object Recognition in Images using CLIP
by: Chen, Wei Yu, et al.
Published: (2026)
Unleashing Vision Foundation Models for Coronary Artery Segmentation: Parallel ViT-CNN Encoding and Variational Fusion
by: Dong, Caixia, et al.
Published: (2025)
A Hybrid CNN-ViT-GNN Framework with GAN-Based Augmentation for Intelligent Weed Detection in Precision Agriculture
by: V, Pandiyaraju, et al.
Published: (2025)
Purrturbed but Stable: Human-Cat Invariant Representations Across CNNs, ViTs and Self-Supervised ViTs
by: Shah, Arya, et al.
Published: (2025)
Learning to Detect and Segment for Open Vocabulary Object Detection
by: Wang, Tao, et al.
Published: (2022)
TFS-ViT: Token-Level Feature Stylization for Domain Generalization
by: Noori, Mehrdad, et al.
Published: (2023)
Pretrained ViTs Yield Versatile Representations For Medical Images
by: Matsoukas, Christos, et al.
Published: (2023)
DST-Det: Simple Dynamic Self-Training for Open-Vocabulary Object Detection
by: Xu, Shilin, et al.
Published: (2023)
Training-free Boost for Open-Vocabulary Object Detection with Confidence Aggregation
by: Zheng, Yanhao, et al.
Published: (2024)
Scaling Open-Vocabulary Object Detection
by: Minderer, Matthias, et al.
Published: (2023)