Library Catalog

Bibliographic Details
Main Authors:	Gao, Xiangyu; Dai, Yu; Qiu, Benliu; Wang, Lanxiao; Qiu, Heqian; Li, Hongliang
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2501.16981

Similar Items

MCF-VC: Mitigate Catastrophic Forgetting in Class-Incremental Learning for Multimodal Video Captioning
by: Xiong, Huiyu, et al.
Published: (2024)

SAVA-X: Ego-to-Exo Imitation Error Detection via Scene-Adaptive View Alignment and Bidirectional Cross View Fusion
by: Li, Xiang, et al.
Published: (2026)

Attention-disentangled Uniform Orthogonal Feature Space Optimization for Few-shot Object Detection
by: Zhao, Taijin, et al.
Published: (2025)

GRSDet: Learning to Generate Local Reverse Samples for Few-shot Object Detection
by: Mei, Hefei, et al.
Published: (2023)

Region Prompt Tuning: Fine-grained Scene Text Detection Utilizing Region Text Prompt
by: Lin, Xingtao, et al.
Published: (2024)

EgoMe: A New Dataset and Challenge for Following Me via Egocentric View in Real World
by: Qiu, Heqian, et al.
Published: (2025)

Cognition Transferring and Decoupling for Text-supervised Egocentric Semantic Segmentation
by: Shi, Zhaofeng, et al.
Published: (2024)

Test-time Ego-Exo-centric Adaptation for Action Anticipation via Multi-Label Prototype Growing and Dual-Clue Consistency
by: Shi, Zhaofeng, et al.
Published: (2026)

Challenges and Trends in Egocentric Vision: A Survey
by: Li, Xiang, et al.
Published: (2025)

A Hybrid Framework Bridging CNN and ViT based on Theory of Evidence for Diabetic Retinopathy Grading
by: Qiu, Junlai, et al.
Published: (2025)

RepViT: Revisiting Mobile CNN From ViT Perspective
by: Wang, Ao, et al.
Published: (2023)

Scene-Graph ViT: End-to-End Open-Vocabulary Visual Relationship Detection
by: Salzmann, Tim, et al.
Published: (2024)

ViT$^3$: Unlocking Test-Time Training in Vision
by: Han, Dongchen, et al.
Published: (2025)

OSR-ViT: A Simple and Modular Framework for Open-Set Object Detection and Discovery
by: Inkawhich, Matthew, et al.
Published: (2024)

Slightly Shift New Classes to Remember Old Classes for Video Class-Incremental Learning
by: Jiao, Jian, et al.
Published: (2024)

VIVID-Med: LLM-Supervised Structured Pretraining for Deployable Medical ViTs
by: Wang, Xiyao, et al.
Published: (2026)

UniRefiner: Teaching Pre-trained ViTs to Self-Dispose Dross via Contrastive Register
by: Qiu, Congpei, et al.
Published: (2026)

Adaptive Aspect Ratios with Patch-Mixup-ViT-based Vehicle ReID
by: Qiu, Mei, et al.
Published: (2024)

I&S-ViT: An Inclusive & Stable Method for Pushing the Limit of Post-Training ViTs Quantization
by: Zhong, Yunshan, et al.
Published: (2023)

ViT-Lens: Towards Omni-modal Representations
by: Lei, Weixian, et al.
Published: (2023)

Let ViT Speak: Generative Language-Image Pre-training
by: Fang, Yan, et al.
Published: (2026)

Taming Self-Training for Open-Vocabulary Object Detection
by: Zhao, Shiyu, et al.
Published: (2023)

Combined CNN and ViT features off-the-shelf: Another astounding baseline for recognition
by: Alonso-Fernandez, Fernando, et al.
Published: (2024)

CNN and ViT Efficiency Study on Tiny ImageNet and DermaMNIST Datasets
by: Amangeldi, Aidar, et al.
Published: (2025)

MPTQ-ViT: Mixed-Precision Post-Training Quantization for Vision Transformer
by: Tai, Yu-Shan, et al.
Published: (2024)

CNN-ViT Hybrid for Pneumonia Detection: Theory and Empiric on Limited Data without Pretraining
by: Basnet, Prashant Singh, et al.
Published: (2025)

Hybrid CNN-ViT Framework for Motion-Blurred Scene Text Restoration
by: Rashid, Umar, et al.
Published: (2025)

Open-Vocabulary Object Detection with Meta Prompt Representation and Instance Contrastive Optimization
by: Wang, Zhao, et al.
Published: (2024)

Knowledge Distillation in YOLOX-ViT for Side-Scan Sonar Object Detection
by: Aubard, Martin, et al.
Published: (2024)

VAT: Vision Action Transformer by Unlocking Full Representation of ViT
by: Li, Wenhao, et al.
Published: (2025)

Exploring Open-Vocabulary Object Recognition in Images using CLIP
by: Chen, Wei Yu, et al.
Published: (2026)

Unleashing Vision Foundation Models for Coronary Artery Segmentation: Parallel ViT-CNN Encoding and Variational Fusion
by: Dong, Caixia, et al.
Published: (2025)

A Hybrid CNN-ViT-GNN Framework with GAN-Based Augmentation for Intelligent Weed Detection in Precision Agriculture
by: V, Pandiyaraju, et al.
Published: (2025)

Purrturbed but Stable: Human-Cat Invariant Representations Across CNNs, ViTs and Self-Supervised ViTs
by: Shah, Arya, et al.
Published: (2025)

Learning to Detect and Segment for Open Vocabulary Object Detection
by: Wang, Tao, et al.
Published: (2022)

TFS-ViT: Token-Level Feature Stylization for Domain Generalization
by: Noori, Mehrdad, et al.
Published: (2023)

Pretrained ViTs Yield Versatile Representations For Medical Images
by: Matsoukas, Christos, et al.
Published: (2023)

DST-Det: Simple Dynamic Self-Training for Open-Vocabulary Object Detection
by: Xu, Shilin, et al.
Published: (2023)

Training-free Boost for Open-Vocabulary Object Detection with Confidence Aggregation
by: Zheng, Yanhao, et al.
Published: (2024)

Scaling Open-Vocabulary Object Detection
by: Minderer, Matthias, et al.
Published: (2023)