:: Library Catalog

Bibliographic Details
Main Authors:	Dong, Guanfang; Schultz, Luke; Hassanpour, Negar; Gao, Chao
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2512.12083

Similar Items

Accelerating Inference of Networks in the Frequency Domain
by: Zhao, Chenqiu, et al.
Published: (2024)

Learning Temporal Distribution and Spatial Correlation Towards Universal Moving Object Segmentation
by: Dong, Guanfang, et al.
Published: (2023)

Qua$^2$SeDiMo: Quantifiable Quantization Sensitivity of Diffusion Models
by: Mills, Keith G., et al.
Published: (2024)

ReVision: Refining Video Diffusion with Explicit 3D Motion Modeling
by: Liu, Qihao, et al.
Published: (2025)

Frequency Regularization: Restricting Information Redundancy of Convolutional Neural Networks
by: Zhao, Chenqiu, et al.
Published: (2023)

PixelMan: Consistent Object Editing with Diffusion Models via Pixel Manipulation and Generation
by: Jiang, Liyao, et al.
Published: (2024)

Griffin: Generative Reference and Layout Guided Image Composition
by: Mikaeili, Aryan, et al.
Published: (2025)

Efficient Adaptation of Large Vision Transformer via Adapter Re-Composing
by: Dong, Wei, et al.
Published: (2023)

Fantastic Multi-Task Gradient Updates and How to Find Them In a Cone
by: Hassanpour, Negar, et al.
Published: (2025)

Context-Aware Token Selection and Packing for Enhanced Vision Transformer
by: Zhang, Tianyi, et al.
Published: (2024)

Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer
by: Image Team, et al.
Published: (2025)

Looking Locally: Object-Centric Vision Transformers as Foundation Models for Efficient Segmentation
by: Traub, Manuel, et al.
Published: (2025)

ROI-Packing: Efficient Region-Based Compression for Machine Vision
by: Eimon, Md Eimran Hossain, et al.
Published: (2025)

RetFiner: A Vision-Language Refinement Scheme for Retinal Foundation Models
by: Fecso, Ronald, et al.
Published: (2025)

Vision Transformer-Based Deep Learning for Histologic Classification of Endometrial Cancer
by: Goyal, Manu, et al.
Published: (2023)

FRAP: Faithful and Realistic Text-to-Image Generation with Adaptive Prompt Weighting
by: Jiang, Liyao, et al.
Published: (2024)

Embedding Compression for Efficient Re-Identification
by: McDermott, Luke
Published: (2024)

PECTP: Parameter-Efficient Cross-Task Prompts for Incremental Vision Transformer
by: Feng, Qian, et al.
Published: (2024)

ED-SAM: An Efficient Diffusion Sampling Approach to Domain Generalization in Vision-Language Foundation Models
by: Truong, Thanh-Dat, et al.
Published: (2024)

Improving Adversarial Transferability on Vision Transformers via Forward Propagation Refinement
by: Ren, Yuchen, et al.
Published: (2025)

Hierarchical Re-Classification: Combining Animal Classification Models with Vision Transformers
by: Markoff, Hugo, et al.
Published: (2025)

SPROUT: A Scalable Diffusion Foundation Model for Agricultural Vision
by: Xiang, Shuai, et al.
Published: (2026)

LoReTrack: Efficient and Accurate Low-Resolution Transformer Tracking
by: Dong, Shaohua, et al.
Published: (2024)

FiT: Flexible Vision Transformer for Diffusion Model
by: Lu, Zeyu, et al.
Published: (2024)

TextRefiner: Internal Visual Feature as Efficient Refiner for Vision-Language Models Prompt Tuning
by: Xie, Jingjing, et al.
Published: (2024)

Fusion of Foundation and Vision Transformer Model Features for Dermatoscopic Image Classification
by: Mahbod, Amirreza, et al.
Published: (2025)

HiDream-I1: A High-Efficient Image Generative Foundation Model with Sparse Diffusion Transformer
by: Cai, Qi, et al.
Published: (2025)

Query-Efficient Hard-Label Black-Box Attack against Vision Transformers
by: Zhou, Chao, et al.
Published: (2024)

CoReDiT: Spatial Coherence-Guided Token Pruning and Reconstruction for Efficient Diffusion Transformers
by: Li, Zhuojin, et al.
Published: (2026)

Semantic-Aligned Learning with Collaborative Refinement for Unsupervised VI-ReID
by: Cheng, De, et al.
Published: (2025)

Falcon: A Remote Sensing Vision-Language Foundation Model (Technical Report)
by: Yao, Kelu, et al.
Published: (2025)

PeftCD: Leveraging Vision Foundation Models with Parameter-Efficient Fine-Tuning for Remote Sensing Change Detection
by: Dong, Sijun, et al.
Published: (2025)

Bootstrapping SparseFormers from Vision Foundation Models
by: Gao, Ziteng, et al.
Published: (2023)

Vision Transformer based Random Walk for Group Re-Identification
by: Zhang, Guoqing, et al.
Published: (2024)

AnomalyVFM -- Transforming Vision Foundation Models into Zero-Shot Anomaly Detectors
by: Fučka, Matic, et al.
Published: (2026)

Other Tokens Matter: Exploring Global and Local Features of Vision Transformers for Object Re-Identification
by: Wang, Yingquan, et al.
Published: (2024)

Rein++: Efficient Generalization and Adaptation for Semantic Segmentation with Vision Foundation Models
by: Wei, Zhixiang, et al.
Published: (2025)

GeminiFusion: Efficient Pixel-wise Multimodal Fusion for Vision Transformer
by: Jia, Ding, et al.
Published: (2024)

Amber-Image: Efficient Compression of Large-Scale Diffusion Transformers
by: Yang, Chaojie, et al.
Published: (2026)

DetRefiner: Model-Agnostic Detection Refinement with Feature Fusion Transformer
by: Okazaki, Soichiro, et al.
Published: (2026)