| Main Authors: | Dong, Guanfang, Schultz, Luke, Hassanpour, Negar, Gao, Chao |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2512.12083 |
Similar Items
Accelerating Inference of Networks in the Frequency Domain
by: Zhao, Chenqiu, et al.
Published: (2024)
Learning Temporal Distribution and Spatial Correlation Towards Universal Moving Object Segmentation
by: Dong, Guanfang, et al.
Published: (2023)
Qua$^2$SeDiMo: Quantifiable Quantization Sensitivity of Diffusion Models
by: Mills, Keith G., et al.
Published: (2024)
ReVision: Refining Video Diffusion with Explicit 3D Motion Modeling
by: Liu, Qihao, et al.
Published: (2025)
Frequency Regularization: Restricting Information Redundancy of Convolutional Neural Networks
by: Zhao, Chenqiu, et al.
Published: (2023)
PixelMan: Consistent Object Editing with Diffusion Models via Pixel Manipulation and Generation
by: Jiang, Liyao, et al.
Published: (2024)
Griffin: Generative Reference and Layout Guided Image Composition
by: Mikaeili, Aryan, et al.
Published: (2025)
Efficient Adaptation of Large Vision Transformer via Adapter Re-Composing
by: Dong, Wei, et al.
Published: (2023)
Fantastic Multi-Task Gradient Updates and How to Find Them In a Cone
by: Hassanpour, Negar, et al.
Published: (2025)
Context-Aware Token Selection and Packing for Enhanced Vision Transformer
by: Zhang, Tianyi, et al.
Published: (2024)
Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer
by: Image Team, et al.
Published: (2025)
Looking Locally: Object-Centric Vision Transformers as Foundation Models for Efficient Segmentation
by: Traub, Manuel, et al.
Published: (2025)
ROI-Packing: Efficient Region-Based Compression for Machine Vision
by: Eimon, Md Eimran Hossain, et al.
Published: (2025)
RetFiner: A Vision-Language Refinement Scheme for Retinal Foundation Models
by: Fecso, Ronald, et al.
Published: (2025)
Vision Transformer-Based Deep Learning for Histologic Classification of Endometrial Cancer
by: Goyal, Manu, et al.
Published: (2023)
FRAP: Faithful and Realistic Text-to-Image Generation with Adaptive Prompt Weighting
by: Jiang, Liyao, et al.
Published: (2024)
Embedding Compression for Efficient Re-Identification
by: McDermott, Luke
Published: (2024)
PECTP: Parameter-Efficient Cross-Task Prompts for Incremental Vision Transformer
by: Feng, Qian, et al.
Published: (2024)
ED-SAM: An Efficient Diffusion Sampling Approach to Domain Generalization in Vision-Language Foundation Models
by: Truong, Thanh-Dat, et al.
Published: (2024)
Improving Adversarial Transferability on Vision Transformers via Forward Propagation Refinement
by: Ren, Yuchen, et al.
Published: (2025)
Hierarchical Re-Classification: Combining Animal Classification Models with Vision Transformers
by: Markoff, Hugo, et al.
Published: (2025)
SPROUT: A Scalable Diffusion Foundation Model for Agricultural Vision
by: Xiang, Shuai, et al.
Published: (2026)
LoReTrack: Efficient and Accurate Low-Resolution Transformer Tracking
by: Dong, Shaohua, et al.
Published: (2024)
FiT: Flexible Vision Transformer for Diffusion Model
by: Lu, Zeyu, et al.
Published: (2024)
TextRefiner: Internal Visual Feature as Efficient Refiner for Vision-Language Models Prompt Tuning
by: Xie, Jingjing, et al.
Published: (2024)
Fusion of Foundation and Vision Transformer Model Features for Dermatoscopic Image Classification
by: Mahbod, Amirreza, et al.
Published: (2025)
HiDream-I1: A High-Efficient Image Generative Foundation Model with Sparse Diffusion Transformer
by: Cai, Qi, et al.
Published: (2025)
Query-Efficient Hard-Label Black-Box Attack against Vision Transformers
by: Zhou, Chao, et al.
Published: (2024)
CoReDiT: Spatial Coherence-Guided Token Pruning and Reconstruction for Efficient Diffusion Transformers
by: Li, Zhuojin, et al.
Published: (2026)
Semantic-Aligned Learning with Collaborative Refinement for Unsupervised VI-ReID
by: Cheng, De, et al.
Published: (2025)
Falcon: A Remote Sensing Vision-Language Foundation Model (Technical Report)
by: Yao, Kelu, et al.
Published: (2025)
PeftCD: Leveraging Vision Foundation Models with Parameter-Efficient Fine-Tuning for Remote Sensing Change Detection
by: Dong, Sijun, et al.
Published: (2025)
Bootstrapping SparseFormers from Vision Foundation Models
by: Gao, Ziteng, et al.
Published: (2023)
Vision Transformer based Random Walk for Group Re-Identification
by: Zhang, Guoqing, et al.
Published: (2024)
AnomalyVFM -- Transforming Vision Foundation Models into Zero-Shot Anomaly Detectors
by: Fučka, Matic, et al.
Published: (2026)
Other Tokens Matter: Exploring Global and Local Features of Vision Transformers for Object Re-Identification
by: Wang, Yingquan, et al.
Published: (2024)
Rein++: Efficient Generalization and Adaptation for Semantic Segmentation with Vision Foundation Models
by: Wei, Zhixiang, et al.
Published: (2025)
GeminiFusion: Efficient Pixel-wise Multimodal Fusion for Vision Transformer
by: Jia, Ding, et al.
Published: (2024)
Amber-Image: Efficient Compression of Large-Scale Diffusion Transformers
by: Yang, Chaojie, et al.
Published: (2026)
DetRefiner: Model-Agnostic Detection Refinement with Feature Fusion Transformer
by: Okazaki, Soichiro, et al.
Published: (2026)