:: Library Catalog

Bibliographic Details
Main Authors:	Smerkous, David; Wang, Zian; Najafian, Behzad
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2602.16249

Similar Items

ASSISTGUI: Task-Oriented Desktop Graphical User Interface Automation
by: Gao, Difei, et al.
Published: (2023)

D2E: Scaling Vision-Action Pretraining on Desktop Data for Transfer to Embodied AI
by: Choi, Suhwan, et al.
Published: (2025)

Self-Supervised Multi-View Representation Learning using Vision-Language Model for 3D/4D Facial Expression Recognition
by: Behzad, Muzammil
Published: (2025)

Facial Emotion Learning with Text-Guided Multiview Fusion via Vision-Language Model for 3D/4D Facial Expression Recognition
by: Behzad, Muzammil
Published: (2025)

SynthSet: Generative Diffusion Model for Semantic Segmentation in Precision Agriculture
by: Heschl, Andrew, et al.
Published: (2024)

Unsupervised Multiview Contrastive Language-Image Joint Learning with Pseudo-Labeled Prompts Via Vision-Language Model for 3D/4D Facial Expression Recognition
by: Behzad, Muzammil
Published: (2025)

Deformable Attentive Visual Enhancement for Referring Segmentation Using Vision-Language Model
by: Dalaq, Alaa, et al.
Published: (2025)

Autoregressive Pretraining with Mamba in Vision
by: Ren, Sucheng, et al.
Published: (2024)

UI-Vision: A Desktop-centric GUI Benchmark for Visual Perception and Interaction
by: Nayak, Shravan, et al.
Published: (2025)

Contrastive Language-Image Learning with Augmented Textual Prompts for 3D/4D FER Using Vision-Language Model
by: Behzad, Muzammil, et al.
Published: (2025)

Scaling Down to Scale Up: Towards Operationally-Efficient and Deployable Clinical Models via Cross-Modal Low-Rank Adaptation for Medical Vision-Language Models
by: Alzubaidi, Thuraya, et al.
Published: (2025)

Efficient Self-supervised Vision Pretraining with Local Masked Reconstruction
by: Chen, Jun, et al.
Published: (2022)

Distilling Vision-Language Pretraining for Efficient Cross-Modal Retrieval
by: Jang, Young Kyun, et al.
Published: (2024)

Beam-Guided Knowledge Replay for Knowledge-Rich Image Captioning using Vision-Language Model
by: AlJunaid, Reem, et al.
Published: (2025)

Modified CycleGAN for the synthesization of samples for wheat head segmentation
by: Myers, Jaden, et al.
Published: (2024)

FullLoRA: Efficiently Boosting the Robustness of Pretrained Vision Transformers
by: Yuan, Zheng, et al.
Published: (2024)

OmniSVG: A Unified Scalable Vector Graphics Generation Model
by: Yang, Yiying, et al.
Published: (2025)

Learning to Read Where to Look: Disease-Aware Vision-Language Pretraining for 3D CT
by: Ging, Simon, et al.
Published: (2026)

SuperSVG: Superpixel-based Scalable Vector Graphics Synthesis
by: Hu, Teng, et al.
Published: (2024)

Evaluating Graphical Perception Capabilities of Vision Transformers
by: Poonam, Poonam, et al.
Published: (2026)

A Semi-Self-Supervised Approach for Dense-Pattern Video Object Segmentation
by: Najafian, Keyhan, et al.
Published: (2024)

Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures
by: Duan, Yuchen, et al.
Published: (2024)

A Real-time 3D Desktop Display
by: Tenze, Livio, et al.
Published: (2025)

VideoMAP: Toward Scalable Mamba-based Video Autoregressive Pretraining
by: Liu, Yunze, et al.
Published: (2025)

FrEVL: Leveraging Frozen Pretrained Embeddings for Efficient Vision-Language Understanding
by: Bourigault, Emmanuelle, et al.
Published: (2025)

From Semantic To Instance: A Semi-Self-Supervised Learning Approach
by: Najafian, Keyhan, et al.
Published: (2025)

Leveraging Large Language Models For Scalable Vector Graphics Processing: A Review
by: Malashenko, Boris, et al.
Published: (2025)

SVL: Spike-based Vision-language Pretraining for Efficient 3D Open-world Understanding
by: Qiu, Xuerui, et al.
Published: (2025)

Revisiting Prompt Pretraining of Vision-Language Models
by: Chen, Zhenyuan, et al.
Published: (2024)

WinDeskGround: A Benchmark for Robust GUI Grounding in Complex Multi-Window Desktop Environments
by: Zhao, Haoren, et al.
Published: (2026)

Skip-Vision: Efficient and Scalable Acceleration of Vision-Language Models via Adaptive Token Skipping
by: Zeng, Weili, et al.
Published: (2025)

Distill-SODA: Distilling Self-Supervised Vision Transformer for Source-Free Open-Set Domain Adaptation in Computational Pathology
by: Vray, Guillaume, et al.
Published: (2023)

Thinking in Blender: Staged Executable Inverse Graphics with Vision-Language Models
by: He, Guangzhao, et al.
Published: (2026)

Dynamic Pattern Alignment Learning for Pretraining Lightweight Human-Centric Vision Models
by: Wang, Xuanhan, et al.
Published: (2025)

Harvest Video Foundation Models via Efficient Post-Pretraining
by: Li, Yizhuo, et al.
Published: (2023)

GECKO: Gigapixel Vision-Concept Contrastive Pretraining in Histopathology
by: Kapse, Saarthak, et al.
Published: (2025)

Enhancing Fine-Grained Vision-Language Pretraining with Negative Augmented Samples
by: Wang, Yeyuan, et al.
Published: (2024)

Separators in Enhancing Autoregressive Pretraining for Vision Mamba
by: Liu, Hanpeng, et al.
Published: (2026)

Assimilation Matters: Model-level Backdoor Detection in Vision-Language Pretrained Models
by: Wang, Zhongqi, et al.
Published: (2025)

Sharingan: Extract User Action Sequence from Desktop Recordings
by: Chen, Yanting, et al.
Published: (2024)