| Main Authors: | Smerkous, David; Wang, Zian; Najafian, Behzad |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2602.16249 |
Similar Items
ASSISTGUI: Task-Oriented Desktop Graphical User Interface Automation
by: Gao, Difei, et al.
Published: (2023)
D2E: Scaling Vision-Action Pretraining on Desktop Data for Transfer to Embodied AI
by: Choi, Suhwan, et al.
Published: (2025)
Self-Supervised Multi-View Representation Learning using Vision-Language Model for 3D/4D Facial Expression Recognition
by: Behzad, Muzammil
Published: (2025)
Facial Emotion Learning with Text-Guided Multiview Fusion via Vision-Language Model for 3D/4D Facial Expression Recognition
by: Behzad, Muzammil
Published: (2025)
SynthSet: Generative Diffusion Model for Semantic Segmentation in Precision Agriculture
by: Heschl, Andrew, et al.
Published: (2024)
Unsupervised Multiview Contrastive Language-Image Joint Learning with Pseudo-Labeled Prompts Via Vision-Language Model for 3D/4D Facial Expression Recognition
by: Behzad, Muzammil
Published: (2025)
Deformable Attentive Visual Enhancement for Referring Segmentation Using Vision-Language Model
by: Dalaq, Alaa, et al.
Published: (2025)
Autoregressive Pretraining with Mamba in Vision
by: Ren, Sucheng, et al.
Published: (2024)
UI-Vision: A Desktop-centric GUI Benchmark for Visual Perception and Interaction
by: Nayak, Shravan, et al.
Published: (2025)
Contrastive Language-Image Learning with Augmented Textual Prompts for 3D/4D FER Using Vision-Language Model
by: Behzad, Muzammil, et al.
Published: (2025)
Scaling Down to Scale Up: Towards Operationally-Efficient and Deployable Clinical Models via Cross-Modal Low-Rank Adaptation for Medical Vision-Language Models
by: Alzubaidi, Thuraya, et al.
Published: (2025)
Efficient Self-supervised Vision Pretraining with Local Masked Reconstruction
by: Chen, Jun, et al.
Published: (2022)
Distilling Vision-Language Pretraining for Efficient Cross-Modal Retrieval
by: Jang, Young Kyun, et al.
Published: (2024)
Beam-Guided Knowledge Replay for Knowledge-Rich Image Captioning using Vision-Language Model
by: AlJunaid, Reem, et al.
Published: (2025)
Modified CycleGAN for the synthesization of samples for wheat head segmentation
by: Myers, Jaden, et al.
Published: (2024)
FullLoRA: Efficiently Boosting the Robustness of Pretrained Vision Transformers
by: Yuan, Zheng, et al.
Published: (2024)
OmniSVG: A Unified Scalable Vector Graphics Generation Model
by: Yang, Yiying, et al.
Published: (2025)
Learning to Read Where to Look: Disease-Aware Vision-Language Pretraining for 3D CT
by: Ging, Simon, et al.
Published: (2026)
SuperSVG: Superpixel-based Scalable Vector Graphics Synthesis
by: Hu, Teng, et al.
Published: (2024)
Evaluating Graphical Perception Capabilities of Vision Transformers
by: Poonam, Poonam, et al.
Published: (2026)
A Semi-Self-Supervised Approach for Dense-Pattern Video Object Segmentation
by: Najafian, Keyhan, et al.
Published: (2024)
Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures
by: Duan, Yuchen, et al.
Published: (2024)
A Real-time 3D Desktop Display
by: Tenze, Livio, et al.
Published: (2025)
VideoMAP: Toward Scalable Mamba-based Video Autoregressive Pretraining
by: Liu, Yunze, et al.
Published: (2025)
FrEVL: Leveraging Frozen Pretrained Embeddings for Efficient Vision-Language Understanding
by: Bourigault, Emmanuelle, et al.
Published: (2025)
From Semantic To Instance: A Semi-Self-Supervised Learning Approach
by: Najafian, Keyhan, et al.
Published: (2025)
Leveraging Large Language Models For Scalable Vector Graphics Processing: A Review
by: Malashenko, Boris, et al.
Published: (2025)
SVL: Spike-based Vision-language Pretraining for Efficient 3D Open-world Understanding
by: Qiu, Xuerui, et al.
Published: (2025)
Revisiting Prompt Pretraining of Vision-Language Models
by: Chen, Zhenyuan, et al.
Published: (2024)
WinDeskGround: A Benchmark for Robust GUI Grounding in Complex Multi-Window Desktop Environments
by: Zhao, Haoren, et al.
Published: (2026)
Skip-Vision: Efficient and Scalable Acceleration of Vision-Language Models via Adaptive Token Skipping
by: Zeng, Weili, et al.
Published: (2025)
Distill-SODA: Distilling Self-Supervised Vision Transformer for Source-Free Open-Set Domain Adaptation in Computational Pathology
by: Vray, Guillaume, et al.
Published: (2023)
Thinking in Blender: Staged Executable Inverse Graphics with Vision-Language Models
by: He, Guangzhao, et al.
Published: (2026)
Dynamic Pattern Alignment Learning for Pretraining Lightweight Human-Centric Vision Models
by: Wang, Xuanhan, et al.
Published: (2025)
Harvest Video Foundation Models via Efficient Post-Pretraining
by: Li, Yizhuo, et al.
Published: (2023)
GECKO: Gigapixel Vision-Concept Contrastive Pretraining in Histopathology
by: Kapse, Saarthak, et al.
Published: (2025)
Enhancing Fine-Grained Vision-Language Pretraining with Negative Augmented Samples
by: Wang, Yeyuan, et al.
Published: (2024)
Separators in Enhancing Autoregressive Pretraining for Vision Mamba
by: Liu, Hanpeng, et al.
Published: (2026)
Assimilation Matters: Model-level Backdoor Detection in Vision-Language Pretrained Models
by: Wang, Zhongqi, et al.
Published: (2025)
Sharingan: Extract User Action Sequence from Desktop Recordings
by: Chen, Yanting, et al.
Published: (2024)