:: Library Catalog

Bibliographic Details
Main Authors:	Taraday, Mitchell Keren; Wagner, Shahaf; Baskin, Chaim
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; Machine Learning
Online Access:	https://arxiv.org/abs/2510.06820

Similar Items

Sequential Signal Mixing Aggregation for Message Passing Graph Neural Networks
by: Taraday, Mitchell Keren, et al.
Published: (2024)

Leveraging Latents for Efficient Thermography Classification and Segmentation
by: Shor, Tamir, et al.
Published: (2024)

Conceptual Learning via Embedding Approximations for Reinforcing Interpretability and Transparency
by: Dikter, Maor, et al.
Published: (2024)

Sparse patches adversarial attacks via extrapolating point-wise information
by: Nemcovsky, Yaniv, et al.
Published: (2024)

Semi-Supervised Semantic Segmentation via Marginal Contextual Information
by: Kimhi, Moshe, et al.
Published: (2023)

Noisy Annotations in Semantic Segmentation
by: Kimhi, Moshe, et al.
Published: (2024)

T1-PILOT: Optimized Trajectories for T1 Mapping Acceleration
by: Shor, Tamir, et al.
Published: (2025)

Dynamic Scene Understanding from Vision-Language Representations
by: Pruss, Shahaf, et al.
Published: (2025)

CARES: Context-Aware Resolution Selector for VLMs
by: Kimhi, Moshe, et al.
Published: (2025)

Scaling Parallel Sequence Models to Foundation-Scale Vision Encoders
by: Jiang, Yitong, et al.
Published: (2026)

Multimodal Autoregressive Pre-training of Large Vision Encoders
by: Fini, Enrico, et al.
Published: (2024)

Imperfect Vision Encoders: Efficient and Robust Tuning for Vision-Language Models
by: Panos, Aristeidis, et al.
Published: (2024)

Efficient Test-Time Scaling for Small Vision-Language Models
by: Kaya, Mehmet Onurcan, et al.
Published: (2025)

Effectiveness Assessment of Recent Large Vision-Language Models
by: Jiang, Yao, et al.
Published: (2024)

Localizing Memorization in SSL Vision Encoders
by: Wang, Wenhao, et al.
Published: (2024)

Cross-Instance Gaussian Splatting Registration via Geometry-Aware Feature-Guided Alignment
by: Amoyal, Roy, et al.
Published: (2026)

Revisiting Compositionality in Dual-Encoder Vision-Language Models: The Role of Inference
by: Miranda, Imanol, et al.
Published: (2026)

Renaissance: Investigating the Pretraining of Vision-Language Encoders
by: Fields, Clayton, et al.
Published: (2024)

Activation Quantization of Vision Encoders Needs Prefixing Registers
by: Kim, Seunghyeon, et al.
Published: (2025)

BridgeTower: Building Bridges Between Encoders in Vision-Language Representation Learning
by: Xu, Xiao, et al.
Published: (2022)

Shotluck Holmes: A Family of Efficient Small-Scale Large Language Vision Models For Video Captioning and Summarization
by: Luo, Richard, et al.
Published: (2024)

Explaining Similarity in Vision-Language Encoders with Weighted Banzhaf Interactions
by: Baniecki, Hubert, et al.
Published: (2025)

Do Vision and Language Encoders Represent the World Similarly?
by: Maniparambil, Mayug, et al.
Published: (2024)

Do VLMs Need Vision Transformers? Evaluating State Space Models as Vision Encoders
by: Kuo, Shang-Jui Ray, et al.
Published: (2026)

UniFusion: Vision-Language Model as Unified Encoder in Image Generation
by: Li, Kevin, et al.
Published: (2025)

CAPA: Contribution-Aware Pruning and FFN Approximation for Efficient Large Vision-Language Models
by: Jha, Samyak, et al.
Published: (2026)

R³: Reconstruction, Raw, and Rain: Deraining Directly in the Bayer Domain
by: Rothschild, Nate, et al.
Published: (2025)

Single Image Test-Time Adaptation for Segmentation
by: Janouskova, Klara, et al.
Published: (2023)

TEAM PILOT -- Learned Feasible Extendable Set of Dynamic MRI Acquisition Trajectories
by: Shor, Tamir, et al.
Published: (2024)

Image-Specific Adaptation of Transformer Encoders for Compute-Efficient Segmentation
by: Yao, Manyi, et al.
Published: (2024)

STAR: Stage-Wise Attention-Guided Token Reduction for Efficient Large Vision-Language Models Inference
by: Guo, Yichen, et al.
Published: (2025)

ImageNet-Think-250K: A Large-Scale Synthetic Dataset for Multimodal Reasoning for Vision Language Models
by: Chitty-Venkata, Krishna Teja, et al.
Published: (2025)

Class-Discriminative Attention Maps for Vision Transformers
by: Brocki, Lennart, et al.
Published: (2023)

Being-H0: Vision-Language-Action Pretraining from Large-Scale Human Videos
by: Luo, Hao, et al.
Published: (2025)

Self-Distilled Masked Auto-Encoders are Efficient Video Anomaly Detectors
by: Ristea, Nicolae-Catalin, et al.
Published: (2023)

Attention Guided Alignment in Efficient Vision-Language Models
by: Mahajan, Shweta, et al.
Published: (2025)

Towards Efficient Large Vision-Language Models: A Comprehensive Survey on Inference Strategies
by: Pathak, Surendra, et al.
Published: (2026)

Improved Alignment of Modalities in Large Vision Language Models
by: Jangra, Kartik, et al.
Published: (2025)

Detecting and Preventing Hallucinations in Large Vision Language Models
by: Gunjal, Anisha, et al.
Published: (2023)

LaVIDE: A Language-Vision Discriminator for Detecting Changes in Satellite Image with Map References
by: Jiang, Shuguo, et al.
Published: (2024)