| Main Authors: | Taraday, Mitchell Keren; Wagner, Shahaf; Baskin, Chaim |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2510.06820 |
Similar Items
Sequential Signal Mixing Aggregation for Message Passing Graph Neural Networks
by: Taraday, Mitchell Keren, et al.
Published: (2024)
Leveraging Latents for Efficient Thermography Classification and Segmentation
by: Shor, Tamir, et al.
Published: (2024)
Conceptual Learning via Embedding Approximations for Reinforcing Interpretability and Transparency
by: Dikter, Maor, et al.
Published: (2024)
Sparse patches adversarial attacks via extrapolating point-wise information
by: Nemcovsky, Yaniv, et al.
Published: (2024)
Semi-Supervised Semantic Segmentation via Marginal Contextual Information
by: Kimhi, Moshe, et al.
Published: (2023)
Noisy Annotations in Semantic Segmentation
by: Kimhi, Moshe, et al.
Published: (2024)
T1-PILOT: Optimized Trajectories for T1 Mapping Acceleration
by: Shor, Tamir, et al.
Published: (2025)
Dynamic Scene Understanding from Vision-Language Representations
by: Pruss, Shahaf, et al.
Published: (2025)
CARES: Context-Aware Resolution Selector for VLMs
by: Kimhi, Moshe, et al.
Published: (2025)
Scaling Parallel Sequence Models to Foundation-Scale Vision Encoders
by: Jiang, Yitong, et al.
Published: (2026)
Multimodal Autoregressive Pre-training of Large Vision Encoders
by: Fini, Enrico, et al.
Published: (2024)
Imperfect Vision Encoders: Efficient and Robust Tuning for Vision-Language Models
by: Panos, Aristeidis, et al.
Published: (2024)
Efficient Test-Time Scaling for Small Vision-Language Models
by: Kaya, Mehmet Onurcan, et al.
Published: (2025)
Effectiveness Assessment of Recent Large Vision-Language Models
by: Jiang, Yao, et al.
Published: (2024)
Localizing Memorization in SSL Vision Encoders
by: Wang, Wenhao, et al.
Published: (2024)
Cross-Instance Gaussian Splatting Registration via Geometry-Aware Feature-Guided Alignment
by: Amoyal, Roy, et al.
Published: (2026)
Revisiting Compositionality in Dual-Encoder Vision-Language Models: The Role of Inference
by: Miranda, Imanol, et al.
Published: (2026)
Renaissance: Investigating the Pretraining of Vision-Language Encoders
by: Fields, Clayton, et al.
Published: (2024)
Activation Quantization of Vision Encoders Needs Prefixing Registers
by: Kim, Seunghyeon, et al.
Published: (2025)
BridgeTower: Building Bridges Between Encoders in Vision-Language Representation Learning
by: Xu, Xiao, et al.
Published: (2022)
Shotluck Holmes: A Family of Efficient Small-Scale Large Language Vision Models For Video Captioning and Summarization
by: Luo, Richard, et al.
Published: (2024)
Explaining Similarity in Vision-Language Encoders with Weighted Banzhaf Interactions
by: Baniecki, Hubert, et al.
Published: (2025)
Do Vision and Language Encoders Represent the World Similarly?
by: Maniparambil, Mayug, et al.
Published: (2024)
Do VLMs Need Vision Transformers? Evaluating State Space Models as Vision Encoders
by: Kuo, Shang-Jui Ray, et al.
Published: (2026)
UniFusion: Vision-Language Model as Unified Encoder in Image Generation
by: Li, Kevin, et al.
Published: (2025)
CAPA: Contribution-Aware Pruning and FFN Approximation for Efficient Large Vision-Language Models
by: Jha, Samyak, et al.
Published: (2026)
$\mathbf{R}^3$: Reconstruction, Raw, and Rain: Deraining Directly in the Bayer Domain
by: Rothschild, Nate, et al.
Published: (2025)
Single Image Test-Time Adaptation for Segmentation
by: Janouskova, Klara, et al.
Published: (2023)
TEAM PILOT -- Learned Feasible Extendable Set of Dynamic MRI Acquisition Trajectories
by: Shor, Tamir, et al.
Published: (2024)
Image-Specific Adaptation of Transformer Encoders for Compute-Efficient Segmentation
by: Yao, Manyi, et al.
Published: (2024)
STAR: Stage-Wise Attention-Guided Token Reduction for Efficient Large Vision-Language Models Inference
by: Guo, Yichen, et al.
Published: (2025)
ImageNet-Think-250K: A Large-Scale Synthetic Dataset for Multimodal Reasoning for Vision Language Models
by: Chitty-Venkata, Krishna Teja, et al.
Published: (2025)
Class-Discriminative Attention Maps for Vision Transformers
by: Brocki, Lennart, et al.
Published: (2023)
Being-H0: Vision-Language-Action Pretraining from Large-Scale Human Videos
by: Luo, Hao, et al.
Published: (2025)
Self-Distilled Masked Auto-Encoders are Efficient Video Anomaly Detectors
by: Ristea, Nicolae-Catalin, et al.
Published: (2023)
Attention Guided Alignment in Efficient Vision-Language Models
by: Mahajan, Shweta, et al.
Published: (2025)
Towards Efficient Large Vision-Language Models: A Comprehensive Survey on Inference Strategies
by: Pathak, Surendra, et al.
Published: (2026)
Improved Alignment of Modalities in Large Vision Language Models
by: Jangra, Kartik, et al.
Published: (2025)
Detecting and Preventing Hallucinations in Large Vision Language Models
by: Gunjal, Anisha, et al.
Published: (2023)
LaVIDE: A Language-Vision Discriminator for Detecting Changes in Satellite Image with Map References
by: Jiang, Shuguo, et al.
Published: (2024)