| Main Authors: | Choraria, Moulik, Wu, Xinbo, Basu, Sourya, Sekhar, Nitesh, Wu, Yue, Zhang, Xu, Singhal, Prateek, Varshney, Lav R. |
|---|---|
| Format: | Preprint |
| Published: | 2023 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2311.07449 |
Similar Items
DeepInsert: Early Layer Bypass for Efficient and Performant Multimodal Understanding
by: Choraria, Moulik, et al.
Published: (2025)
Skip-It? Theoretical Conditions for Layer Skipping in Vision-Language Models
by: Hartman, Max, et al.
Published: (2025)
Verifier Threshold: An Efficient Test-Time Scaling Approach for Image Generation
by: Sundaresha, Vignesh, et al.
Published: (2025)
Watermarking Discrete Diffusion Language Models
by: Bagchi, Avi, et al.
Published: (2025)
Efficient Model-Agnostic Multi-Group Equivariant Networks
by: Baltaji, Razan, et al.
Published: (2023)
Context-Gated Associative Retrieval: From Theory to Transformers
by: Choraria, Moulik, et al.
Published: (2026)
Hiding in Plain Sight: Detectability-Aware Antidistillation of Reasoning Models
by: Hartman, Max, et al.
Published: (2026)
Understanding Sensor Vulnerabilities in Industrial XR Tracking
by: Saha, Sourya, et al.
Published: (2026)
Learning from One and Only One Shot
by: Yu, Haizi, et al.
Published: (2022)
Evaluating Cell Type Inference in Vision Language Models Under Varying Visual Context
by: Singhal, Samarth, et al.
Published: (2025)
Curing Semantic Drift: A Dynamic Approach to Grounding Generation in Large Vision-Language Models
by: Chen, Jiahe, et al.
Published: (2025)
Transformer-based Causal Language Models Perform Clustering
by: Wu, Xinbo, et al.
Published: (2024)
A Meta-Learning Perspective on Transformers for Causal Language Modeling
by: Wu, Xinbo, et al.
Published: (2023)
Beyond Human Vision: The Role of Large Vision Language Models in Microscope Image Analysis
by: Verma, Prateek, et al.
Published: (2024)
Semantics-aware Motion Retargeting with Vision-Language Models
by: Zhang, Haodong, et al.
Published: (2023)
Deep Reinforcement Learning-driven Edge Offloading for Latency-constrained XR pipelines
by: Saha, Sourya, et al.
Published: (2026)
EVLM: An Efficient Vision-Language Model for Visual Understanding
by: Chen, Kaibing, et al.
Published: (2024)
RoiMAM: Region-of-Interest Medical Attention Model for Efficient Vision-Language Understanding
by: Yang, Jiayan, et al.
Published: (2026)
PatrolVision: Automated License Plate Recognition in the wild
by: Singhal, Anmol Singhal Navya
Published: (2025)
SAFE-Pruner: Semantic Attention-Guided Future-Aware Token Pruning for Efficient Vision-Language-Action Manipulation
by: Ma, Shilin, et al.
Published: (2026)
Shrinking the Teacher: An Adaptive Teaching Paradigm for Asymmetric EEG-Vision Alignment
by: Wu, Lukun, et al.
Published: (2025)
Vision-Language Semantic Grounding for Multi-Domain Crop-Weed Segmentation
by: Hossain, Nazia, et al.
Published: (2026)
Language and Geometry Grounded Sparse Voxel Representations for Holistic Scene Understanding
by: Wu, Guile, et al.
Published: (2026)
PiercingEye: Dual-Space Video Violence Detection with Hyperbolic Vision-Language Guidance
by: Leng, Jiaxu, et al.
Published: (2025)
DenseGrounding: Improving Dense Language-Vision Semantics for Ego-Centric 3D Visual Grounding
by: Zheng, Henry, et al.
Published: (2025)
Can Large Vision-Language Models Correct Semantic Grounding Errors By Themselves?
by: Liao, Yuan-Hong, et al.
Published: (2024)
PIXELS: Progressive Image Xemplar-based Editing with Latent Surgery
by: Biswas, Shristi Das, et al.
Published: (2025)
Evaluation and Enhancement of Semantic Grounding in Large Vision-Language Models
by: Lu, Jiaying, et al.
Published: (2023)
EGM: Efficient Visual Grounding Language Models
by: Zhan, Guanqi, et al.
Published: (2026)
Hierarchical Context Transformer for Multi-level Semantic Scene Understanding
by: Hao, Luoying, et al.
Published: (2025)
DAP: Domain-aware Prompt Learning for Vision-and-Language Navigation
by: Liu, Ting, et al.
Published: (2023)
Efficient Test-Time Prompt Tuning for Vision-Language Models
by: Zhu, Yuhan, et al.
Published: (2024)
ExpAlign: Expectation-Guided Vision-Language Alignment for Open-Vocabulary Grounding
by: Hu, Junyi, et al.
Published: (2026)
On Large Visual Language Models for Medical Imaging Analysis: An Empirical Study
by: Van, Minh-Hao, et al.
Published: (2024)
Towards Highly Transferable Vision-Language Attack via Semantic-Augmented Dynamic Contrastive Interaction
by: Li, Yuanbo, et al.
Published: (2026)
Boosting Temporal Sentence Grounding via Causal Inference
by: Tang, Kefan, et al.
Published: (2025)
GeoGround: A Unified Large Vision-Language Model for Remote Sensing Visual Grounding
by: Zhou, Yue, et al.
Published: (2024)
EtC: Temporal Boundary Expand then Clarify for Weakly Supervised Video Grounding with Multimodal Large Language Model
by: Li, Guozhang, et al.
Published: (2023)
A Test Statistic Estimation-based Approach for Establishing Self-interpretable CNN-based Binary Classifiers
by: Sengupta, Sourya, et al.
Published: (2023)
Molmo2: Open Weights and Data for Vision-Language Models with Video Understanding and Grounding
by: Clark, Christopher, et al.
Published: (2026)