:: Library Catalog

Bibliographic Details
Main Authors:	Chou, Shih-Han, Chandhok, Shivam, Little, James J., Sigal, Leonid
Type:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2410.04778

Similar Items

Test-Time Consistency in Vision Language Models
by: Chou, Shih-Han, et al.
Published: (2025)

Response Wide Shut: Surprising Observations in Basic Vision Language Model Capabilities
by: Chandhok, Shivam, et al.
Published: (2024)

Implicit and Explicit Commonsense for Multi-sentence Video Captioning
by: Chou, Shih-Han, et al.
Published: (2023)

SceneGPT: A Language Model for 3D Scene Understanding
by: Chandhok, Shivam
Published: (2024)

Do Vision-Language Foundational models show Robust Visual Perception?
by: Chandhok, Shivam, et al.
Published: (2024)

The Power of One: A Single Example is All it Takes for Segmentation in VLMs
by: Hossain, Mir Rayat Imtiaz, et al.
Published: (2025)

Learning What Matters: Prioritized Concept Learning via Relative Error-driven Sample Selection
by: Chandhok, Shivam, et al.
Published: (2025)

DSeq-JEPA: Discriminative Sequential Joint-Embedding Predictive Architecture
by: He, Xiangteng, et al.
Published: (2025)

Framework-agnostic Semantically-aware Global Reasoning for Segmentation
by: Hossain, Mir Rayat Imtiaz, et al.
Published: (2022)

Visual Prompting for Generalized Few-shot Segmentation: A Multi-scale Approach
by: Hossain, Mir Rayat Imtiaz, et al.
Published: (2024)

Spotlight: Identifying and Localizing Video Generation Errors Using VLMs
by: Chinchure, Aditya, et al.
Published: (2025)

Tinted Frames: Question Framing Blinds Vision-Language Models
by: Fan, Wan-Cyuan, et al.
Published: (2026)

Emergent Open-Vocabulary Semantic Segmentation from Off-the-shelf Vision-Language Models
by: Luo, Jiayun, et al.
Published: (2023)

MMFactory: A Universal Solution Search Engine for Vision-Language Tasks
by: Fan, Wan-Cyuan, et al.
Published: (2024)

Preventing Catastrophic Forgetting through Memory Networks in Continuous Detection
by: Bhatt, Gaurav, et al.
Published: (2024)

SCALE-VLP: Soft-Weighted Contrastive Volumetric Vision-Language Pre-training with Spatial-Knowledge Semantics
by: Mahdizadeh, Ailar, et al.
Published: (2025)

Factorized Video Autoencoders for Efficient Generative Modelling
by: Suhail, Mohammed, et al.
Published: (2024)

To Sink or Not to Sink: Visual Information Pathways in Large Vision-Language Models
by: Luo, Jiayun, et al.
Published: (2025)

Multi-modal News Understanding with Professionally Labelled Videos (ReutersViLNews)
by: Chou, Shih-Han, et al.
Published: (2024)

Image Recognition with Vision and Language Embeddings of VLMs
by: Volkov, Illia, et al.
Published: (2025)

Shakti-VLMs: Scalable Vision-Language Models for Enterprise AI
by: Shakhadri, Syed Abdul Gaffar, et al.
Published: (2025)

Joint Generative Modeling of Grounded Scene Graphs and Images via Diffusion Models
by: Xu, Bicheng, et al.
Published: (2024)

3VL: Using Trees to Improve Vision-Language Models' Interpretability
by: Yellinek, Nir, et al.
Published: (2023)

Glo-VLMs: Leveraging Vision-Language Models for Fine-Grained Diseased Glomerulus Classification
by: Guo, Zhenhao, et al.
Published: (2025)

InvAD: Inversion-based Reconstruction-Free Anomaly Detection with Diffusion Models
by: Sakai, Shunsuke, et al.
Published: (2025)

All in One: A Unified Synthetic Data Pipeline for Multimodal Video Understanding
by: Rahman, Tanzila, et al.
Published: (2026)

What Makes VLMs Robust? Towards Reconciling Robustness and Accuracy in Vision-Language Models
by: Nie, Sen, et al.
Published: (2026)

On Pre-training of Multimodal Language Models Customized for Chart Understanding
by: Fan, Wan-Cyuan, et al.
Published: (2024)

3D-Consistent Image Inpainting with Diffusion Models
by: Antsfeld, Leonid, et al.
Published: (2024)

PanSt3R: Multi-view Consistent Panoptic Segmentation
by: Zust, Lojze, et al.
Published: (2025)

Evaluating Vision Language Models (VLMs) for Radiology: A Comprehensive Analysis
by: Li, Frank, et al.
Published: (2025)

TAM-VT: Transformation-Aware Multi-scale Video Transformer for Segmentation and Tracking
by: Goyal, Raghav, et al.
Published: (2023)

Do Thought Streams Matter? Evaluating Reasoning in Gemini Vision-Language Models for Video Scene Understanding
by: Sharma, Shivam, et al.
Published: (2026)

MedVLM-R1: Incentivizing Medical Reasoning Capability of Vision-Language Models (VLMs) via Reinforcement Learning
by: Pan, Jiazhen, et al.
Published: (2025)

MM-R1: Unleashing the Power of Unified Multimodal Large Language Models for Personalized Image Generation
by: Liang, Qian, et al.
Published: (2025)

Breast Cancer VLMs: Clinically Practical Vision-Language Train-Inference Models
by: Zheng, Shunjie-Fabian, et al.
Published: (2025)

On the Fairness, Diversity and Reliability of Text-to-Image Generative Models
by: Vice, Jordan, et al.
Published: (2024)

Unveiling the Tapestry of Consistency in Large Vision-Language Models
by: Zhang, Yuan, et al.
Published: (2024)

Consistency-guided Prompt Learning for Vision-Language Models
by: Roy, Shuvendu, et al.
Published: (2023)

Prompt2Perturb (P2P): Text-Guided Diffusion-Based Adversarial Attacks on Breast Ultrasound Images
by: Medghalchi, Yasamin, et al.
Published: (2024)