:: Library Catalog

Bibliographic Details
Main authors:	Iablochnikov, Viacheslav; Rogachev, Alexander
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online access:	https://arxiv.org/abs/2412.01725

Similar items

Learning to Generate Rigid Body Interactions with Video Diffusion Models
by: Romero, David, et al.
Published: (2025)

On the robustness of multimodal language model towards distractions
by: Liu, Ming, et al.
Published: (2025)

MIMO: A medical vision language model with visual referring multimodal input and pixel grounding multimodal output
by: Chen, Yanyuan, et al.
Published: (2025)

Monitoring Horses in Stalls: From Object to Event Detection
by: Galimzianov, Dmitrii, et al.
Published: (2025)

SalsaAgent: A multimodal embodied language model for interactive dance generation
by: Yazdian, Payam Jome, et al.
Published: (2026)

Visual Language Models as Zero-Shot Deepfake Detectors
by: Pirogov, Viacheslav
Published: (2025)

SJTU:Spatial judgments in multimodal models towards unified segmentation through coordinate detection
by: Chae, Joongwon, et al.
Published: (2024)

In-context learning enables multimodal large language models to classify cancer pathology images
by: Ferber, Dyke, et al.
Published: (2024)

SmolVLM: Redefining small and efficient multimodal models
by: Marafioti, Andrés, et al.
Published: (2025)

Chain-of-Caption: Training-free improvement of multimodal large language model on referring expression comprehension
by: Pang, Yik Lung, et al.
Published: (2026)

Text-to-Vector Conversion for Residential Plan Design
by: Bazhenov, Egor, et al.
Published: (2026)

A multimodal vision foundation model for generalizable knee pathology
by: Yu, Kang, et al.
Published: (2026)

GPT-4o: Visual perception performance of multimodal large language models in piglet activity understanding
by: Wu, Yiqi, et al.
Published: (2024)

Transformation trees -- documentation of multimodal image registration
by: Tomaka, Agnieszka Anna, et al.
Published: (2025)

Assessing the alignment between infants' visual and linguistic experience using multimodal language models
by: Tan, Alvin Wei Ming, et al.
Published: (2025)

MediConfusion: Can you trust your AI radiologist? Probing the reliability of multimodal medical foundation models
by: Sepehri, Mohammad Shahab, et al.
Published: (2024)

AI-powered multimodal modeling of personalized hemodynamics in aortic stenosis
by: Ozturk, Caglar, et al.
Published: (2024)

Granite Vision: a lightweight, open-source multimodal model for enterprise Intelligence
by: Granite Vision Team, et al.
Published: (2025)

Evaluating point-light biological motion in multimodal large language models
by: Kadambi, Akila, et al.
Published: (2025)

Animalbooth: multimodal feature enhancement for animal subject personalization
by: Liu, Chen, et al.
Published: (2025)

Do multimodal models imagine electric sheep?
by: Ramakrishnan, Santhosh Kumar, et al.
Published: (2026)

MULTIAQUA: A multimodal maritime dataset and robust training strategies for multimodal semantic segmentation
by: Muhovič, Jon, et al.
Published: (2025)

Visual concept ranking uncovers medical shortcuts used by large multimodal models
by: Janizek, Joseph D., et al.
Published: (2026)

A benchmark multimodal oro-dental dataset for large vision-language models
by: Lv, Haoxin, et al.
Published: (2025)

Bias-constrained multimodal intelligence for equitable and reliable clinical AI
by: Li, Cheng, et al.
Published: (2026)

Automatic benchmarking of large multimodal models via iterative experiment programming
by: Conti, Alessandro, et al.
Published: (2024)

MedViLaM: A multimodal large language model with advanced generalizability and explainability for medical data understanding and generation
by: Xu, Lijian, et al.
Published: (2024)

Densification and forecasting of Sentinel-2 time series from multimodal SAR and Optical satellite data using deep generative models
by: Defonte, Véronique, et al.
Published: (2026)

New multimodal similarity measure for image registration via modeling local functional dependence with linear combination of learned basis functions
by: Honkamaa, Joel, et al.
Published: (2025)

Explaining latent representations of generative models with large multimodal models
by: Zhu, Mengdan, et al.
Published: (2024)

Evaluating Deepfake Detectors in the Wild
by: Pirogov, Viacheslav, et al.
Published: (2025)

Headset: Human emotion awareness under partial occlusions multimodal dataset
by: Lohesara, Fatemeh Ghorbani, et al.
Published: (2024)

JUMP: A joint multimodal registration pipeline for neuroimaging with minimal preprocessing
by: Casamitjana, Adria, et al.
Published: (2024)

From Image to Video, what do we need in multimodal LLMs?
by: Huang, Suyuan, et al.
Published: (2024)

A multimodal gesture recognition dataset for desktop human-computer interaction
by: Wang, Qi, et al.
Published: (2024)

What do vision-language models see in the context? Investigating multimodal in-context learning
by: Santos, Gabriel O. dos, et al.
Published: (2025)

GlitchBench: Can large multimodal models detect video game glitches?
by: Taesiri, Mohammad Reza, et al.
Published: (2023)

MAIRA-1: A specialised large multimodal model for radiology report generation
by: Hyland, Stephanie L., et al.
Published: (2023)

BiomedCLIP: a multimodal biomedical foundation model pretrained from fifteen million scientific image-text pairs
by: Zhang, Sheng, et al.
Published: (2023)

J-EDI QA: Benchmark for deep-sea organism-specific multimodal LLM
by: Yoshida, Takero, et al.
Published: (2024)