Bibliographic Details
Main Authors:	Castrejon, Lluis; Mensink, Thomas; Zhou, Howard; Ferrari, Vittorio; Araujo, Andre; Uijlings, Jasper
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition; Machine Learning
Online Access:	https://arxiv.org/abs/2404.05465

Similar Items

HAVIR: HierArchical Vision to Image Reconstruction using CLIP-Guided Versatile Diffusion
by: Zhang, Shiyi, et al.
Published: (2025)

VQA Training Sets are Self-play Environments for Generating Few-shot Pools
by: Misiunas, Tautvydas, et al.
Published: (2024)

MultiModal Action Conditioned Video Generation
by: Li, Yichen, et al.
Published: (2025)

MultiModal Fine-tuning with Synthetic Captions
by: Enomoto, Shohei, et al.
Published: (2026)

MMA-Diffusion: MultiModal Attack on Diffusion Models
by: Yang, Yijun, et al.
Published: (2023)

M3: 3D-Spatial MultiModal Memory
by: Zou, Xueyan, et al.
Published: (2025)

Re-M3Dr: Rebalanced MultiModal Mean Deviation Regression
by: Yin, Haojie, et al.
Published: (2026)

ControlEdit: A MultiModal Local Clothing Image Editing Method
by: Cheng, Di, et al.
Published: (2024)

CapS-Adapter: Caption-based MultiModal Adapter in Zero-Shot Classification
by: Wang, Qijie, et al.
Published: (2024)

MMCORE: MultiModal COnnection with Representation Aligned Latent Embeddings
by: Li, Zijie, et al.
Published: (2026)

VoCap: Video Object Captioning and Segmentation from Any Prompt
by: Uijlings, Jasper, et al.
Published: (2025)

CharGen: High Accurate Character-Level Visual Text Generation Model with MultiModal Encoder
by: Ma, Lichen, et al.
Published: (2024)

ProM3E: Probabilistic Masked MultiModal Embedding Model for Ecology
by: Sastry, Srikumar, et al.
Published: (2025)

Can MLLMs Reason in Multimodality? EMMA: An Enhanced MultiModal ReAsoning Benchmark
by: Hao, Yunzhuo, et al.
Published: (2025)

MMRPT: MultiModal Reinforcement Pre-Training via Masked Vision-Dependent Reasoning
by: Zheng, Xuhui, et al.
Published: (2025)

TeamPath: Building MultiModal Pathology Experts with Reasoning AI Copilots
by: Liu, Tianyu, et al.
Published: (2025)

LongHalQA: Long-Context Hallucination Evaluation for MultiModal Large Language Models
by: Qiu, Han, et al.
Published: (2024)

Omni-Customizer: End-to-End MultiModal Customization for Joint Audio-Video Generation
by: Chen, Yuheng, et al.
Published: (2026)

Visual-Oriented Fine-Grained Knowledge Editing for MultiModal Large Language Models
by: Zeng, Zhen, et al.
Published: (2024)

M3FAS: An Accurate and Robust MultiModal Mobile Face Anti-Spoofing System
by: Kong, Chenqi, et al.
Published: (2023)

M3R: Localized Rainfall Nowcasting with Meteorology-Informed MultiModal Attention
by: Panta, Sanjeev, et al.
Published: (2026)

MMA-DFER: MultiModal Adaptation of unimodal models for Dynamic Facial Expression Recognition in-the-wild
by: Chumachenko, Kateryna, et al.
Published: (2024)

HAC: Parameter-Efficient Hyperbolic Adaptation of CLIP for Zero-Shot VQA
by: Dibitonto, Francesco, et al.
Published: (2026)

MM-MoralBench: A MultiModal Moral Evaluation Benchmark for Large Vision-Language Models
by: Yan, Bei, et al.
Published: (2024)

From Seconds to Hours: Reviewing MultiModal Large Language Models on Comprehensive Long Video Understanding
by: Zou, Heqing, et al.
Published: (2024)

Unified Latents (UL): How to train your latents
by: Heek, Jonathan, et al.
Published: (2026)

M$^2$CD: A Unified MultiModal Framework for Optical-SAR Change Detection with Mixture of Experts and Self-Distillation
by: Liu, Ziyuan, et al.
Published: (2025)

MM-R5: MultiModal Reasoning-Enhanced ReRanker via Reinforcement Learning for Document Retrieval
by: Xu, Mingjun, et al.
Published: (2025)

UDON: Universal Dynamic Online distillatioN for generic image representations
by: Ypsilantis, Nikolaos-Antonios, et al.
Published: (2024)

Beyond Single Tokens: Distilling Discrete Diffusion Models via Discrete MMD
by: Hoogeboom, Emiel, et al.
Published: (2026)

Multistep Distillation of Diffusion Models via Moment Matching
by: Salimans, Tim, et al.
Published: (2024)

Simpler Diffusion (SiD2): 1.5 FID on ImageNet512 with pixel-space diffusion
by: Hoogeboom, Emiel, et al.
Published: (2024)

Dual-Rate Diffusion: Accelerating diffusion models with an interleaved heavy-light network
by: Bartosh, Grigory, et al.
Published: (2026)

LFM-3D: Learnable Feature Matching Across Wide Baselines Using 3D Signals
by: Karpur, Arjun, et al.
Published: (2023)

HierLoc: Hyperbolic Entity Embeddings for Hierarchical Visual Geolocation
by: Gadi, Hari Krishna, et al.
Published: (2026)

HierEdit: Region-Aware Hierarchical Diffusion for Efficient High-Resolution Editing
by: Zhang, Yuyao, et al.
Published: (2026)

Hier-EgoPack: Hierarchical Egocentric Video Understanding with Diverse Task Perspectives
by: Peirone, Simone Alberto, et al.
Published: (2025)

Ready-to-React: Online Reaction Policy for Two-Character Interaction Generation
by: Cen, Zhi, et al.
Published: (2025)

GRAM: Global Reasoning for Multi-Page VQA
by: Blau, Tsachi, et al.
Published: (2024)