:: Library Catalog

Bibliographic Details
Main Authors:	Kong, Zhifeng; Chaudhuri, Kamalika
Format:	Preprint
Published:	2023
Subjects:	Machine Learning; Computation and Language; Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2305.11351

Similar Items

Déjà Vu Memorization in Vision-Language Models
by: Jayaraman, Bargav, et al.
Published: (2024)

Measuring Déjà vu Memorization Efficiently
by: Kokhlikyan, Narine, et al.
Published: (2025)

Controlled Training Data Generation with Diffusion Models
by: Yeo, Teresa, et al.
Published: (2024)

ConCuR: Conciseness Makes State-of-the-Art Kernel Generation
by: Kong, Lingcheng, et al.
Published: (2025)

Differentially Private Representation Learning via Image Captioning
by: Sander, Tom, et al.
Published: (2024)

Learning Conditional Invariances through Non-Commutativity
by: Chaudhuri, Abhra, et al.
Published: (2024)

The Neglected Tails in Vision-Language Models
by: Parashar, Shubham, et al.
Published: (2024)

Data Alignment for Zero-Shot Concept Generation in Dermatology AI
by: Gadgil, Soham, et al.
Published: (2024)

DP-RDM: Adapting Diffusion Models to Private Domains Without Fine-Tuning
by: Lebensold, Jonathan, et al.
Published: (2024)

Harlequin: Color-driven Generation of Synthetic Data for Referring Expression Comprehension
by: Parolari, Luca, et al.
Published: (2024)

A Survey on Data Augmentation in Large Model Era
by: Zhou, Yue, et al.
Published: (2024)

Realistic Evaluation of Model Merging for Compositional Generalization
by: Tam, Derek, et al.
Published: (2024)

Weak-to-Strong Compositional Learning from Generative Models for Language-based Object Detection
by: Park, Kwanyong, et al.
Published: (2024)

Unleashing the Potential of Model Bias for Generalized Category Discovery
by: An, Wenbin, et al.
Published: (2024)

FocalLens: Instruction Tuning Enables Zero-Shot Conditional Image Representations
by: Hsieh, Cheng-Yu, et al.
Published: (2025)

VPO: Aligning Text-to-Video Generation Models with Prompt Optimization
by: Cheng, Jiale, et al.
Published: (2025)

LatentExplainer: Explaining Latent Representations in Deep Generative Models with Multimodal Large Language Models
by: Zhu, Mengdan, et al.
Published: (2024)

R1-SyntheticVL: Is Synthetic Data from Generative Models Ready for Multimodal Large Language Model?
by: Zhang, Jingyi, et al.
Published: (2026)

MANZANO: A Simple and Scalable Unified Multimodal Model with a Hybrid Vision Tokenizer
by: Li, Yanghao, et al.
Published: (2025)

MASSV: Multimodal Adaptation and Self-Data Distillation for Speculative Decoding of Vision-Language Models
by: Ganesan, Mugilan, et al.
Published: (2025)

C2-Evo: Co-Evolving Multimodal Data and Model for Self-Improving Reasoning
by: Chen, Xiuwei, et al.
Published: (2025)

A General Framework for Inference-time Scaling and Steering of Diffusion Models
by: Singhal, Raghav, et al.
Published: (2025)

Learning from Synthetic Data for Visual Grounding
by: He, Ruozhen, et al.
Published: (2024)

O3SLM: Open Weight, Open Data, and Open Vocabulary Sketch-Language Model
by: Gupta, Rishi, et al.
Published: (2025)

No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance
by: Udandarao, Vishaal, et al.
Published: (2024)

Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models
by: Deitke, Matt, et al.
Published: (2024)

MoTe: Learning Motion-Text Diffusion Model for Multiple Generation Tasks
by: Wu, Yiming, et al.
Published: (2024)

InSight-o3: Empowering Multimodal Foundation Models with Generalized Visual Search
by: Li, Kaican, et al.
Published: (2025)

IMAGINE-E: Image Generation Intelligence Evaluation of State-of-the-art Text-to-Image Models
by: Lei, Jiayi, et al.
Published: (2025)

D3G: Diverse Demographic Data Generation Increases Zero-Shot Image Classification Accuracy within Multimodal Models
by: Hickmon, Javon
Published: (2025)

Robustness of Structured Data Extraction from Perspectively Distorted Documents
by: Nakada, Hyakka, et al.
Published: (2025)

Stylus: Automatic Adapter Selection for Diffusion Models
by: Luo, Michael, et al.
Published: (2024)

Open-Source Multimodal Moxin Models with Moxin-VLM and Moxin-VLA
by: Zhao, Pu, et al.
Published: (2025)

GUARD: Role-playing to Generate Natural-language Jailbreakings to Test Guideline Adherence of Large Language Models
by: Jin, Haibo, et al.
Published: (2024)

MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?
by: Chen, Zhaorun, et al.
Published: (2024)

mDPO: Conditional Preference Optimization for Multimodal Large Language Models
by: Wang, Fei, et al.
Published: (2024)

Prismatic VLMs: Investigating the Design Space of Visually-Conditioned Language Models
by: Karamcheti, Siddharth, et al.
Published: (2024)

Composition-Grounded Data Synthesis for Visual Reasoning
by: Gu, Xinyi, et al.
Published: (2025)

Dual-Process Image Generation
by: Luo, Grace, et al.
Published: (2025)

Bidirectional Long-Range Parser for Sequential Data Understanding
by: Leotescu, George, et al.
Published: (2024)