| Main Authors: | Kong, Zhifeng, Chaudhuri, Kamalika |
|---|---|
| Format: | Preprint |
| Published: | 2023 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2305.11351 |
Similar Items
Déjà Vu Memorization in Vision-Language Models
by: Jayaraman, Bargav, et al.
Published: (2024)
Measuring Déjà vu Memorization Efficiently
by: Kokhlikyan, Narine, et al.
Published: (2025)
Controlled Training Data Generation with Diffusion Models
by: Yeo, Teresa, et al.
Published: (2024)
ConCuR: Conciseness Makes State-of-the-Art Kernel Generation
by: Kong, Lingcheng, et al.
Published: (2025)
Differentially Private Representation Learning via Image Captioning
by: Sander, Tom, et al.
Published: (2024)
Learning Conditional Invariances through Non-Commutativity
by: Chaudhuri, Abhra, et al.
Published: (2024)
The Neglected Tails in Vision-Language Models
by: Parashar, Shubham, et al.
Published: (2024)
Data Alignment for Zero-Shot Concept Generation in Dermatology AI
by: Gadgil, Soham, et al.
Published: (2024)
DP-RDM: Adapting Diffusion Models to Private Domains Without Fine-Tuning
by: Lebensold, Jonathan, et al.
Published: (2024)
Harlequin: Color-driven Generation of Synthetic Data for Referring Expression Comprehension
by: Parolari, Luca, et al.
Published: (2024)
A Survey on Data Augmentation in Large Model Era
by: Zhou, Yue, et al.
Published: (2024)
Realistic Evaluation of Model Merging for Compositional Generalization
by: Tam, Derek, et al.
Published: (2024)
Weak-to-Strong Compositional Learning from Generative Models for Language-based Object Detection
by: Park, Kwanyong, et al.
Published: (2024)
Unleashing the Potential of Model Bias for Generalized Category Discovery
by: An, Wenbin, et al.
Published: (2024)
FocalLens: Instruction Tuning Enables Zero-Shot Conditional Image Representations
by: Hsieh, Cheng-Yu, et al.
Published: (2025)
VPO: Aligning Text-to-Video Generation Models with Prompt Optimization
by: Cheng, Jiale, et al.
Published: (2025)
LatentExplainer: Explaining Latent Representations in Deep Generative Models with Multimodal Large Language Models
by: Zhu, Mengdan, et al.
Published: (2024)
R1-SyntheticVL: Is Synthetic Data from Generative Models Ready for Multimodal Large Language Model?
by: Zhang, Jingyi, et al.
Published: (2026)
MANZANO: A Simple and Scalable Unified Multimodal Model with a Hybrid Vision Tokenizer
by: Li, Yanghao, et al.
Published: (2025)
MASSV: Multimodal Adaptation and Self-Data Distillation for Speculative Decoding of Vision-Language Models
by: Ganesan, Mugilan, et al.
Published: (2025)
C2-Evo: Co-Evolving Multimodal Data and Model for Self-Improving Reasoning
by: Chen, Xiuwei, et al.
Published: (2025)
A General Framework for Inference-time Scaling and Steering of Diffusion Models
by: Singhal, Raghav, et al.
Published: (2025)
Learning from Synthetic Data for Visual Grounding
by: He, Ruozhen, et al.
Published: (2024)
O3SLM: Open Weight, Open Data, and Open Vocabulary Sketch-Language Model
by: Gupta, Rishi, et al.
Published: (2025)
No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance
by: Udandarao, Vishaal, et al.
Published: (2024)
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models
by: Deitke, Matt, et al.
Published: (2024)
MoTe: Learning Motion-Text Diffusion Model for Multiple Generation Tasks
by: Wu, Yiming, et al.
Published: (2024)
InSight-o3: Empowering Multimodal Foundation Models with Generalized Visual Search
by: Li, Kaican, et al.
Published: (2025)
IMAGINE-E: Image Generation Intelligence Evaluation of State-of-the-art Text-to-Image Models
by: Lei, Jiayi, et al.
Published: (2025)
D3G: Diverse Demographic Data Generation Increases Zero-Shot Image Classification Accuracy within Multimodal Models
by: Hickmon, Javon
Published: (2025)
Robustness of Structured Data Extraction from Perspectively Distorted Documents
by: Nakada, Hyakka, et al.
Published: (2025)
Stylus: Automatic Adapter Selection for Diffusion Models
by: Luo, Michael, et al.
Published: (2024)
Open-Source Multimodal Moxin Models with Moxin-VLM and Moxin-VLA
by: Zhao, Pu, et al.
Published: (2025)
GUARD: Role-playing to Generate Natural-language Jailbreakings to Test Guideline Adherence of Large Language Models
by: Jin, Haibo, et al.
Published: (2024)
MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?
by: Chen, Zhaorun, et al.
Published: (2024)
mDPO: Conditional Preference Optimization for Multimodal Large Language Models
by: Wang, Fei, et al.
Published: (2024)
Prismatic VLMs: Investigating the Design Space of Visually-Conditioned Language Models
by: Karamcheti, Siddharth, et al.
Published: (2024)
Composition-Grounded Data Synthesis for Visual Reasoning
by: Gu, Xinyi, et al.
Published: (2025)
Dual-Process Image Generation
by: Luo, Grace, et al.
Published: (2025)
Bidirectional Long-Range Parser for Sequential Data Understanding
by: Leotescu, George, et al.
Published: (2024)