| Main Authors: | Nguyen, Thao, Mo, Sicheng, Singh, Krishna Kumar, Wang, Yilin, Shi, Jing, Kolkin, Nicholas, Shechtman, Eli, Lee, Yong Jae, Li, Yuheng |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2512.07833 |
Similar Items
Group Diffusion: Enhancing Image Generation by Unlocking Cross-Sample Collaboration
by: Mo, Sicheng, et al.
Published: (2025)
YoChameleon: Personalized Vision and Language Generation
by: Nguyen, Thao, et al.
Published: (2025)
X-Fusion: Introducing New Modality to Frozen Large Language Models
by: Mo, Sicheng, et al.
Published: (2025)
TurboEdit: Instant text-based image editing
by: Wu, Zongze, et al.
Published: (2024)
Removing Distributional Discrepancies in Captions Improves Image-Text Alignment
by: Li, Yuheng, et al.
Published: (2024)
SliderSpace: Decomposing the Visual Capabilities of Diffusion Models
by: Gandikota, Rohit, et al.
Published: (2025)
ID-Sim: An Identity-Focused Similarity Metric
by: Chae, Julia, et al.
Published: (2026)
Edit One for All: Interactive Batch Image Editing
by: Nguyen, Thao, et al.
Published: (2024)
Improved Baselines with Visual Instruction Tuning
by: Liu, Haotian, et al.
Published: (2023)
Generative Models: What Do They Know? Do They Know Things? Let's Find Out!
by: Du, Xiaodan, et al.
Published: (2023)
Yo'LLaVA: Your Personalized Language and Vision Assistant
by: Nguyen, Thao, et al.
Published: (2024)
Improved Baselines with Representation Autoencoders
by: Singh, Jaskirat, et al.
Published: (2026)
Self Forcing: Bridging the Train-Test Gap in Autoregressive Video Diffusion
by: Huang, Xun, et al.
Published: (2025)
From Plans to Pixels: Learning to Plan and Orchestrate for Open-Ended Image Editing
by: Rajan, Anirudh Sundara, et al.
Published: (2026)
Personal Visual Memory from Explicit and Implicit Evidence
by: Nguyen, Viet, et al.
Published: (2026)
What matters for Representation Alignment: Global Information or Spatial Structure?
by: Singh, Jaskirat, et al.
Published: (2025)
Stepwise Credit Assignment for GRPO on Flow-Matching Models
by: Savani, Yash, et al.
Published: (2026)
Learning an Image Editing Model without Image Editing Pairs
by: Kumari, Nupur, et al.
Published: (2025)
Causality in Video Diffusers is Separable from Denoising
by: Bai, Xingjian, et al.
Published: (2026)
How Multimodal LLMs Solve Image Tasks: A Lens on Visual Grounding, Task Reasoning, and Answer Decoding
by: Yu, Zhuoran, et al.
Published: (2025)
Lazy Diffusion Transformer for Interactive Image Editing
by: Nitzan, Yotam, et al.
Published: (2024)
Benchmarking and Analyzing Generative Data for Visual Recognition
by: Li, Bo, et al.
Published: (2023)
Revealing the Underlying Patterns: Investigating Dataset Similarity, Performance, and Generalization
by: Achara, Akshit, et al.
Published: (2023)
Beyond Simple Edits: X-Planner for Complex Instruction-Based Image Editing
by: Yeh, Chun-Hsiao, et al.
Published: (2025)
Semantic Similarity Score for Measuring Visual Similarity at Semantic Level
by: Fan, Senran, et al.
Published: (2024)
CLAY: Conditional Visual Similarity Modulation in Vision-Language Embedding Space
by: Lim, Sohwi, et al.
Published: (2026)
Towards Universal Fake Image Detectors that Generalize Across Generative Models
by: Ojha, Utkarsh, et al.
Published: (2023)
Texo: Formula Recognition within 20M Parameters
by: Mao, Sicheng
Published: (2026)
Decomposing Complex Visual Comprehension into Atomic Visual Skills for Vision Language Models
by: Chae, Hyunsik, et al.
Published: (2025)
Visual Evaluative AI: A Hypothesis-Driven Tool with Concept-Based Explanations and Weight of Evidence
by: Le, Thao, et al.
Published: (2024)
Jump Cut Smoothing for Talking Heads
by: Wang, Xiaojuan, et al.
Published: (2024)
AC-MAMBASEG: An adaptive convolution and Mamba-based architecture for enhanced skin lesion segmentation
by: Nguyen, Viet-Thanh, et al.
Published: (2024)
Leveraging Large Language Models for Scalable Vector Graphics-Driven Image Understanding
by: Cai, Mu, et al.
Published: (2023)
End-to-End Training for Unified Tokenization and Latent Denoising
by: Duggal, Shivam, et al.
Published: (2026)
Simple Unsupervised Knowledge Distillation With Space Similarity
by: Singh, Aditya, et al.
Published: (2024)
Re-Scoring Using Image-Language Similarity for Few-Shot Object Detection
by: Jung, Min Jae, et al.
Published: (2023)
Self-Evaluation Unlocks Any-Step Text-to-Image Generation
by: Yu, Xin, et al.
Published: (2025)
Bridge then Begin Anew: Generating Target-relevant Intermediate Model for Source-free Visual Emotion Adaptation
by: Zhu, Jiankun, et al.
Published: (2024)
LACO: Adaptive Latent Communication for Collaborative Driving
by: Chen, Tianhao, et al.
Published: (2026)
Talk in Pieces, See in Whole: Disentangling and Hierarchical Aggregating Representations for Language-based Object Detection
by: An, Sojung, et al.
Published: (2025)