:: Library Catalog

Bibliographic Details
Main Authors:	Nguyen, Thao; Mo, Sicheng; Singh, Krishna Kumar; Wang, Yilin; Shi, Jing; Kolkin, Nicholas; Shechtman, Eli; Lee, Yong Jae; Li, Yuheng
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence; Machine Learning
Online Access:	https://arxiv.org/abs/2512.07833

Similar Items

Group Diffusion: Enhancing Image Generation by Unlocking Cross-Sample Collaboration
by: Mo, Sicheng, et al.
Published: (2025)

YoChameleon: Personalized Vision and Language Generation
by: Nguyen, Thao, et al.
Published: (2025)

X-Fusion: Introducing New Modality to Frozen Large Language Models
by: Mo, Sicheng, et al.
Published: (2025)

TurboEdit: Instant text-based image editing
by: Wu, Zongze, et al.
Published: (2024)

Removing Distributional Discrepancies in Captions Improves Image-Text Alignment
by: Li, Yuheng, et al.
Published: (2024)

SliderSpace: Decomposing the Visual Capabilities of Diffusion Models
by: Gandikota, Rohit, et al.
Published: (2025)

ID-Sim: An Identity-Focused Similarity Metric
by: Chae, Julia, et al.
Published: (2026)

Edit One for All: Interactive Batch Image Editing
by: Nguyen, Thao, et al.
Published: (2024)

Improved Baselines with Visual Instruction Tuning
by: Liu, Haotian, et al.
Published: (2023)

Generative Models: What Do They Know? Do They Know Things? Let's Find Out!
by: Du, Xiaodan, et al.
Published: (2023)

Yo'LLaVA: Your Personalized Language and Vision Assistant
by: Nguyen, Thao, et al.
Published: (2024)

Improved Baselines with Representation Autoencoders
by: Singh, Jaskirat, et al.
Published: (2026)

Self Forcing: Bridging the Train-Test Gap in Autoregressive Video Diffusion
by: Huang, Xun, et al.
Published: (2025)

From Plans to Pixels: Learning to Plan and Orchestrate for Open-Ended Image Editing
by: Rajan, Anirudh Sundara, et al.
Published: (2026)

Personal Visual Memory from Explicit and Implicit Evidence
by: Nguyen, Viet, et al.
Published: (2026)

What matters for Representation Alignment: Global Information or Spatial Structure?
by: Singh, Jaskirat, et al.
Published: (2025)

Stepwise Credit Assignment for GRPO on Flow-Matching Models
by: Savani, Yash, et al.
Published: (2026)

Learning an Image Editing Model without Image Editing Pairs
by: Kumari, Nupur, et al.
Published: (2025)

Causality in Video Diffusers is Separable from Denoising
by: Bai, Xingjian, et al.
Published: (2026)

How Multimodal LLMs Solve Image Tasks: A Lens on Visual Grounding, Task Reasoning, and Answer Decoding
by: Yu, Zhuoran, et al.
Published: (2025)

Lazy Diffusion Transformer for Interactive Image Editing
by: Nitzan, Yotam, et al.
Published: (2024)

Benchmarking and Analyzing Generative Data for Visual Recognition
by: Li, Bo, et al.
Published: (2023)

Revealing the Underlying Patterns: Investigating Dataset Similarity, Performance, and Generalization
by: Achara, Akshit, et al.
Published: (2023)

Beyond Simple Edits: X-Planner for Complex Instruction-Based Image Editing
by: Yeh, Chun-Hsiao, et al.
Published: (2025)

Semantic Similarity Score for Measuring Visual Similarity at Semantic Level
by: Fan, Senran, et al.
Published: (2024)

CLAY: Conditional Visual Similarity Modulation in Vision-Language Embedding Space
by: Lim, Sohwi, et al.
Published: (2026)

Towards Universal Fake Image Detectors that Generalize Across Generative Models
by: Ojha, Utkarsh, et al.
Published: (2023)

Texo: Formula Recognition within 20M Parameters
by: Mao, Sicheng
Published: (2026)

Decomposing Complex Visual Comprehension into Atomic Visual Skills for Vision Language Models
by: Chae, Hyunsik, et al.
Published: (2025)

Visual Evaluative AI: A Hypothesis-Driven Tool with Concept-Based Explanations and Weight of Evidence
by: Le, Thao, et al.
Published: (2024)

Jump Cut Smoothing for Talking Heads
by: Wang, Xiaojuan, et al.
Published: (2024)

AC-MAMBASEG: An adaptive convolution and Mamba-based architecture for enhanced skin lesion segmentation
by: Nguyen, Viet-Thanh, et al.
Published: (2024)

Leveraging Large Language Models for Scalable Vector Graphics-Driven Image Understanding
by: Cai, Mu, et al.
Published: (2023)

End-to-End Training for Unified Tokenization and Latent Denoising
by: Duggal, Shivam, et al.
Published: (2026)

Simple Unsupervised Knowledge Distillation With Space Similarity
by: Singh, Aditya, et al.
Published: (2024)

Re-Scoring Using Image-Language Similarity for Few-Shot Object Detection
by: Jung, Min Jae, et al.
Published: (2023)

Self-Evaluation Unlocks Any-Step Text-to-Image Generation
by: Yu, Xin, et al.
Published: (2025)

Bridge then Begin Anew: Generating Target-relevant Intermediate Model for Source-free Visual Emotion Adaptation
by: Zhu, Jiankun, et al.
Published: (2024)

LACO: Adaptive Latent Communication for Collaborative Driving
by: Chen, Tianhao, et al.
Published: (2026)

Talk in Pieces, See in Whole: Disentangling and Hierarchical Aggregating Representations for Language-based Object Detection
by: An, Sojung, et al.
Published: (2025)