Library Catalog

Bibliographic Details
Main Author:	Shirvani, Rouzbeh
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence; Computation and Language; Graphics
Online Access:	https://arxiv.org/abs/2603.07455

Similar Items

Multimodal Cinematic Video Synthesis Using Text-to-Image and Audio Generation Models
by: S, Sridhar, et al.
Published: (2025)

Generative Powers of Ten
by: Wang, Xiaojuan, et al.
Published: (2023)

VisionGPT-3D: A Generalized Multimodal Agent for Enhanced 3D Vision Understanding
by: Kelly, Chris, et al.
Published: (2024)

SIMS: Simulating Stylized Human-Scene Interactions with Retrieval-Augmented Script Generation
by: Wang, Wenjia, et al.
Published: (2024)

Multi-LoRA Composition for Image Generation
by: Zhong, Ming, et al.
Published: (2024)

Towards Understanding Graphical Perception in Large Multimodal Models
by: Zhang, Kai, et al.
Published: (2025)

JoyAI-Image: Awaking Spatial Intelligence in Unified Multimodal Understanding and Generation
by: Song, Lin, et al.
Published: (2026)

DynamicGTR: Leveraging Graph Topology Representation Preferences to Boost VLM Capabilities on Graph QAs
by: Wei, Yanbin, et al.
Published: (2026)

Grounding Language in Multi-Perspective Referential Communication
by: Tang, Zineng, et al.
Published: (2024)

Unbounded: A Generative Infinite Game of Character Life Simulation
by: Li, Jialu, et al.
Published: (2024)

Uncovering Conceptual Blindspots in Generative Image Models Using Sparse Autoencoders
by: Bohacek, Matyas, et al.
Published: (2025)

A Survey on Quality Metrics for Text-to-Image Generation
by: Hartwig, Sebastian, et al.
Published: (2024)

DreamDrive: Generative 4D Scene Modeling from Street View Images
by: Mao, Jiageng, et al.
Published: (2024)

An Image is Worth Multiple Words: Discovering Object Level Concepts using Multi-Concept Prompt Learning
by: Jin, Chen, et al.
Published: (2023)

BootPIG: Bootstrapping Zero-shot Personalized Image Generation Capabilities in Pretrained Diffusion Models
by: Purushwalkam, Senthil, et al.
Published: (2024)

GazeFusion: Saliency-Guided Image Generation
by: Zhang, Yunxiang, et al.
Published: (2024)

SyncDreamer: Generating Multiview-consistent Images from a Single-view Image
by: Liu, Yuan, et al.
Published: (2023)

MoA: Mixture-of-Attention for Subject-Context Disentanglement in Personalized Image Generation
by: Wang, Kuan-Chieh, et al.
Published: (2024)

Stylus: Automatic Adapter Selection for Diffusion Models
by: Luo, Michael, et al.
Published: (2024)

Object-level Visual Prompts for Compositional Image Generation
by: Parmar, Gaurav, et al.
Published: (2025)

Text-guided Controllable Mesh Refinement for Interactive 3D Modeling
by: Chen, Yun-Chun, et al.
Published: (2024)

Make It Count: Text-to-Image Generation with an Accurate Number of Objects
by: Binyamin, Lital, et al.
Published: (2024)

GraphicsDreamer: Image to 3D Generation with Physical Consistency
by: Chen, Pei, et al.
Published: (2024)

MAPWise: Evaluating Vision-Language Models for Advanced Map Queries
by: Mukhopadhyay, Srija, et al.
Published: (2024)

Vector Grimoire: Codebook-based Shape Generation under Raster Image Supervision
by: Feuerpfeil, Moritz, et al.
Published: (2024)

Annotated Hands for Generative Models
by: Yang, Yue, et al.
Published: (2024)

WonderPlay: Dynamic 3D Scene Generation from a Single Image and Actions
by: Li, Zizhang, et al.
Published: (2025)

VideoMV: Consistent Multi-View Generation Based on Large Video Generative Model
by: Zuo, Qi, et al.
Published: (2024)

Gesture2Text: A Generalizable Decoder for Word-Gesture Keyboards in XR Through Trajectory Coarse Discretization and Pre-training
by: Shen, Junxiao, et al.
Published: (2024)

ReLumix: Extending Image Relighting to Video via Video Diffusion Models
by: Wang, Lezhong, et al.
Published: (2025)

FabricDiffusion: High-Fidelity Texture Transfer for 3D Garments Generation from In-The-Wild Clothing Images
by: Zhang, Cheng, et al.
Published: (2024)

TEXGen: a Generative Diffusion Model for Mesh Textures
by: Yu, Xin, et al.
Published: (2024)

Transcending Dimensions using Generative AI: Real-Time 3D Model Generation in Augmented Reality
by: Behravan, Majid, et al.
Published: (2025)

Playground v3: Improving Text-to-Image Alignment with Deep-Fusion Large Language Models
by: Liu, Bingchen, et al.
Published: (2024)

StyleMotif: Multi-Modal Motion Stylization using Style-Content Cross Fusion
by: Guo, Ziyu, et al.
Published: (2025)

Agentic Design of Compositional Machines
by: Zhang, Wenqian, et al.
Published: (2025)

Generative Object Insertion in Gaussian Splatting with a Multi-View Diffusion Model
by: Zhong, Hongliang, et al.
Published: (2024)

EMDM: Efficient Motion Diffusion Model for Fast and High-Quality Motion Generation
by: Zhou, Wenyang, et al.
Published: (2023)

HY-Motion 1.0: Scaling Flow Matching Models for Text-To-Motion Generation
by: Wen, Yuxin, et al.
Published: (2025)

PixelMan: Consistent Object Editing with Diffusion Models via Pixel Manipulation and Generation
by: Jiang, Liyao, et al.
Published: (2024)