| Main Author: | Shirvani, Rouzbeh |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2603.07455 |
Similar Items
Multimodal Cinematic Video Synthesis Using Text-to-Image and Audio Generation Models
by: S, Sridhar, et al.
Published: (2025)
Generative Powers of Ten
by: Wang, Xiaojuan, et al.
Published: (2023)
VisionGPT-3D: A Generalized Multimodal Agent for Enhanced 3D Vision Understanding
by: Kelly, Chris, et al.
Published: (2024)
SIMS: Simulating Stylized Human-Scene Interactions with Retrieval-Augmented Script Generation
by: Wang, Wenjia, et al.
Published: (2024)
Multi-LoRA Composition for Image Generation
by: Zhong, Ming, et al.
Published: (2024)
Towards Understanding Graphical Perception in Large Multimodal Models
by: Zhang, Kai, et al.
Published: (2025)
JoyAI-Image: Awaking Spatial Intelligence in Unified Multimodal Understanding and Generation
by: Song, Lin, et al.
Published: (2026)
DynamicGTR: Leveraging Graph Topology Representation Preferences to Boost VLM Capabilities on Graph QAs
by: Wei, Yanbin, et al.
Published: (2026)
Grounding Language in Multi-Perspective Referential Communication
by: Tang, Zineng, et al.
Published: (2024)
Unbounded: A Generative Infinite Game of Character Life Simulation
by: Li, Jialu, et al.
Published: (2024)
Uncovering Conceptual Blindspots in Generative Image Models Using Sparse Autoencoders
by: Bohacek, Matyas, et al.
Published: (2025)
A Survey on Quality Metrics for Text-to-Image Generation
by: Hartwig, Sebastian, et al.
Published: (2024)
DreamDrive: Generative 4D Scene Modeling from Street View Images
by: Mao, Jiageng, et al.
Published: (2024)
An Image is Worth Multiple Words: Discovering Object Level Concepts using Multi-Concept Prompt Learning
by: Jin, Chen, et al.
Published: (2023)
BootPIG: Bootstrapping Zero-shot Personalized Image Generation Capabilities in Pretrained Diffusion Models
by: Purushwalkam, Senthil, et al.
Published: (2024)
GazeFusion: Saliency-Guided Image Generation
by: Zhang, Yunxiang, et al.
Published: (2024)
SyncDreamer: Generating Multiview-consistent Images from a Single-view Image
by: Liu, Yuan, et al.
Published: (2023)
MoA: Mixture-of-Attention for Subject-Context Disentanglement in Personalized Image Generation
by: Wang, Kuan-Chieh, et al.
Published: (2024)
Stylus: Automatic Adapter Selection for Diffusion Models
by: Luo, Michael, et al.
Published: (2024)
Object-level Visual Prompts for Compositional Image Generation
by: Parmar, Gaurav, et al.
Published: (2025)
Text-guided Controllable Mesh Refinement for Interactive 3D Modeling
by: Chen, Yun-Chun, et al.
Published: (2024)
Make It Count: Text-to-Image Generation with an Accurate Number of Objects
by: Binyamin, Lital, et al.
Published: (2024)
GraphicsDreamer: Image to 3D Generation with Physical Consistency
by: Chen, Pei, et al.
Published: (2024)
MAPWise: Evaluating Vision-Language Models for Advanced Map Queries
by: Mukhopadhyay, Srija, et al.
Published: (2024)
Vector Grimoire: Codebook-based Shape Generation under Raster Image Supervision
by: Feuerpfeil, Moritz, et al.
Published: (2024)
Annotated Hands for Generative Models
by: Yang, Yue, et al.
Published: (2024)
WonderPlay: Dynamic 3D Scene Generation from a Single Image and Actions
by: Li, Zizhang, et al.
Published: (2025)
VideoMV: Consistent Multi-View Generation Based on Large Video Generative Model
by: Zuo, Qi, et al.
Published: (2024)
Gesture2Text: A Generalizable Decoder for Word-Gesture Keyboards in XR Through Trajectory Coarse Discretization and Pre-training
by: Shen, Junxiao, et al.
Published: (2024)
ReLumix: Extending Image Relighting to Video via Video Diffusion Models
by: Wang, Lezhong, et al.
Published: (2025)
FabricDiffusion: High-Fidelity Texture Transfer for 3D Garments Generation from In-The-Wild Clothing Images
by: Zhang, Cheng, et al.
Published: (2024)
TEXGen: a Generative Diffusion Model for Mesh Textures
by: Yu, Xin, et al.
Published: (2024)
Transcending Dimensions using Generative AI: Real-Time 3D Model Generation in Augmented Reality
by: Behravan, Majid, et al.
Published: (2025)
Playground v3: Improving Text-to-Image Alignment with Deep-Fusion Large Language Models
by: Liu, Bingchen, et al.
Published: (2024)
StyleMotif: Multi-Modal Motion Stylization using Style-Content Cross Fusion
by: Guo, Ziyu, et al.
Published: (2025)
Agentic Design of Compositional Machines
by: Zhang, Wenqian, et al.
Published: (2025)
Generative Object Insertion in Gaussian Splatting with a Multi-View Diffusion Model
by: Zhong, Hongliang, et al.
Published: (2024)
EMDM: Efficient Motion Diffusion Model for Fast and High-Quality Motion Generation
by: Zhou, Wenyang, et al.
Published: (2023)
HY-Motion 1.0: Scaling Flow Matching Models for Text-To-Motion Generation
by: Wen, Yuxin, et al.
Published: (2025)
PixelMan: Consistent Object Editing with Diffusion Models via Pixel Manipulation and Generation
by: Jiang, Liyao, et al.
Published: (2024)