:: Library Catalog

Bibliographic Details
Main Authors:	Dang, Shengqi; He, Yi; Lei, Jiaying; Qian, Ziqing; Cao, Nan
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2603.09286

Similar Items

EmotiCrafter: Text-to-Emotional-Image Generation based on Valence-Arousal Model
by: Dang, Shengqi, et al.
Published: (2025)

CogMorph: Cognitive Morphing Attacks for Text-to-Image Models
by: Jing, Zonglei, et al.
Published: (2025)

DensiCrafter: Physically-Constrained Generation and Fabrication of Self-Supporting Hollow Structures
by: Dang, Shengqi, et al.
Published: (2025)

CogView3: Finer and Faster Text-to-Image Generation via Relay Diffusion
by: Zheng, Wendi, et al.
Published: (2024)

GPT4Motion: Scripting Physical Motions in Text-to-Video Generation via Blender-Oriented GPT Planning
by: Lv, Jiaxi, et al.
Published: (2023)

Designed to Spread: A Generative Approach to Enhance Information Diffusion
by: Qian, Ziqing, et al.
Published: (2025)

DiffBlender: Composable and Versatile Multimodal Text-to-Image Diffusion Models
by: Kim, Sungnyun, et al.
Published: (2023)

DyCoRM: Dynamic Criterion-Aware Reward Modeling for Text-to-Image Generation
by: Qian, Jiaying, et al.
Published: (2026)

CogDoc: Towards Unified thinking in Documents
by: Xu, Qixin, et al.
Published: (2025)

CogOmniControl: Reasoning-Driven Controllable Video Generation via Creative Intent Cognition
by: Yang, Hongji, et al.
Published: (2026)

Not Just Pretty Pictures: Toward Interventional Data Augmentation Using Text-to-Image Generators
by: Yuan, Jianhao, et al.
Published: (2022)

CogStereo: Neural Stereo Matching with Implicit Spatial Cognition Embedding
by: Fang, Lihuang, et al.
Published: (2025)

Thinking in Blender: Staged Executable Inverse Graphics with Vision-Language Models
by: He, Guangzhao, et al.
Published: (2026)

Blendify -- Python rendering framework for Blender
by: Guzov, Vladimir, et al.
Published: (2024)

CogDriver: Integrating Cognitive Inertia for Temporally Coherent Planning in Autonomous Driving
by: Liu, Pei, et al.
Published: (2025)

CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer
by: Yang, Zhuoyi, et al.
Published: (2024)

CogVLA: Cognition-Aligned Vision-Language-Action Model via Instruction-Driven Routing & Sparsification
by: Li, Wei, et al.
Published: (2025)

Cog2Gen3D: Sculpturing 3D Semantic-Geometric Cognition for 3D Generation
by: Wang, Haonan, et al.
Published: (2026)

BlenderFusion: 3D-Grounded Visual Editing and Generative Compositing
by: Chen, Jiacheng, et al.
Published: (2025)

EgoCogNav: Cognition-aware Human Egocentric Navigation
by: Qiu, Zhiwen, et al.
Published: (2025)

CogVLM2: Visual Language Models for Image and Video Understanding
by: Hong, Wenyi, et al.
Published: (2024)

Frequency-Controlled Diffusion Model for Versatile Text-Guided Image-to-Image Translation
by: Gao, Xiang, et al.
Published: (2024)

Why Settle for One? Text-to-ImageSet Generation and Evaluation
by: Jia, Chengyou, et al.
Published: (2025)

Towards Explainable Partial-AIGC Image Quality Assessment
by: Qian, Jiaying, et al.
Published: (2025)

CogPortrait: Fine-Grained Eye-Region Control in Portrait Animation via Hierarchical Agent Planning
by: Feng, He, et al.
Published: (2026)

Swap Attention in Spatiotemporal Diffusions for Text-to-Video Generation
by: Wang, Wenjing, et al.
Published: (2023)

CogNav: Cognitive Process Modeling for Object Goal Navigation with LLMs
by: Cao, Yihan, et al.
Published: (2024)

Cog3DMap: Multi-View Vision-Language Reasoning with 3D Cognitive Maps
by: Gwak, Chanyoung, et al.
Published: (2026)

ChatGen: Automatic Text-to-Image Generation From FreeStyle Chatting
by: Jia, Chengyou, et al.
Published: (2024)

Motion Blender Gaussian Splatting for Dynamic Scene Reconstruction
by: Zhang, Xinyu, et al.
Published: (2025)

PhyCustom: Towards Realistic Physical Customization in Text-to-Image Generation
by: Wu, Fan, et al.
Published: (2025)

Multi-Grained Text-Guided Image Fusion for Multi-Exposure and Multi-Focus Scenarios
by: Tang, Mingwei, et al.
Published: (2025)

ReCogDrive: A Reinforced Cognitive Framework for End-to-End Autonomous Driving
by: Li, Yongkang, et al.
Published: (2025)

MultiBooth: Towards Generating All Your Concepts in an Image from Text
by: Zhu, Chenyang, et al.
Published: (2024)

FreqBlender: Enhancing DeepFake Detection by Blending Frequency Knowledge
by: Li, Hanzhe, et al.
Published: (2024)

Long-range Turbulence Mitigation: A Large-scale Dataset and A Coarse-to-fine Framework
by: Xu, Shengqi, et al.
Published: (2024)

OmniFM: Toward Modality-Robust and Task-Agnostic Federated Learning for Heterogeneous Medical Imaging
by: Liu, Meilin, et al.
Published: (2026)

CogVLM: Visual Expert for Pretrained Language Models
by: Wang, Weihan, et al.
Published: (2023)

DTVI: Dual-Stage Textual and Visual Intervention for Safe Text-to-Image Generation
by: Tan, Binhong, et al.
Published: (2026)

Text-DiFuse: An Interactive Multi-Modal Image Fusion Framework based on Text-modulated Diffusion Model
by: Zhang, Hao, et al.
Published: (2024)