| Main Authors: | Zhang, Bowen; Yang, Cheng; Liu, Xuanhui |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2407.15066 |
Similar Items
Towards Arbitrary-Scale Spacecraft Image Super-Resolution via Salient Region-Guidance
by: Yang, Jingfan, et al.
Published: (2025)
GenEraser: Generalizable Video Object Removal via Balanced Text-Mask Guidance and Decoupled Locator-Preserver
by: Chen, Yuqing, et al.
Published: (2026)
UniMapGen: A Generative Framework for Large-Scale Map Construction from Multi-modal Data
by: Yuan, Yujian, et al.
Published: (2025)
GenAgent: Scaling Text-to-Image Generation via Agentic Multimodal Reasoning
by: Jiang, Kaixun, et al.
Published: (2026)
VidGen-1M: A Large-Scale Dataset for Text-to-video Generation
by: Tan, Zhiyu, et al.
Published: (2024)
Controllable Generation of Large-Scale 3D Urban Layouts with Semantic and Structural Guidance
by: Niu, Mengyuan, et al.
Published: (2025)
EliGen: Entity-Level Controlled Image Generation with Regional Attention
by: Zhang, Hong, et al.
Published: (2025)
ExpertGen: Training-Free Expert Guidance for Controllable Text-to-Face Generation
by: Shi, Liang, et al.
Published: (2025)
Scaling Backwards: Minimal Synthetic Pre-training?
by: Nakamura, Ryo, et al.
Published: (2024)
Layout Control and Semantic Guidance with Attention Loss Backward for T2I Diffusion Model
by: Li, Guandong
Published: (2024)
UniFlowRestore: A General Video Restoration Framework via Flow Matching and Prompt Guidance
by: Sun, Shuning, et al.
Published: (2025)
Follow Your Motion: A Generic Temporal Consistency Portrait Editing Framework with Trajectory Guidance
by: Yang, Haijie, et al.
Published: (2025)
Wan-Move: Motion-controllable Video Generation via Latent Trajectory Guidance
by: Chu, Ruihang, et al.
Published: (2025)
SSG: Scaled Spatial Guidance for Multi-Scale Visual Autoregressive Generation
by: Shin, Youngwoo, et al.
Published: (2026)
GenAR: Next-Scale Autoregressive Generation for Spatial Gene Expression Prediction
by: Ouyang, Jiarui, et al.
Published: (2025)
ScaleMoGen: Autoregressive Next-Scale Prediction for Human Motion Generation
by: Hwang, Inwoo, et al.
Published: (2026)
$E^{3}$Gen: Efficient, Expressive and Editable Avatars Generation
by: Zhang, Weitian, et al.
Published: (2024)
SynerMedGen: Synergizing Medical Multimodal Understanding with Generation via Task Alignment
by: Zhao, Weiren, et al.
Published: (2026)
RetiGen: A Framework for Generalized Retinal Diagnosis Using Multi-View Fundus Images
by: Chen, Ze, et al.
Published: (2024)
GenMask: Adapting DiT for Segmentation via Direct Mask Generation
by: Yang, Yuhuan, et al.
Published: (2026)
StarGen: A Spatiotemporal Autoregression Framework with Video Diffusion Model for Scalable and Controllable Scene Generation
by: Zhai, Shangjin, et al.
Published: (2025)
Diffusion-based Aesthetic QR Code Generation via Scanning-Robust Perceptual Guidance
by: Liao, Jia-Wei, et al.
Published: (2024)
Flowing Backwards: Improving Normalizing Flows via Reverse Representation Alignment
by: Chen, Yang, et al.
Published: (2025)
SpikeGen: Decoupled "Rods and Cones" Visual Representation Processing with Latent Generative Framework
by: Dai, Gaole, et al.
Published: (2025)
ScaleEdit-12M: Scaling Open-Source Image Editing Data Generation via Multi-Agent Framework
by: Chen, Guanzhou, et al.
Published: (2026)
LipGen: Viseme-Guided Lip Video Generation for Enhancing Visual Speech Recognition
by: Hao, Bowen, et al.
Published: (2025)
Residual Decoding: Mitigating Hallucinations in Large Vision-Language Models via History-Aware Residual Guidance
by: Chen, Xinrong, et al.
Published: (2026)
MoGen: A Unified Collaborative Framework for Controllable Multi-Object Image Generation
by: Li, Yanfeng, et al.
Published: (2026)
SQuadGen: Generating Simple Quad Layouts via Chart Distance Fields
by: Kong, Youkang, et al.
Published: (2026)
SkyLink: A Large Vision-Language Model Driven Re-ranking Framework for Cross-View UAV geolocalization
by: Liu, Bowen, et al.
Published: (2026)
A Forward and Backward Compatible Framework for Few-shot Class-incremental Pill Recognition
by: Zhang, Jinghua, et al.
Published: (2023)
DeMamba: AI-Generated Video Detection on Million-Scale GenVideo Benchmark
by: Chen, Haoxing, et al.
Published: (2024)
Conditional Text-to-Image Generation with Reference Guidance
by: Kim, Taewook, et al.
Published: (2024)
GenKOL: Modular Generative AI Framework For Scalable Virtual KOL Generation
by: To, Tan-Hiep, et al.
Published: (2025)
ORID: Organ-Regional Information Driven Framework for Radiology Report Generation
by: Gu, Tiancheng, et al.
Published: (2024)
PromptLNet: Region-Adaptive Aesthetic Enhancement via Prompt Guidance in Low-Light Enhancement Net
by: Yin, Jun, et al.
Published: (2025)
Routing Matters in MoE: Scaling Diffusion Transformers with Explicit Routing Guidance
by: Wei, Yujie, et al.
Published: (2025)
Backward-Compatible Aligned Representations via an Orthogonal Transformation Layer
by: Ricci, Simone, et al.
Published: (2024)
GenMed: A Pairwise Generative Reformulation of Medical Diagnostic Tasks
by: Zhang, Hantao, et al.
Published: (2026)
Classifier-free Guidance with Adaptive Scaling
by: Malarz, Dawid, et al.
Published: (2025)