:: Library Catalog

Bibliographic Details
Main Authors:	Zhang, Bowen; Yang, Cheng; Liu, Xuanhui
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2407.15066

Similar Items

Towards Arbitrary-Scale Spacecraft Image Super-Resolution via Salient Region-Guidance
by: Yang, Jingfan, et al.
Published: (2025)

GenEraser: Generalizable Video Object Removal via Balanced Text-Mask Guidance and Decoupled Locator-Preserver
by: Chen, Yuqing, et al.
Published: (2026)

UniMapGen: A Generative Framework for Large-Scale Map Construction from Multi-modal Data
by: Yuan, Yujian, et al.
Published: (2025)

GenAgent: Scaling Text-to-Image Generation via Agentic Multimodal Reasoning
by: Jiang, Kaixun, et al.
Published: (2026)

VidGen-1M: A Large-Scale Dataset for Text-to-video Generation
by: Tan, Zhiyu, et al.
Published: (2024)

Controllable Generation of Large-Scale 3D Urban Layouts with Semantic and Structural Guidance
by: Niu, Mengyuan, et al.
Published: (2025)

EliGen: Entity-Level Controlled Image Generation with Regional Attention
by: Zhang, Hong, et al.
Published: (2025)

ExpertGen: Training-Free Expert Guidance for Controllable Text-to-Face Generation
by: Shi, Liang, et al.
Published: (2025)

Scaling Backwards: Minimal Synthetic Pre-training?
by: Nakamura, Ryo, et al.
Published: (2024)

Layout Control and Semantic Guidance with Attention Loss Backward for T2I Diffusion Model
by: Li, Guandong
Published: (2024)

UniFlowRestore: A General Video Restoration Framework via Flow Matching and Prompt Guidance
by: Sun, Shuning, et al.
Published: (2025)

Follow Your Motion: A Generic Temporal Consistency Portrait Editing Framework with Trajectory Guidance
by: Yang, Haijie, et al.
Published: (2025)

Wan-Move: Motion-controllable Video Generation via Latent Trajectory Guidance
by: Chu, Ruihang, et al.
Published: (2025)

SSG: Scaled Spatial Guidance for Multi-Scale Visual Autoregressive Generation
by: Shin, Youngwoo, et al.
Published: (2026)

GenAR: Next-Scale Autoregressive Generation for Spatial Gene Expression Prediction
by: Ouyang, Jiarui, et al.
Published: (2025)

ScaleMoGen: Autoregressive Next-Scale Prediction for Human Motion Generation
by: Hwang, Inwoo, et al.
Published: (2026)

$E^{3}$Gen: Efficient, Expressive and Editable Avatars Generation
by: Zhang, Weitian, et al.
Published: (2024)

SynerMedGen: Synergizing Medical Multimodal Understanding with Generation via Task Alignment
by: Zhao, Weiren, et al.
Published: (2026)

RetiGen: A Framework for Generalized Retinal Diagnosis Using Multi-View Fundus Images
by: Chen, Ze, et al.
Published: (2024)

GenMask: Adapting DiT for Segmentation via Direct Mask Generation
by: Yang, Yuhuan, et al.
Published: (2026)

StarGen: A Spatiotemporal Autoregression Framework with Video Diffusion Model for Scalable and Controllable Scene Generation
by: Zhai, Shangjin, et al.
Published: (2025)

Diffusion-based Aesthetic QR Code Generation via Scanning-Robust Perceptual Guidance
by: Liao, Jia-Wei, et al.
Published: (2024)

Flowing Backwards: Improving Normalizing Flows via Reverse Representation Alignment
by: Chen, Yang, et al.
Published: (2025)

SpikeGen: Decoupled "Rods and Cones" Visual Representation Processing with Latent Generative Framework
by: Dai, Gaole, et al.
Published: (2025)

ScaleEdit-12M: Scaling Open-Source Image Editing Data Generation via Multi-Agent Framework
by: Chen, Guanzhou, et al.
Published: (2026)

LipGen: Viseme-Guided Lip Video Generation for Enhancing Visual Speech Recognition
by: Hao, Bowen, et al.
Published: (2025)

Residual Decoding: Mitigating Hallucinations in Large Vision-Language Models via History-Aware Residual Guidance
by: Chen, Xinrong, et al.
Published: (2026)

MoGen: A Unified Collaborative Framework for Controllable Multi-Object Image Generation
by: Li, Yanfeng, et al.
Published: (2026)

SQuadGen: Generating Simple Quad Layouts via Chart Distance Fields
by: Kong, Youkang, et al.
Published: (2026)

SkyLink: A Large Vision-Language Model Driven Re-ranking Framework for Cross-View UAV geolocalization
by: Liu, Bowen, et al.
Published: (2026)

A Forward and Backward Compatible Framework for Few-shot Class-incremental Pill Recognition
by: Zhang, Jinghua, et al.
Published: (2023)

DeMamba: AI-Generated Video Detection on Million-Scale GenVideo Benchmark
by: Chen, Haoxing, et al.
Published: (2024)

Conditional Text-to-Image Generation with Reference Guidance
by: Kim, Taewook, et al.
Published: (2024)

GenKOL: Modular Generative AI Framework For Scalable Virtual KOL Generation
by: To, Tan-Hiep, et al.
Published: (2025)

ORID: Organ-Regional Information Driven Framework for Radiology Report Generation
by: Gu, Tiancheng, et al.
Published: (2024)

PromptLNet: Region-Adaptive Aesthetic Enhancement via Prompt Guidance in Low-Light Enhancement Net
by: Yin, Jun, et al.
Published: (2025)

Routing Matters in MoE: Scaling Diffusion Transformers with Explicit Routing Guidance
by: Wei, Yujie, et al.
Published: (2025)

Backward-Compatible Aligned Representations via an Orthogonal Transformation Layer
by: Ricci, Simone, et al.
Published: (2024)

GenMed: A Pairwise Generative Reformulation of Medical Diagnostic Tasks
by: Zhang, Hantao, et al.
Published: (2026)

Classifier-free Guidance with Adaptive Scaling
by: Malarz, Dawid, et al.
Published: (2025)