| Main Authors: | Wang, Shuai, Gao, Ziteng, Zhu, Chenhui, Huang, Weilin, Wang, Limin |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2507.23268 |
Similar Items
DDT: Decoupled Diffusion Transformer
by: Wang, Shuai, et al.
Published: (2025)
MotionRAG: Motion Retrieval-Augmented Image-to-Video Generation
by: Zhu, Chenhui, et al.
Published: (2025)
End-to-End Dense Video Grounding via Parallel Regression
by: Shi, Fengyuan, et al.
Published: (2021)
PixIE: Prompted Pixel-Space Low-Light Image Enhancement
by: Lin, Ruirui, et al.
Published: (2026)
PixPerfect: Seamless Latent Diffusion Local Editing with Discriminative Pixel-Space Refinement
by: Zheng, Haitian, et al.
Published: (2025)
Learning Human Skill Generators at Key-Step Levels
by: Wu, Yilu, et al.
Published: (2025)
PixOOD: Pixel-Level Out-of-Distribution Detection
by: Vojíř, Tomáš, et al.
Published: (2024)
STMixer: A One-Stage Sparse Action Detector
by: Wu, Tao, et al.
Published: (2024)
Pixelis: Reasoning in Pixels, from Seeing to Acting
by: Zhou, Yunpeng
Published: (2026)
Flowing Backwards: Improving Normalizing Flows via Reverse Representation Alignment
by: Chen, Yang, et al.
Published: (2025)
D-AR: Diffusion via Autoregressive Models
by: Gao, Ziteng, et al.
Published: (2025)
PixFoundation: Are We Heading in the Right Direction with Pixel-level Vision Foundation Models?
by: Siam, Mennatullah
Published: (2025)
ReasonPix2Pix: Instruction Reasoning Dataset for Advanced Image Editing
by: Jin, Ying, et al.
Published: (2024)
Pix2Gif: Motion-Guided Diffusion for GIF Generation
by: Kandala, Hitesh, et al.
Published: (2024)
Forgedit: Text Guided Image Editing via Learning and Forgetting
by: Zhang, Shiwen, et al.
Published: (2023)
Pix2Cap-COCO: Advancing Visual Comprehension via Pixel-Level Captioning
by: You, Zuyao, et al.
Published: (2025)
GeoPix: Multi-Modal Large Language Model for Pixel-level Image Understanding in Remote Sensing
by: Ou, Ruizhe, et al.
Published: (2025)
Scene Graph Generation via Conditional Random Fields
by: Cong, Weilin, et al.
Published: (2018)
PixCLIP: Achieving Fine-grained Visual Language Understanding via Any-granularity Pixel-Text Alignment Learning
by: Xiao, Yicheng, et al.
Published: (2025)
PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis
by: Chen, Junsong, et al.
Published: (2023)
MeMOTR: Long-Term Memory-Augmented Transformer for Multi-Object Tracking
by: Gao, Ruopeng, et al.
Published: (2023)
UniField: A Unified Field-Aware MRI Enhancement Framework
by: Lin, Yiyang, et al.
Published: (2026)
Multimodal Crowd Counting with Pix2Pix GANs
by: Khan, Muhammad Asif, et al.
Published: (2024)
Differentiable Solver Search for Fast Diffusion Sampling
by: Wang, Shuai, et al.
Published: (2025)
Image Neural Field Diffusion Models
by: Chen, Yinbo, et al.
Published: (2024)
Unified Pix Token And Word Token Generative Language Model
by: Leung, Haun, et al.
Published: (2026)
I2-NeRF: Learning Neural Radiance Fields Under Physically-Grounded Media Interactions
by: Liu, Shuhong, et al.
Published: (2025)
DeCo: Frequency-Decoupled Pixel Diffusion for End-to-End Image Generation
by: Ma, Zehong, et al.
Published: (2025)
CeRF: Convolutional Neural Radiance Fields for New View Synthesis with Derivatives of Ray Modeling
by: Yang, Xiaoyan, et al.
Published: (2023)
UniFlow: A Unified Pixel Flow Tokenizer for Visual Understanding and Generation
by: Yue, Zhengrong, et al.
Published: (2025)
InstructRL4Pix: Training Diffusion for Image Editing by Reinforcement Learning
by: Li, Tiancheng, et al.
Published: (2024)
DiP: Taming Diffusion Models in Pixel Space
by: Chen, Zhennan, et al.
Published: (2025)
Pixel-Perfect Depth with Semantics-Prompted Diffusion Transformers
by: Xu, Gangwei, et al.
Published: (2025)
PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation
by: Chen, Junsong, et al.
Published: (2024)
SeedEdit: Align Image Re-Generation to Image Editing
by: Shi, Yichun, et al.
Published: (2024)
PDPP: Projected Diffusion for Procedure Planning in Instructional Videos
by: Wang, Hanlin, et al.
Published: (2023)
Towards Pixel-Level VLM Perception via Simple Points Prediction
by: Song, Tianhui, et al.
Published: (2026)
PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions
by: Lin, Weifeng, et al.
Published: (2024)
BeyondPixels: A Comprehensive Review of the Evolution of Neural Radiance Fields
by: Rabby, AKM Shahariar Azad, et al.
Published: (2023)
FLDM-VTON: Faithful Latent Diffusion Model for Virtual Try-on
by: Wang, Chenhui, et al.
Published: (2024)