| Main Authors: | Yang, Jiawei, Geng, Zhengyang, Ju, Xuan, Tian, Yonglong, Wang, Yue |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2604.28190 |
Similar Items
Latent Denoising Makes Good Tokenizers
by: Yang, Jiawei, et al.
Published: (2025)
Personalized Representation from Personalized Generation
by: Sundaram, Shobhita, et al.
Published: (2024)
Denoising Vision Transformers
by: Yang, Jiawei, et al.
Published: (2024)
Autoregressive Image Generation without Vector Quantization
by: Li, Tianhong, et al.
Published: (2024)
Component-Aware Structure-Preserving Style Transfer for Satellite Visual Sim2Real Data Construction
by: Xie, Zongwu, et al.
Published: (2026)
Improving the Classification Effect of Clinical Images of Diseases for Multi-Source Privacy Protection
by: Bowen, Tian, et al.
Published: (2024)
Fréchet Denoised Distance: Enhancing Plausibility Evaluation for Generated Designs with Denoising Autoencoder
by: Fan, Jiajie, et al.
Published: (2024)
Stable Consistency Tuning: Understanding and Improving Consistency Models
by: Wang, Fu-Yun, et al.
Published: (2024)
Visual Text Generation in the Wild
by: Zhu, Yuanzhi, et al.
Published: (2024)
Learning Vision from Models Rivals Learning Vision from Data
by: Tian, Yonglong, et al.
Published: (2023)
Mean Flows for One-step Generative Modeling
by: Geng, Zhengyang, et al.
Published: (2025)
SceneVTG++: Controllable Multilingual Visual Text Generation in the Wild
by: Liu, Jiawei, et al.
Published: (2025)
VQ-SGen: A Vector Quantized Stroke Representation for Creative Sketch Generation
by: Wang, Jiawei, et al.
Published: (2024)
Native Audio-Visual Alignment for Generation
by: Ji, Longbin, et al.
Published: (2026)
One-Step Diffusion Distillation via Deep Equilibrium Models
by: Geng, Zhengyang, et al.
Published: (2023)
LMPT: Prompt Tuning with Class-Specific Embedding Loss for Long-tailed Multi-Label Visual Recognition
by: Xia, Peng, et al.
Published: (2023)
Are Video Generation Models Geographically Fair? An Attraction-Centric Evaluation of Global Visual Knowledge
by: Liu, Xiao, et al.
Published: (2026)
Visual Bridge: Universal Visual Perception Representations Generating
by: Gao, Yilin, et al.
Published: (2025)
DINOv3 Visual Representations for Blueberry Perception Toward Robotic Harvesting
by: Wang, Rui-Feng, et al.
Published: (2026)
Learning 1D Causal Visual Representation with De-focus Attention Networks
by: Tao, Chenxin, et al.
Published: (2024)
SparseGS-W: Sparse-View 3D Gaussian Splatting in the Wild with Generative Priors
by: Li, Yiqing, et al.
Published: (2025)
One-step Latent-free Image Generation with Pixel Mean Flows
by: Lu, Yiyang, et al.
Published: (2026)
EVA-02: A Visual Representation for Neon Genesis
by: Fang, Yuxin, et al.
Published: (2023)
Evaluating Text-to-Image and Text-to-Video Synthesis with a Conditional Fréchet Distance
by: Koo, Jaywon, et al.
Published: (2025)
DirectTriGS: Triplane-based Gaussian Splatting Field Representation for 3D Generation
by: Ju, Xiaoliang, et al.
Published: (2025)
City Scene Super-Resolution via Geometric Error Minimization
by: Lu, Zhengyang, et al.
Published: (2024)
Randomized Autoregressive Visual Generation
by: Yu, Qihang, et al.
Published: (2024)
Learning Robust Representations via Bidirectional Transition for Visual Reinforcement Learning
by: Hu, Xiaobo, et al.
Published: (2023)
Flow Generator Matching
by: Huang, Zemin, et al.
Published: (2024)
Neural Clustering based Visual Representation Learning
by: Chen, Guikun, et al.
Published: (2024)
Semi-supervised Chinese Poem-to-Painting Generation via Cycle-consistent Adversarial Networks
by: Lu, Zhengyang, et al.
Published: (2024)
AnyText2: Visual Text Generation and Editing With Customizable Attributes
by: Tuo, Yuxiang, et al.
Published: (2024)
Visual Anagrams: Generating Multi-View Optical Illusions with Diffusion Models
by: Geng, Daniel, et al.
Published: (2023)
GMC: A General Framework of Multi-stage Context Learning and Utilization for Visual Detection Tasks
by: Wang, Xuan, et al.
Published: (2024)
Fréchet Video Motion Distance: A Metric for Evaluating Motion Consistency in Videos
by: Liu, Jiahe, et al.
Published: (2024)
Improved Mean Flows: On the Challenges of Fastforward Generative Models
by: Geng, Zhengyang, et al.
Published: (2025)
Vision-Language Models Do Not Understand Negation
by: Alhamoud, Kumail, et al.
Published: (2025)
Talk2Image: A Multi-Agent System for Multi-Turn Image Generation and Editing
by: Ma, Shichao, et al.
Published: (2025)
Visual Attention Drifts, but Anchors Hold: Mitigating Hallucination in Multimodal Large Language Models via Cross-Layer Visual Anchors
by: Yang, Chengxu, et al.
Published: (2026)
VDLF-Net: Variational Feature Fusion for Adaptive and Few-Shot Visual Learning
by: Yan, Jiawei
Published: (2026)