| Main Authors: | Li, Liuzhuozheng, Zhan, Zhiyuan, Liu, Shuhong, Jiang, Dengyang, Wang, Zanyi, Dai, Guang, Wang, Jingdong, Wang, Mengmeng |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2604.25289 |
Similar Items
Unlocking the Potential of Grounding DINO in Videos: Parameter-Efficient Adaptation for Limited-Data Spatial-Temporal Localization
by: Wang, Zanyi, et al.
Published: (2026)
Deforming Videos to Masks: Flow Matching for Referring Video Segmentation
by: Wang, Zanyi, et al.
Published: (2025)
No Other Representation Component Is Needed: Diffusion Transformers Can Provide Representation Guidance by Themselves
by: Jiang, Dengyang, et al.
Published: (2025)
SRA 2: Variational Autoencoder Self-Representation Alignment for Efficient Diffusion Training
by: Wang, Mengmeng, et al.
Published: (2026)
TriCLIP-3D: A Unified Parameter-Efficient Framework for Tri-Modal 3D Visual Grounding based on CLIP
by: Li, Fan, et al.
Published: (2025)
Low-Biased General Annotated Dataset Generation
by: Jiang, Dengyang, et al.
Published: (2024)
AffordanceSAM: Segment Anything Once More in Affordance Grounding
by: Jiang, Dengyang, et al.
Published: (2025)
RefTon: Reference person shot assist virtual Try-on
by: Li, Liuzhuozheng, et al.
Published: (2025)
Distribution Matching Distillation Meets Reinforcement Learning
by: Jiang, Dengyang, et al.
Published: (2025)
SSMG: Spatial-Semantic Map Guided Diffusion Model for Free-form Layout-to-Image Generation
by: Jia, Chengyou, et al.
Published: (2023)
DreamSalon: A Staged Diffusion Framework for Preserving Identity-Context in Editable Face Generation
by: Lin, Haonan, et al.
Published: (2024)
Flipped Classroom: Aligning Teacher Attention with Student in Generalized Category Discovery
by: Lin, Haonan, et al.
Published: (2024)
D-OPSD: On-Policy Self-Distillation for Continuously Tuning Step-Distilled Diffusion Models
by: Jiang, Dengyang, et al.
Published: (2026)
Schedule Your Edit: A Simple yet Effective Diffusion Noise Schedule for Image Editing
by: Lin, Haonan, et al.
Published: (2024)
SpotActor: Training-Free Layout-Controlled Consistent Image Generation
by: Wang, Jiahao, et al.
Published: (2024)
PSDiff: Diffusion Model for Person Search with Iterative and Collaborative Refinement
by: Jia, Chengyou, et al.
Published: (2023)
Visual Object Tracking across Diverse Data Modalities: A Review
by: Wang, Mengmeng, et al.
Published: (2024)
MG-SLAM: Structure Gaussian Splatting SLAM with Manhattan World Hypothesis
by: Liu, Shuhong, et al.
Published: (2024)
Disentangled Noisy Correspondence Learning
by: Dang, Zhuohang, et al.
Published: (2024)
Timestep-Aware Correction for Quantized Diffusion Models
by: Yao, Yuzhe, et al.
Published: (2024)
M2-CLIP: A Multimodal, Multi-task Adapting Framework for Video Action Recognition
by: Wang, Mengmeng, et al.
Published: (2024)
MA-FSAR: Multimodal Adaptation of CLIP for Few-Shot Action Recognition
by: Xing, Jiazheng, et al.
Published: (2023)
Instructing Text-to-Image Diffusion Models via Classifier-Guided Semantic Optimization
by: Chang, Yuanyuan, et al.
Published: (2025)
JoDiffusion: Jointly Diffusing Image with Pixel-Level Annotations for Semantic Segmentation Promotion
by: Wang, Haoyu, et al.
Published: (2025)
OneActor: Consistent Character Generation via Cluster-Conditioned Guidance
by: Wang, Jiahao, et al.
Published: (2024)
Thinking in Frames: How Visual Context and Test-Time Scaling Empower Video Reasoning
by: Li, Chengzu, et al.
Published: (2026)
Learning to Rematch Mismatched Pairs for Robust Cross-Modal Retrieval
by: Han, Haochen, et al.
Published: (2024)
Improving Generalized Visual Grounding with Instance-aware Joint Learning
by: Dai, Ming, et al.
Published: (2025)
Manifold-Aware Exploration for Reinforcement Learning in Video Generation
by: Zheng, Mingzhe, et al.
Published: (2026)
Diff-2-in-1: Bridging Generation and Dense Perception with Diffusion Models
by: Zheng, Shuhong, et al.
Published: (2024)
Disentangled Representation Learning with Transmitted Information Bottleneck
by: Dang, Zhuohang, et al.
Published: (2023)
Exploring Effective Factors for Improving Visual In-Context Learning
by: Sun, Yanpeng, et al.
Published: (2023)
Assessing Model Generalization in Vicinity
by: Liu, Yuchi, et al.
Published: (2024)
Data Generation Scheme for Thermal Modality with Edge-Guided Adversarial Conditional Diffusion Model
by: Zhu, Guoqing, et al.
Published: (2024)
ACDiT: Interpolating Autoregressive Conditional Modeling and Diffusion Transformer
by: Hu, Jinyi, et al.
Published: (2024)
Guiding Noisy Label Conditional Diffusion Models with Score-based Discriminator Correction
by: Cong, Dat Nguyen, et al.
Published: (2025)
Foster Adaptivity and Balance in Learning with Noisy Labels
by: Sheng, Mengmeng, et al.
Published: (2024)
Beyond Thinking: Imagining in 360$^\circ$ for Humanoid Visual Search
by: Zhang, Jingdong, et al.
Published: (2026)
CCDM: Continuous Conditional Diffusion Models for Image Generation
by: Ding, Xin, et al.
Published: (2024)
Guiding Visual Autoregressive Models through Spectrum Weakening
by: Wang, Chaoyang, et al.
Published: (2025)