| Main Authors: | Liu, Ziming, Yang, Yifan, Zhang, Chengruidong, Zhang, Yiqi, Qiu, Lili, You, Yang, Yang, Yuqing |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2502.10389 |
Similar Items
AVA: Towards Agentic Video Analytics with Vision Language Models
by: Yan, Yuxuan, et al.
Published: (2025)
HiTVideo: Hierarchical Tokenizers for Enhancing Text-to-Video Generation with Autoregressive Large Language Models
by: Zhou, Ziqin, et al.
Published: (2025)
AVGen-Bench: A Task-Driven Benchmark for Multi-Granular Evaluation of Text-to-Audio-Video Generation
by: Zhou, Ziwei, et al.
Published: (2026)
Region-to-Region: Enhancing Generative Image Harmonization with Adaptive Regional Injection
by: Zhang, Zhiqiu, et al.
Published: (2025)
AHPA: Adaptive Hierarchical Prior Alignment for Diffusion Transformers
by: Min, Ruibin, et al.
Published: (2026)
Zoomer: Adaptive Image Focus Optimization for Black-box MLLM
by: Qian, Jiaxu, et al.
Published: (2025)
MM-WebAgent: A Hierarchical Multimodal Web Agent for Webpage Generation
by: Li, Yan, et al.
Published: (2026)
MROSS: Multi-Round Region-based Optimization for Scene Sketching
by: Liang, Yiqi, et al.
Published: (2024)
Stabilizing Diffusion Posterior Sampling by Noise–Frequency Continuation
by: Tian, Feng, et al.
Published: (2026)
Component-Aware Structure-Preserving Style Transfer for Satellite Visual Sim2Real Data Construction
by: Xie, Zongwu, et al.
Published: (2026)
SpotEdit: Selective Region Editing in Diffusion Transformers
by: Qin, Zhibin, et al.
Published: (2025)
EMRA-proxy: Enhancing Multi-Class Region Semantic Segmentation in Remote Sensing Images with Attention Proxy
by: Yu, Yichun, et al.
Published: (2025)
OminiControl2: Efficient Conditioning for Diffusion Transformers
by: Tan, Zhenxiong, et al.
Published: (2025)
Energy Score-based Pseudo-Label Filtering and Adaptive Loss for Imbalanced Semi-supervised SAR target recognition
by: Zhang, Xinzheng, et al.
Published: (2024)
GUI-ARP: Enhancing Grounding with Adaptive Region Perception for GUI Agents
by: Ye, Xianhang, et al.
Published: (2025)
FFA Sora, video generation as fundus fluorescein angiography simulator
by: Wu, Xinyuan, et al.
Published: (2024)
Enhancing Diffusion Models for Inverse Problems with Covariance-Aware Posterior Sampling
by: Hamidi, Shayan Mohajer, et al.
Published: (2024)
Efficient Adaptation of Pre-trained Vision Transformer via Householder Transformation
by: Dong, Wei, et al.
Published: (2024)
SP$^2$T: Sparse Proxy Attention for Dual-stream Point Transformer
by: Wan, Jiaxu, et al.
Published: (2024)
ViaRL: Adaptive Temporal Grounding via Visual Iterated Amplification Reinforcement Learning
by: Xu, Ziqiang, et al.
Published: (2025)
Leveraging Locality to Boost Sample Efficiency in Robotic Manipulation
by: Zhang, Tong, et al.
Published: (2024)
RePCM: Region-Specific and Phenotype-Adaptive Bi-Ventricular Cardiac Motion Synthesis
by: Yang, Xuan, et al.
Published: (2026)
SDiT: Spiking Diffusion Model with Transformer
by: Yang, Shu, et al.
Published: (2024)
Which Layer Causes Distribution Deviation? Entropy-Guided Adaptive Pruning for Diffusion and Flow Models
by: Li, Changlin, et al.
Published: (2025)
Inversion-DPO: Precise and Efficient Post-Training for Diffusion Models
by: Li, Zejian, et al.
Published: (2025)
UniG2U-Bench: Do Unified Models Advance Multimodal Understanding?
by: Wen, Zimo, et al.
Published: (2026)
SARE: Sample-wise Adaptive Reasoning for Training-free Fine-grained Visual Recognition
by: Yang, Jingxiao, et al.
Published: (2026)
ACPO: Anchor-Constrained Perceptual Optimization for Diffusion Models with No-Reference Quality Guidance
by: Yang, Yang, et al.
Published: (2026)
Replication in Visual Diffusion Models: A Survey and Outlook
by: Wang, Wenhao, et al.
Published: (2024)
Layer- and Timestep-Adaptive Differentiable Token Compression Ratios for Efficient Diffusion Transformers
by: You, Haoran, et al.
Published: (2024)
TMT: Cross-domain Semantic Segmentation with Region-adaptive Transferability Estimation
by: Zhang, Enming, et al.
Published: (2025)
AdaCorrection: Adaptive Offset Cache Correction for Accurate Diffusion Transformers
by: Liu, Dong, et al.
Published: (2026)
MM-ISTS: Cooperating Irregularly Sampled Time Series Forecasting with Multimodal Vision-Text LLMs
by: Lei, Zhi, et al.
Published: (2026)
DisCa: Accelerating Video Diffusion Transformers with Distillation-Compatible Learnable Feature Caching
by: Zou, Chang, et al.
Published: (2026)
Gradient-Free Classifier Guidance for Diffusion Model Sampling
by: Shenoy, Rahul, et al.
Published: (2024)
O-DisCo-Edit: Object Distortion Control for Unified Realistic Video Editing
by: Chen, Yuqing, et al.
Published: (2025)
State Space Model Meets Transformer: A New Paradigm for 3D Object Detection
by: Wang, Chuxin, et al.
Published: (2025)
Spiking Vision Transformer with Saccadic Attention
by: Wang, Shuai, et al.
Published: (2025)
CLQ: Cross-Layer Guided Orthogonal-based Quantization for Diffusion Transformers
by: Liu, Kai, et al.
Published: (2025)
CCIS-Diff: A Generative Model with Stable Diffusion Prior for Controlled Colonoscopy Image Synthesis
by: Xie, Yifan, et al.
Published: (2024)