| Main Authors: | Wang, Zengbin, Hu, Xuecai, Wang, Yong, Xiong, Feng, Zhang, Man, Chu, Xiangxiang |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2601.20354 |
Similar Items
USP: Unified Self-Supervised Pretraining for Image Generation and Understanding
by: Chu, Xiangxiang, et al.
Published: (2025)
Thinking with Map: Reinforced Parallel Map-Augmented Agent for Geolocalization
by: Ji, Yuxiang, et al.
Published: (2026)
Disentangled Diffusion-Based 3D Human Pose Estimation with Hierarchical Spatial and Temporal Denoiser
by: Cai, Qingyuan, et al.
Published: (2024)
AR-MAP: Are Autoregressive Large Language Models Implicit Teachers for Diffusion Large Language Models?
by: Lin, Liang, et al.
Published: (2026)
FlowDreamer: Exploring High Fidelity Text-to-3D Generation via Rectified Flow
by: Li, Hangyu, et al.
Published: (2024)
OpenAnimals: Revisiting Person Re-Identification for Animals Towards Better Generalization
by: Hou, Saihui, et al.
Published: (2024)
TEXTS-Diff: TEXTS-Aware Diffusion Model for Real-World Text Image Super-Resolution
by: He, Haodong, et al.
Published: (2026)
ReferEverything: Towards Segmenting Everything We Can Speak of in Videos
by: Bagchi, Anurag, et al.
Published: (2024)
MMGenBench: Fully Automatically Evaluating LMMs from the Text-to-Image Generation Perspective
by: Huang, Hailang, et al.
Published: (2024)
MedFM-Robust: Benchmarking Robustness of Medical Foundation Models
by: Cui, Xiangxiang, et al.
Published: (2026)
BarbieGait: An Identity-Consistent Synthetic Human Dataset with Versatile Cloth-Changing for Gait Recognition
by: Cai, Qingyuan, et al.
Published: (2026)
Next Token Is Enough: Realistic Image Quality and Aesthetic Scoring with Multimodal Large Language Model
by: Li, Mingxing, et al.
Published: (2025)
Extending One-Step Image Generation from Class Labels to Text via Discriminative Text Representation
by: Zhao, Chenxi, et al.
Published: (2026)
Layer-wise Instance Binding for Regional and Occlusion Control in Text-to-Image Diffusion Transformers
by: Chen, Ruidong, et al.
Published: (2026)
FastDDHPose: Towards Unified, Efficient, and Disentangled 3D Human Pose Estimation
by: Cai, Qingyuan, et al.
Published: (2025)
QAGait: Revisit Gait Recognition from a Quality Perspective
by: Wang, Zengbin, et al.
Published: (2024)
LD-RPS: Zero-Shot Unified Image Restoration via Latent Diffusion Recurrent Posterior Sampling
by: Li, Huaqiu, et al.
Published: (2025)
Learning Geometric Invariance for Gait Recognition
by: Wang, Zengbin, et al.
Published: (2026)
Preference Alignment for Diffusion Model via Explicit Denoised Distribution Estimation
by: Shi, Dingyuan, et al.
Published: (2024)
Ace-Skill: Bootstrapping Multimodal Agents with Prioritized and Clustered Evolution
by: Xiong, Feng, et al.
Published: (2026)
UPRE: Zero-Shot Domain Adaptation for Object Detection via Unified Prompt and Representation Enhancement
by: Zhang, Xiao, et al.
Published: (2025)
DSI-Bench: A Benchmark for Dynamic Spatial Intelligence
by: Zhang, Ziang, et al.
Published: (2025)
MMSI-Bench: A Benchmark for Multi-Image Spatial Intelligence
by: Yang, Sihan, et al.
Published: (2025)
Uncovering the Text Embedding in Text-to-Image Diffusion Models
by: Yu, Hu, et al.
Published: (2024)
Towards Natural Language-Guided Drones: GeoText-1652 Benchmark with Spatial Relation Matching
by: Chu, Meng, et al.
Published: (2023)
Detect Everything with Few Examples
by: Zhang, Xinyu, et al.
Published: (2023)
Revealing the Dark Secrets of Extremely Large Kernel ConvNets on Robustness
by: Chen, Honghao, et al.
Published: (2024)
FTII-Bench: A Comprehensive Multimodal Benchmark for Flow Text with Image Insertion
by: Ruan, Jiacheng, et al.
Published: (2024)
From Pixels to Places: A Systematic Benchmark for Evaluating Image Geolocalization Ability in Large Language Models
by: Li, Lingyao, et al.
Published: (2025)
Grounding Everything in Tokens for Multimodal Large Language Models
by: Ren, Xiangxuan, et al.
Published: (2025)
SSR: Pushing the Limit of Spatial Intelligence with Structured Scene Reasoning
by: Zhang, Yi, et al.
Published: (2026)
PLUG: Revisiting Amodal Segmentation with Foundation Model and Hierarchical Focus
by: Liu, Zhaochen, et al.
Published: (2024)
SpatialReward: Verifiable Spatial Reward Modeling for Fine-Grained Spatial Consistency in Text-to-Image Generation
by: Zhou, Sashuai, et al.
Published: (2026)
Spatial4D-Bench: A Versatile 4D Spatial Intelligence Benchmark
by: Wang, Pan, et al.
Published: (2025)
Stepping VLMs onto the Court: Benchmarking Spatial Intelligence in Sports
by: Yang, Yuchen, et al.
Published: (2026)
Unaligning Everything: Or Aligning Any Text to Any Image in Multimodal Models
by: Salman, Shaeke, et al.
Published: (2024)
GenSpace: Benchmarking Spatially-Aware Image Generation
by: Wang, Zehan, et al.
Published: (2025)
VisionLLaMA: A Unified LLaMA Backbone for Vision Tasks
by: Chu, Xiangxiang, et al.
Published: (2024)
Spatial-Aware Conformal Prediction for Trustworthy Hyperspectral Image Classification
by: Liu, Kangdao, et al.
Published: (2024)
Distillation Improves Visual Place Recognition for Low Quality Images
by: Yang, Anbang, et al.
Published: (2023)