| Main Authors: | Li, Yifan, Yang, Shuai, Liu, Jiaying |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2503.14974 |
Similar Items
Self-Supervised Skeleton-Based Action Representation Learning: A Benchmark and Beyond
by: Zhang, Jiahang, et al.
Published: (2024)
VibeFlow: Versatile Video Chroma-Lux Editing through Self-Supervised Learning
by: Li, Yifan, et al.
Published: (2026)
PTDiffusion: Free Lunch for Generating Optical Illusion Hidden Pictures with Phase-Transferred Diffusion Model
by: Gao, Xiang, et al.
Published: (2025)
Can MLLMs Reason Beyond Language? VisReason: A Comprehensive Benchmark for Vision-Centric Reasoning
by: Guo, Longteng, et al.
Published: (2026)
Benchmarking Endoscopic Surgical Image Restoration and Beyond
by: Pei, Jialun, et al.
Published: (2025)
Adaptive Context Matters: Towards Provable Multi-Modality Guidance for Super-Resolution
by: Luo, Jinyi, et al.
Published: (2026)
Beyond Walking: A Large-Scale Image-Text Benchmark for Text-based Person Anomaly Search
by: Yang, Shuyu, et al.
Published: (2024)
Beyond Open Vocabulary: Multimodal Prompting for Object Detection in Remote Sensing Images
by: Yang, Shuai, et al.
Published: (2026)
Control Color: Multimodal Diffusion-based Interactive Image Colorization
by: Liang, Zhexin, et al.
Published: (2024)
GenColorBench: A Color Evaluation Benchmark for Text-to-Image Generation Models
by: Butt, Muhammad Atif, et al.
Published: (2025)
Frequency-Controlled Diffusion Model for Versatile Text-Guided Image-to-Image Translation
by: Gao, Xiang, et al.
Published: (2024)
OmniFM: Toward Modality-Robust and Task-Agnostic Federated Learning for Heterogeneous Medical Imaging
by: Liu, Meilin, et al.
Published: (2026)
Beyond Model Design: Data-Centric Training and Self-Ensemble for Gaussian Color Image Denoising
by: Chang, Gengjia, et al.
Published: (2026)
Intelligent Artistic Typography: A Comprehensive Review of Artistic Text Design and Generation
by: Bai, Yuhang, et al.
Published: (2024)
DFBench: Benchmarking Deepfake Image Detection Capability of Large Multimodal Models
by: Wang, Jiarui, et al.
Published: (2025)
FBSDiff: Plug-and-Play Frequency Band Substitution of Diffusion Features for Highly Controllable Text-Driven Image Translation
by: Gao, Xiang, et al.
Published: (2024)
ColorConceptBench: A Benchmark for Probabilistic Color-Concept Understanding in Text-to-Image Models
by: Ruan, Chenxi, et al.
Published: (2026)
Boosting Weakly-Supervised Referring Image Segmentation via Progressive Comprehension
by: Yang, Zaiquan, et al.
Published: (2024)
Beyond the Visible: Benchmarking Occlusion Perception in Multimodal Large Language Models
by: Liu, Zhaochen, et al.
Published: (2025)
Contrast-X: A Multi-Modal Contrast Image Synthesis Benchmark and Universal Modality Flow Matching
by: Chen, Yifan, et al.
Published: (2026)
VL-RewardBench: A Challenging Benchmark for Vision-Language Generative Reward Models
by: Li, Lei, et al.
Published: (2024)
FRESCO: Spatial-Temporal Correspondence for Zero-Shot Video Translation
by: Yang, Shuai, et al.
Published: (2024)
Beyond the Destination: A Novel Benchmark for Exploration-Aware Embodied Question Answering
by: Jiang, Kaixuan, et al.
Published: (2025)
Beyond the Last Frame: Process-aware Evaluation for Generative Video Reasoning
by: Li, Yifan, et al.
Published: (2025)
Advancing Visual Reliability: Color-Accurate Underwater Image Enhancement for Real-Time Underwater Missions
by: Zhou, Yiqiang, et al.
Published: (2026)
Beyond Memorization: A Multi-Modal Ordinal Regression Benchmark to Expose Popularity Bias in Vision-Language Models
by: Szu-Tu, Li-Zhong, et al.
Published: (2025)
MotionBank: A Large-scale Video Motion Benchmark with Disentangled Rule-based Annotations
by: Xu, Liang, et al.
Published: (2024)
Beyond Forgetting in Continual Medical Image Segmentation: A Comprehensive Benchmark Study
by: Wang, Bomin, et al.
Published: (2026)
Edit-Compass & EditReward-Compass: A Unified Benchmark for Image Editing and Reward Modeling
by: Bai, Xuehai, et al.
Published: (2026)
From Pixels to Places: A Systematic Benchmark for Evaluating Image Geolocalization Ability in Large Language Models
by: Li, Lingyao, et al.
Published: (2025)
Animate-X++: Universal Character Image Animation with Dynamic Backgrounds
by: Tan, Shuai, et al.
Published: (2025)
UniREditBench: A Unified Reasoning-based Image Editing Benchmark
by: Han, Feng, et al.
Published: (2025)
Image2Struct: Benchmarking Structure Extraction for Vision-Language Models
by: Roberts, Josselin Somerville, et al.
Published: (2024)
CD-NGP: A Fast Scalable Continual Representation for Dynamic Scenes
by: Liu, Zhenhuan, et al.
Published: (2024)
Palette-based Color Transfer between Images
by: Lv, Chenlei, et al.
Published: (2024)
Beyond Color and Lines: Zero-Shot Style-Specific Image Variations with Coordinated Semantics
by: Hu, Jinghao, et al.
Published: (2024)
BioBench: A Blueprint to Move Beyond ImageNet for Scientific ML Benchmarks
by: Stevens, Samuel
Published: (2025)
Alias-Free Latent Diffusion Models: Improving Fractional Shift Equivariance of Diffusion Latent Space
by: Zhou, Yifan, et al.
Published: (2025)
Video Diffusion Models are Training-free Motion Interpreter and Controller
by: Xiao, Zeqi, et al.
Published: (2024)
RecipeGen: A Benchmark for Real-World Recipe Image Generation
by: Zhang, Ruoxuan, et al.
Published: (2025)