Library Catalog

Bibliographic Details
Main Authors:	Li, Linfei, Zhang, Lin, Shen, Ying
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2603.14880

Similar Items

DexVLG: Dexterous Vision-Language-Grasp Model at Scale
by: He, Jiawei, et al.
Published: (2025)

Polaris: Open-ended Interactive Robotic Manipulation via Syn2Real Visual Grounding and Large Language Models
by: Wang, Tianyu, et al.
Published: (2024)

RGBT-Ground Benchmark: Visual Grounding Beyond RGB in Complex Real-World Scenarios
by: Zhao, Tianyi, et al.
Published: (2025)

INR-Bench: A Unified Benchmark for Implicit Neural Representations in Multi-Domain Regression and Reconstruction
by: Li, Linfei, et al.
Published: (2025)

GS3LAM: Gaussian Semantic Splatting SLAM
by: Li, Linfei, et al.
Published: (2026)

SmartSplat: Feature-Smart Gaussians for Scalable Compression of Ultra-High-Resolution Images
by: Li, Linfei, et al.
Published: (2025)

SynPlay: Large-Scale Synthetic Human Data with Real-World Diversity for Aerial-View Perception
by: Yim, Jinsub, et al.
Published: (2024)

AV-Deepfake1M++: A Large-Scale Audio-Visual Deepfake Benchmark with Real-World Perturbations
by: Cai, Zhixi, et al.
Published: (2025)

SCENEREPLICA: Benchmarking Real-World Robot Manipulation by Creating Replicable Scenes
by: Khargonkar, Ninad, et al.
Published: (2023)

PhyEdit: Towards Real-World Object Manipulation via Physically-Grounded Image Editing
by: Xu, Ruihang, et al.
Published: (2026)

Real-Time Privacy Preservation for Robot Visual Perception
by: Choi, Minkyu, et al.
Published: (2025)

MathScape: Benchmarking Multimodal Large Language Models in Real-World Mathematical Contexts
by: Liang, Hao, et al.
Published: (2024)

Manipulate-Anything: Automating Real-World Robots using Vision-Language Models
by: Duan, Jiafei, et al.
Published: (2024)

AGC-Drive: A Large-Scale Dataset for Real-World Aerial-Ground Collaboration in Driving Scenarios
by: Hou, Yunhao, et al.
Published: (2025)

Towards Real-World HDR Video Reconstruction: A Large-Scale Benchmark Dataset and A Two-Stage Alignment Network
by: Shu, Yong, et al.
Published: (2024)

Evaluating Real-World Robot Manipulation Policies in Simulation
by: Li, Xuanlin, et al.
Published: (2024)

DenseWorld-1M: Towards Detailed Dense Grounded Caption in the Real World
by: Li, Xiangtai, et al.
Published: (2025)

RealSR-R1: Reinforcement Learning for Real-World Image Super-Resolution with Vision-Language Chain-of-Thought
by: Qiao, Junbo, et al.
Published: (2025)

VisualTrans: A Benchmark for Real-World Visual Transformation Reasoning
by: Ji, Yuheng, et al.
Published: (2025)

MTMMC: A Large-Scale Real-World Multi-Modal Camera Tracking Benchmark
by: Woo, Sanghyun, et al.
Published: (2024)

TartanGround: A Large-Scale Dataset for Ground Robot Perception and Navigation
by: Patel, Manthan, et al.
Published: (2025)

PolyReal: A Benchmark for Real-World Polymer Science Workflows
by: Liu, Wanhao, et al.
Published: (2026)

RealD$^2$iff: Bridging Real-World Gap in Robot Manipulation via Depth Diffusion
by: Liang, Xiujian, et al.
Published: (2025)

OmniGround: A Comprehensive Spatio-Temporal Grounding Benchmark for Real-World Complex Scenarios
by: Gao, Hong, et al.
Published: (2025)

HRIBench: Benchmarking Vision-Language Models for Real-Time Human Perception in Human-Robot Interaction
by: Shi, Zhonghao, et al.
Published: (2025)

RTV-Bench: Benchmarking MLLM Continuous Perception, Understanding and Reasoning through Real-Time Video
by: Xun, Shuhang, et al.
Published: (2025)

Real3D: Scaling Up Large Reconstruction Models with Real-World Images
by: Jiang, Hanwen, et al.
Published: (2024)

EPIC-Bench: A Perception-Centric Benchmark for Fine-Grained Embodied Visual Grounding in Vision-Language Models
by: Shan, Haozhe, et al.
Published: (2026)

Advancing Real-World Parking Slot Detection with Large-Scale Dataset and Semi-Supervised Baseline
by: Zhang, Zhihao, et al.
Published: (2025)

MathReal: We Keep It Real! A Real Scene Benchmark for Evaluating Math Reasoning in Multimodal Large Language Models
by: Feng, Jun, et al.
Published: (2025)

CBVS: A Large-Scale Chinese Image-Text Benchmark for Real-World Short Video Search Scenarios
by: Qiao, Xiangshuo, et al.
Published: (2024)

How Far Has AI Come in Liver Fibrosis Staging? A Large-Scale Real-World Dataset and Benchmark
by: Liu, Yuanye, et al.
Published: (2026)

TwinAligner: Visual-Dynamic Alignment Empowers Physics-aware Real2Sim2Real for Robotic Manipulation
by: Fan, Hongwei, et al.
Published: (2025)

WorldEval: World Model as Real-World Robot Policies Evaluator
by: Li, Yaxuan, et al.
Published: (2025)

One-Step Diffusion-based Real-World Image Super-Resolution with Visual Perception Distillation
by: Wu, Xue, et al.
Published: (2025)

POINav: Benchmarking and Enhancing Final-Meters Arrival in Real-World Vision-Language Navigation
by: Gong, Ruiyan, et al.
Published: (2026)

Simultaneous Tactile-Visual Perception for Learning Multimodal Robot Manipulation
by: Li, Yuyang, et al.
Published: (2025)

RoboGround: Robotic Manipulation with Grounded Vision-Language Priors
by: Huang, Haifeng, et al.
Published: (2025)

Flow-Anything: Learning Real-World Optical Flow Estimation from Large-Scale Single-view Images
by: Liang, Yingping, et al.
Published: (2025)

VLG-CBM: Training Concept Bottleneck Models with Vision-Language Guidance
by: Srivastava, Divyansh, et al.
Published: (2024)