| Main Authors: | Li, Linfei, Zhang, Lin, Shen, Ying |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2603.14880 |
Similar Items
DexVLG: Dexterous Vision-Language-Grasp Model at Scale
by: He, Jiawei, et al.
Published: (2025)
Polaris: Open-ended Interactive Robotic Manipulation via Syn2Real Visual Grounding and Large Language Models
by: Wang, Tianyu, et al.
Published: (2024)
RGBT-Ground Benchmark: Visual Grounding Beyond RGB in Complex Real-World Scenarios
by: Zhao, Tianyi, et al.
Published: (2025)
INR-Bench: A Unified Benchmark for Implicit Neural Representations in Multi-Domain Regression and Reconstruction
by: Li, Linfei, et al.
Published: (2025)
GS3LAM: Gaussian Semantic Splatting SLAM
by: Li, Linfei, et al.
Published: (2026)
SmartSplat: Feature-Smart Gaussians for Scalable Compression of Ultra-High-Resolution Images
by: Li, Linfei, et al.
Published: (2025)
SynPlay: Large-Scale Synthetic Human Data with Real-World Diversity for Aerial-View Perception
by: Yim, Jinsub, et al.
Published: (2024)
AV-Deepfake1M++: A Large-Scale Audio-Visual Deepfake Benchmark with Real-World Perturbations
by: Cai, Zhixi, et al.
Published: (2025)
SCENEREPLICA: Benchmarking Real-World Robot Manipulation by Creating Replicable Scenes
by: Khargonkar, Ninad, et al.
Published: (2023)
PhyEdit: Towards Real-World Object Manipulation via Physically-Grounded Image Editing
by: Xu, Ruihang, et al.
Published: (2026)
Real-Time Privacy Preservation for Robot Visual Perception
by: Choi, Minkyu, et al.
Published: (2025)
MathScape: Benchmarking Multimodal Large Language Models in Real-World Mathematical Contexts
by: Liang, Hao, et al.
Published: (2024)
Manipulate-Anything: Automating Real-World Robots using Vision-Language Models
by: Duan, Jiafei, et al.
Published: (2024)
AGC-Drive: A Large-Scale Dataset for Real-World Aerial-Ground Collaboration in Driving Scenarios
by: Hou, Yunhao, et al.
Published: (2025)
Towards Real-World HDR Video Reconstruction: A Large-Scale Benchmark Dataset and A Two-Stage Alignment Network
by: Shu, Yong, et al.
Published: (2024)
Evaluating Real-World Robot Manipulation Policies in Simulation
by: Li, Xuanlin, et al.
Published: (2024)
DenseWorld-1M: Towards Detailed Dense Grounded Caption in the Real World
by: Li, Xiangtai, et al.
Published: (2025)
RealSR-R1: Reinforcement Learning for Real-World Image Super-Resolution with Vision-Language Chain-of-Thought
by: Qiao, Junbo, et al.
Published: (2025)
VisualTrans: A Benchmark for Real-World Visual Transformation Reasoning
by: Ji, Yuheng, et al.
Published: (2025)
MTMMC: A Large-Scale Real-World Multi-Modal Camera Tracking Benchmark
by: Woo, Sanghyun, et al.
Published: (2024)
TartanGround: A Large-Scale Dataset for Ground Robot Perception and Navigation
by: Patel, Manthan, et al.
Published: (2025)
PolyReal: A Benchmark for Real-World Polymer Science Workflows
by: Liu, Wanhao, et al.
Published: (2026)
RealD$^2$iff: Bridging Real-World Gap in Robot Manipulation via Depth Diffusion
by: Liang, Xiujian, et al.
Published: (2025)
OmniGround: A Comprehensive Spatio-Temporal Grounding Benchmark for Real-World Complex Scenarios
by: Gao, Hong, et al.
Published: (2025)
HRIBench: Benchmarking Vision-Language Models for Real-Time Human Perception in Human-Robot Interaction
by: Shi, Zhonghao, et al.
Published: (2025)
RTV-Bench: Benchmarking MLLM Continuous Perception, Understanding and Reasoning through Real-Time Video
by: Xun, Shuhang, et al.
Published: (2025)
Real3D: Scaling Up Large Reconstruction Models with Real-World Images
by: Jiang, Hanwen, et al.
Published: (2024)
EPIC-Bench: A Perception-Centric Benchmark for Fine-Grained Embodied Visual Grounding in Vision-Language Models
by: Shan, Haozhe, et al.
Published: (2026)
Advancing Real-World Parking Slot Detection with Large-Scale Dataset and Semi-Supervised Baseline
by: Zhang, Zhihao, et al.
Published: (2025)
MathReal: We Keep It Real! A Real Scene Benchmark for Evaluating Math Reasoning in Multimodal Large Language Models
by: Feng, Jun, et al.
Published: (2025)
CBVS: A Large-Scale Chinese Image-Text Benchmark for Real-World Short Video Search Scenarios
by: Qiao, Xiangshuo, et al.
Published: (2024)
How Far Has AI Come in Liver Fibrosis Staging? A Large-Scale Real-World Dataset and Benchmark
by: Liu, Yuanye, et al.
Published: (2026)
TwinAligner: Visual-Dynamic Alignment Empowers Physics-aware Real2Sim2Real for Robotic Manipulation
by: Fan, Hongwei, et al.
Published: (2025)
WorldEval: World Model as Real-World Robot Policies Evaluator
by: Li, Yaxuan, et al.
Published: (2025)
One-Step Diffusion-based Real-World Image Super-Resolution with Visual Perception Distillation
by: Wu, Xue, et al.
Published: (2025)
POINav: Benchmarking and Enhancing Final-Meters Arrival in Real-World Vision-Language Navigation
by: Gong, Ruiyan, et al.
Published: (2026)
Simultaneous Tactile-Visual Perception for Learning Multimodal Robot Manipulation
by: Li, Yuyang, et al.
Published: (2025)
RoboGround: Robotic Manipulation with Grounded Vision-Language Priors
by: Huang, Haifeng, et al.
Published: (2025)
Flow-Anything: Learning Real-World Optical Flow Estimation from Large-Scale Single-view Images
by: Liang, Yingping, et al.
Published: (2025)
VLG-CBM: Training Concept Bottleneck Models with Vision-Language Guidance
by: Srivastava, Divyansh, et al.
Published: (2024)