| Main Authors: | Li, Sihang, Tan, Siqi, Chang, Bowen, Zhang, Jing, Feng, Chen, Li, Yiming |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2412.00138 |
Similar Items
Focus On What Matters: Separated Models For Visual-Based RL Generalization
by: Zhang, Di, et al.
Published: (2024)
GSplatLoc: Grounding Keypoint Descriptors into 3D Gaussian Splatting for Improved Visual Localization
by: Sidorov, Gennady, et al.
Published: (2024)
Malicious Path Manipulations via Exploitation of Representation Vulnerabilities of Vision-Language Navigation Systems
by: Islam, Chashi Mahiul, et al.
Published: (2024)
Extrapolated Urban View Synthesis Benchmark
by: Han, Xiangyu, et al.
Published: (2024)
PhysInOne: Visual Physics Learning and Reasoning in One Suite
by: Zhou, Siyuan, et al.
Published: (2026)
Instruction-Guided Visual Masking
by: Zheng, Jinliang, et al.
Published: (2024)
Adaptive Prediction Ensemble: Improving Out-of-Distribution Generalization of Motion Forecasting
by: Li, Jinning, et al.
Published: (2024)
RadarOcc: Robust 3D Occupancy Prediction with 4D Imaging Radar
by: Ding, Fangqiang, et al.
Published: (2024)
VISTA: Enhancing Visual Conditioning via Track-Following Preference Optimization in Vision-Language-Action Models
by: Chen, Yiye, et al.
Published: (2026)
RDT2: Exploring the Scaling Limit of UMI Data Towards Zero-Shot Cross-Embodiment Generalization
by: Liu, Songming, et al.
Published: (2026)
When Should We Prefer State-to-Visual DAgger Over Visual Reinforcement Learning?
by: Mu, Tongzhou, et al.
Published: (2024)
DexCap: Scalable and Portable Mocap Data Collection System for Dexterous Manipulation
by: Wang, Chen, et al.
Published: (2024)
Iterative Refinement Improves Compositional Image Generation
by: Jaiswal, Shantanu, et al.
Published: (2026)
UDTIRI: An Online Open-Source Intelligent Road Inspection Benchmark Suite
by: Guo, Sicen, et al.
Published: (2023)
LatentPilot: Scene-Aware Vision-and-Language Navigation by Dreaming Ahead with Latent Visual Reasoning
by: Hao, Haihong, et al.
Published: (2026)
SkillMimicGen: Automated Demonstration Generation for Efficient Skill Learning and Deployment
by: Garrett, Caelan, et al.
Published: (2024)
Leveraging Locality to Boost Sample Efficiency in Robotic Manipulation
by: Zhang, Tong, et al.
Published: (2024)
Towards Ambiguity-Free Spatial Foundation Model: Rethinking and Decoupling Depth Ambiguity
by: Xu, Xiaohao, et al.
Published: (2025)
CoT-VLA: Visual Chain-of-Thought Reasoning for Vision-Language-Action Models
by: Zhao, Qingqing, et al.
Published: (2025)
Closed Loop Dynamic Driving Data Mixture for Real-Synthetic Co-Training
by: Ruan, Hongzhi, et al.
Published: (2026)
Exploring Domain Shift on Radar-Based 3D Object Detection Amidst Diverse Environmental Conditions
by: Zhang, Miao, et al.
Published: (2024)
RoboGen: Towards Unleashing Infinite Data for Automated Robot Learning via Generative Simulation
by: Wang, Yufei, et al.
Published: (2023)
Theia: Distilling Diverse Vision Foundation Models for Robot Learning
by: Shang, Jinghuan, et al.
Published: (2024)
RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation
by: Liu, Songming, et al.
Published: (2024)
Towards Visuospatial Cognition via Hierarchical Fusion of Visual Experts
by: Feng, Qi
Published: (2025)
VLM See, Robot Do: Human Demo Video to Robot Action Plan via Vision Language Model
by: Wang, Beichen, et al.
Published: (2024)
Imperative Learning: A Self-supervised Neuro-Symbolic Learning Framework for Robot Autonomy
by: Wang, Chen, et al.
Published: (2024)
ROSAR: An Adversarial Re-Training Framework for Robust Side-Scan Sonar Object Detection
by: Aubard, Martin, et al.
Published: (2024)
Enhancing Adversarial Transferability by Balancing Exploration and Exploitation with Gradient-Guided Sampling
by: Niu, Zenghao, et al.
Published: (2025)
ArtFormer: Controllable Generation of Diverse 3D Articulated Objects
by: Su, Jiayi, et al.
Published: (2024)
STT: Stateful Tracking with Transformers for Autonomous Driving
by: Jing, Longlong, et al.
Published: (2024)
NAVSIM: Data-Driven Non-Reactive Autonomous Vehicle Simulation and Benchmarking
by: Dauner, Daniel, et al.
Published: (2024)
Learning Visual Feature-Based World Models via Residual Latent Action
by: Zhang, Xinyu, et al.
Published: (2026)
Context-aware Multi-Model Object Detection for Diversely Heterogeneous Compute Systems
by: Davis, Justin, et al.
Published: (2024)
Integrating Object Detection Modality into Visual Language Model for Enhanced Autonomous Driving Agent
by: He, Linfeng, et al.
Published: (2024)
Where are we in the search for an Artificial Visual Cortex for Embodied Intelligence?
by: Majumdar, Arjun, et al.
Published: (2023)
Diffusion Meets DAgger: Supercharging Eye-in-hand Imitation Learning
by: Zhang, Xiaoyu, et al.
Published: (2024)
LIX: Implicitly Infusing Spatial Geometric Prior Knowledge into Visual Semantic Segmentation for Autonomous Driving
by: Guo, Sicen, et al.
Published: (2024)
LogoSP: Local-global Grouping of Superpoints for Unsupervised Semantic Segmentation of 3D Point Clouds
by: Zhang, Zihui, et al.
Published: (2025)
MATRIX: Multi-Agent Trajectory Generation with Diverse Contexts
by: Xu, Zhuo, et al.
Published: (2024)