| Main Authors: | Luo, Yulin, Fan, Chun-Kai, Dong, Menghang, Shi, Jiayu, Zhao, Mengdi, Zhang, Bo-Wen, Chi, Cheng, Liu, Jiaming, Dai, Gaole, Zhang, Rongyu, An, Ruichuan, Wu, Kun, Che, Zhengping, Xie, Shaoxuan, Yao, Guocai, Zhao, Zhongxia, Wang, Pengwei, Liu, Guang, Wang, Zhongyuan, Huang, Tiejun, Zhang, Shanghang |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2510.17801 |
Similar Items
SpikeGen: Decoupled "Rods and Cones" Visual Representation Processing with Latent Generative Framework
by: Dai, Gaole, et al.
Published: (2025)
Robo-Dopamine: General Process Reward Modeling for High-Precision Robotic Manipulation
by: Tan, Huajie, et al.
Published: (2025)
METIS: Multi-Source Egocentric Training for Integrated Dexterous Vision-Language-Action Model
by: Fu, Yankai, et al.
Published: (2025)
LLM as Dataset Analyst: Subpopulation Structure Discovery with Large Language Model
by: Luo, Yulin, et al.
Published: (2024)
MoLe-VLA: Dynamic Layer-skipping Vision Language Action Model via Mixture-of-Layers for Efficient Robot Manipulation
by: Zhang, Rongyu, et al.
Published: (2025)
RoboBrain: A Unified Brain Model for Robotic Manipulation from Abstract to Concrete
by: Ji, Yuheng, et al.
Published: (2025)
Multimodal Large Language Models for Bioimage Analysis
by: Zhang, Shanghang, et al.
Published: (2024)
MoSA: Mixture of Sparse Adapters for Visual Efficient Tuning
by: Zhang, Qizhe, et al.
Published: (2023)
Orochi: Versatile Biomedical Image Processor
by: Dai, Gaole, et al.
Published: (2025)
Decomposing the Neurons: Activation Sparsity via Mixture of Experts for Continual Test Time Adaptation
by: Zhang, Rongyu, et al.
Published: (2024)
Action-Sketcher: From Reasoning to Action via Visual Sketches for Long-Horizon Robotic Manipulation
by: Tan, Huajie, et al.
Published: (2026)
FactorLLM: Factorizing Knowledge via Mixture of Experts for Large Language Models
by: Zhao, Zhongyu, et al.
Published: (2024)
SpikePingpong: Spike Vision-based Fast-Slow Pingpong Robot System
by: Wang, Hao, et al.
Published: (2025)
RoboBrain 2.5: Depth in Sight, Time in Mind
by: Tan, Huajie, et al.
Published: (2026)
AffordGrasp: In-Context Affordance Reasoning for Open-Vocabulary Task-Oriented Grasping in Clutter
by: Tang, Yingbo, et al.
Published: (2025)
RepCaM++: Exploring Transparent Visual Prompt With Inference-Time Re-Parameterization for Neural Video Delivery
by: Zhang, Rongyu, et al.
Published: (2025)
RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics
by: Zhou, Enshen, et al.
Published: (2025)
MoASE++: Mixture of Activation Sparsity Experts with Domain-Adaptive On-policy Distillation for Continual Test Time Adaptation
by: Zhang, Ronyu, et al.
Published: (2026)
Lift3D Foundation Policy: Lifting 2D Large-Scale Pretrained Models for Robust 3D Robotic Manipulation
by: Jia, Yueru, et al.
Published: (2024)
BEVUDA++: Geometric-aware Unsupervised Domain Adaptation for Multi-View 3D Object Detection
by: Zhang, Rongyu, et al.
Published: (2025)
Implicit Neural Image Field for Biological Microscopy Image Compression
by: Dai, Gaole, et al.
Published: (2024)
SpikeNVS: Enhancing Novel View Synthesis from Blurry Images via Spike Camera
by: Dai, Gaole, et al.
Published: (2024)
Video-CoT: A Comprehensive Dataset for Spatiotemporal Understanding of Videos Based on Chain-of-Thought
by: Zhang, Shuyi, et al.
Published: (2025)
Reason-RFT: Reinforcement Fine-Tuning for Visual Reasoning of Vision Language Models
by: Tan, Huajie, et al.
Published: (2025)
AC-DiT: Adaptive Coordination Diffusion Transformer for Mobile Manipulation
by: Chen, Sixiang, et al.
Published: (2025)
PIGEON: VLM-Driven Object Navigation via Points of Interest Selection
by: Peng, Cheng, et al.
Published: (2025)
Demo-JEPA: Joint-Embedding Predictive Architecture for One-shot Cross-Embodiment Imitation
by: He, Jingyang, et al.
Published: (2026)
BEVUDA: Multi-geometric Space Alignments for Domain Adaptive BEV 3D Object Detection
by: Liu, Jiaming, et al.
Published: (2022)
Unsupervised Spike Depth Estimation via Cross-modality Cross-domain Knowledge Transfer
by: Liu, Jiaming, et al.
Published: (2022)
SaPaVe: Towards Active Perception and Manipulation in Vision-Language-Action Models for Robotics
by: Liu, Mengzhen, et al.
Published: (2026)
RoboMIND 2.0: A Multimodal, Bimanual Mobile Manipulation Dataset for Generalizable Embodied Intelligence
by: Hou, Chengkai, et al.
Published: (2025)
EVA: An Embodied World Model for Future Video Anticipation
by: Chi, Xiaowei, et al.
Published: (2024)
MapNav: A Novel Memory Representation via Annotated Semantic Maps for Vision-and-Language Navigation
by: Zhang, Lingfeng, et al.
Published: (2025)
AoE: Always-on Egocentric Human Video Collection for Embodied AI
by: Yang, Bowen, et al.
Published: (2026)
$NavA^3$: Understanding Any Instruction, Navigating Anywhere, Finding Anything
by: Zhang, Lingfeng, et al.
Published: (2025)
M$^{2}$Chat: Empowering VLM for Multimodal LLM Interleaved Text-Image Generation
by: Chi, Xiaowei, et al.
Published: (2023)
A Vanilla Multi-Task Framework for Dense Visual Prediction Solution to 1st VCL Challenge -- Multi-Task Robustness Track
by: Chen, Zehui, et al.
Published: (2024)
TIGeR: Tool-Integrated Geometric Reasoning in Vision-Language Models for Robotics
by: Han, Yi, et al.
Published: (2025)
DualVLA: Building a Generalizable Embodied Agent via Partial Decoupling of Reasoning and Action
by: Fang, Zhen, et al.
Published: (2025)
EmpathyAgent: Can Embodied Agents Conduct Empathetic Actions?
by: Chen, Xinyan, et al.
Published: (2025)