| Main Authors: | Zhai, Mingliang, Li, Cheng, Guo, Zengyuan, Yang, Ningrui, Qin, Xiameng, Zhao, Sanyuan, Han, Junyu, Tao, Ji, Wu, Yuwei, Jia, Yunde |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2412.06324 |
Similar Items
Memory-Centric Embodied Question Answering
by: Zhai, Mingliang, et al.
Published: (2025)
TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision
by: Zhai, Yukun, et al.
Published: (2023)
Seeing Beyond Views: Multi-View Driving Scene Video Generation with Holistic Attention
by: Lu, Hannan, et al.
Published: (2024)
Multi-Step Reasoning for Embodied Question Answering via Tool Augmentation
by: Zhai, Mingliang, et al.
Published: (2025)
Temporally Consistent Stereo Matching
by: Zeng, Jiaxi, et al.
Published: (2024)
Collaborative Position Reasoning Network for Referring Image Segmentation
by: Cao, Jianjian, et al.
Published: (2024)
Large-Scale Riemannian Meta-Optimization via Subspace Adaptation
by: Yu, Peilin, et al.
Published: (2025)
Residual Hyperbolic Graph Convolution Networks
by: Xue, Yangkai, et al.
Published: (2024)
Curvature Learning for Generalization of Hyperbolic Neural Networks
by: Fan, Xiaomeng, et al.
Published: (2025)
Multi-Label Stereo Matching for Transparent Scene Depth Estimation
by: Liu, Zhidan, et al.
Published: (2025)
Multi-Sourced Compositional Generalization in Visual Question Answering
by: Li, Chuanhao, et al.
Published: (2025)
MIRROR: Multimodal Iterative Reasoning via Reflection on Visual Regions
by: Zhang, Haoyu, et al.
Published: (2026)
Diving into the Fusion of Monocular Priors for Generalized Stereo Matching
by: Yao, Chengtang, et al.
Published: (2025)
Hyperbolic Dual Feature Augmentation for Open-Environment
by: Yu, Peilin, et al.
Published: (2025)
Composition-Incremental Learning for Compositional Generalization
by: Li, Zhen, et al.
Published: (2025)
Adaptive Model Ensemble for Continual Learning
by: Mao, Yuchuan, et al.
Published: (2025)
Beyond the Seen: Bounded Distribution Estimation for Open-Vocabulary Learning
by: Fan, Xiaomeng, et al.
Published: (2025)
3D Visual Illusion Depth Estimation
by: Yao, Chengtang, et al.
Published: (2025)
Fine-Grained 3D Facial Reconstruction for Micro-Expressions
by: Sun, Che, et al.
Published: (2026)
AdaptMMBench: Benchmarking Adaptive Multimodal Reasoning for Mode Selection and Reasoning Process
by: Zhang, Xintong, et al.
Published: (2026)
Open 3D World in Autonomous Driving
by: Cheng, Xinlong, et al.
Published: (2024)
Adaptive Chain-of-Focus Reasoning via Dynamic Visual Search and Zooming for Efficient VLMs
by: Zhang, Xintong, et al.
Published: (2025)
Quantum repeaters enhanced by vacuum beam guides
by: Gan, Yu, et al.
Published: (2025)
DrivingWorld: Constructing World Model for Autonomous Driving via Video GPT
by: Hu, Xiaotao, et al.
Published: (2024)
Synthesis of isobutyl cinnamate based on DESs catalyst: Optimization and kinetics
by: Jumei Xu, et al.
Published: (2024)
Consistency of Compositional Generalization across Multiple Levels
by: Li, Chuanhao, et al.
Published: (2024)
LongSplat: Online Generalizable 3D Gaussian Splatting from Long Sequence Images
by: Huang, Guichen, et al.
Published: (2025)
Modality Alignment across Trees on Heterogeneous Hyperbolic Manifolds
by: Wu, Wei, et al.
Published: (2025)
Morphology-Independent Facial Expression Imitation for Human-Face Robots
by: Chen, Xu, et al.
Published: (2026)
Progressive enhancement and restoration for mural images under low-light and defected conditions based on multi-receptive field strategy
by: Wei, Xiameng, et al.
Published: (2024)
Reason2Drive: Towards Interpretable and Chain-based Reasoning for Autonomous Driving
by: Nie, Ming, et al.
Published: (2023)
StageInteractor: Query-based Object Detector with Cross-stage Interaction
by: Teng, Yao, et al.
Published: (2023)
World Models for Autonomous Driving: An Initial Survey
by: Guan, Yanchen, et al.
Published: (2024)
DreamerAD: Efficient Reinforcement Learning via Latent World Model for Autonomous Driving
by: Yang, Pengxuan, et al.
Published: (2026)
DynVLA: Learning World Dynamics for Action Reasoning in Autonomous Driving
by: Shang, Shuyao, et al.
Published: (2026)
DriveWorld-VLA: Unified Latent-Space World Modeling with Vision-Language-Action for Autonomous Driving
by: Jia, Feiyang, et al.
Published: (2026)
Magnetic field‐guided hollow mesoporous magnetite nanoparticles for enhanced sonodynamic therapy
by: Bin Li, et al.
Published: (2025)
FIRE: A Dataset for Feedback Integration and Refinement Evaluation of Multimodal Models
by: Li, Pengxiang, et al.
Published: (2024)
NaviDriveVLM: Decoupling High-Level Reasoning and Motion Planning for Autonomous Driving
by: Tao, Ximeng, et al.
Published: (2026)
Black-Box Adversarial Attack on Vision Language Models for Autonomous Driving
by: Wang, Lu, et al.
Published: (2025)