| Main authors: | Chang, Qiong, Li, Xiang, Xu, Xin, Liu, Xin, Li, Yun, Jun, Miyazaki |
|---|---|
| Format: | Preprint |
| Published: | 2023 |
| Subjects: | |
| Online access: | https://arxiv.org/abs/2305.11566 |
Similar items
Efficient stereo matching on embedded GPUs with zero-means cross correlation
by: Chang, Qiong, et al.
Published: (2022)
2D or 3D: Who Governs Salience in VLA Models? -- Tri-Stage Token Pruning Framework with Modality Salience Awareness
by: Zheng, Zihao, et al.
Published: (2026)
Customizable Perturbation Synthesis for Robust SLAM Benchmarking
by: Xu, Xiaohao, et al.
Published: (2024)
HR-INR: Continuous Space-Time Video Super-Resolution via Event Camera
by: Lu, Yunfan, et al.
Published: (2024)
PanoGen++: Domain-Adapted Text-Guided Panoramic Environment Generation for Vision-and-Language Navigation
by: Wang, Sen, et al.
Published: (2025)
WoW: Towards a World omniscient World model Through Embodied Interaction
by: Chi, Xiaowei, et al.
Published: (2025)
UniScene: Multi-Camera Unified Pre-training via 3D Scene Reconstruction for Autonomous Driving
by: Min, Chen, et al.
Published: (2023)
WaterVG: Waterway Visual Grounding based on Text-Guided Vision and mmWave Radar
by: Guan, Runwei, et al.
Published: (2024)
RoboTron-Drive: All-in-One Large Multimodal Model for Autonomous Driving
by: Huang, Zhijian, et al.
Published: (2024)
XEmbodied: A Foundation Model with Enhanced Geometric and Physical Cues for Large-Scale Embodied Environments
by: Qian, Kangan, et al.
Published: (2026)
Integrating Multi-Modal Sensors: A Review of Fusion Techniques for Intelligent Vehicles
by: Wei, Chuheng, et al.
Published: (2025)
EVA: An Embodied World Model for Future Video Anticipation
by: Chi, Xiaowei, et al.
Published: (2024)
EQ-TAA: Equivariant Traffic Accident Anticipation via Diffusion-Based Accident Video Synthesis
by: Fang, Jianwu, et al.
Published: (2025)
REMAC: Reference-Based Martian Asymmetrical Image Compression
by: Ding, Qing, et al.
Published: (2026)
MambaMOS: LiDAR-based 3D Moving Object Segmentation with Motion-aware State Space Model
by: Zeng, Kang, et al.
Published: (2024)
A dynamic memory assignment strategy for dilation-based ICP algorithm on embedded GPUs
by: Chang, Qiong, et al.
Published: (2025)
RefAtomNet++: Advancing Referring Atomic Video Action Recognition using Semantic Retrieval based Multi-Trajectory Mamba
by: Peng, Kunyu, et al.
Published: (2025)
CFMW: Cross-modality Fusion Mamba for Robust Object Detection under Adverse Weather
by: Li, Haoyuan, et al.
Published: (2024)
Embodied Web Agents: Bridging Physical-Digital Realms for Integrated Agent Intelligence
by: Hong, Yining, et al.
Published: (2025)
SurgSora: Object-Aware Diffusion Model for Controllable Surgical Video Generation
by: Chen, Tong, et al.
Published: (2024)
A Novel FACS-Aligned Anatomical Text Description Paradigm for Fine-Grained Facial Behavior Synthesis
by: Wang, Jiahe, et al.
Published: (2026)
Extending Depth of Field for Varifocal Multiview Images
by: Li, Zhilong, et al.
Published: (2024)
Memory-Guided View Refinement for Dynamic Human-in-the-loop EQA
by: Lu, Xin, et al.
Published: (2026)
Depth and Image Fusion for Road Obstacle Detection Using Stereo Camera
by: Perezyabov, Oleg, et al.
Published: (2025)
Disparity-based Stereo Image Compression with Aligned Cross-View Priors
by: Zhai, Yongqi, et al.
Published: (2022)
MHAD: Multimodal Home Activity Dataset with Multi-Angle Videos and Synchronized Physiological Signals
by: Yu, Lei, et al.
Published: (2024)
Learning Contrastive Self-Distillation for Ultra-Fine-Grained Visual Categorization Targeting Limited Samples
by: Fang, Ziye, et al.
Published: (2023)
DeCo-VAE: Learning Compact Latents for Video Reconstruction via Decoupled Representation
by: Yin, Xiangchen, et al.
Published: (2025)
A Very Big Video Reasoning Suite
by: Wang, Maijunxian, et al.
Published: (2026)
Scaling Spatial Intelligence with Multimodal Foundation Models
by: Cai, Zhongang, et al.
Published: (2025)
Spatio-Temporal Data Enhanced Vision-Language Model for Traffic Scene Understanding
by: Ma, Jingtian, et al.
Published: (2025)
Elevating Skeleton-Based Action Recognition with Efficient Multi-Modality Self-Supervision
by: Wei, Yiping, et al.
Published: (2023)
Exploring Event-based Human Pose Estimation with 3D Event Representations
by: Yin, Xiaoting, et al.
Published: (2023)
Exploring Self-supervised Skeleton-based Action Recognition in Occluded Environments
by: Chen, Yifei, et al.
Published: (2023)
SplArt: Articulation Estimation and Part-Level Reconstruction with 3D Gaussian Splatting
by: Lin, Shengjie, et al.
Published: (2025)
E-VLA: Event-Augmented Vision-Language-Action Model for Dark and Blurred Scenes
by: Zhai, Jiajun, et al.
Published: (2026)
Q-Adapt: Adapting LMM for Visual Quality Assessment with Progressive Instruction Tuning
by: Lu, Yiting, et al.
Published: (2025)
Holistic Evaluation of Multimodal LLMs on Spatial Intelligence
by: Cai, Zhongang, et al.
Published: (2025)
NavigScene: Bridging Local Perception and Global Navigation for Beyond-Visual-Range Autonomous Driving
by: Peng, Qucheng, et al.
Published: (2025)
VIVAT: Virtuous Improving VAE Training through Artifact Mitigation
by: Novitskiy, Lev, et al.
Published: (2025)