| Main Authors: | Feng, Tuo; Wang, Wenguan; Ma, Fan; Yang, Yi |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2403.15173 |
Similar Items
Shape2Scene: 3D Scene Representation Learning Through Pre-training on Shape Data
by: Feng, Tuo, et al.
Published: (2024)
A Survey of World Models for Autonomous Driving
by: Feng, Tuo, et al.
Published: (2025)
Navigation Instruction Generation with BEV Perception and Large Language Models
by: Fan, Sheng, et al.
Published: (2024)
T3DNet: Compressing Point Cloud Models for Lightweight 3D Recognition
by: Yang, Zhiyuan, et al.
Published: (2024)
FC3DNet: A Fully Connected Encoder-Decoder for Efficient Demoiréing
by: Du, Zhibo, et al.
Published: (2024)
Human-Object Interaction Detection Collaborated with Large Relation-driven Diffusion Models
by: Li, Liulei, et al.
Published: (2024)
PKINet-v2: Towards Powerful and Efficient Poly-Kernel Remote Sensing Object Detection
by: Cai, Xinhao, et al.
Published: (2026)
CAM3DNet: Comprehensively mining the multi-scale features for 3D Object Detection with Multi-View Cameras
by: Pang, Mingxi, et al.
Published: (2026)
DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models (Exemplified as A Video Agent)
by: Yang, Zongxin, et al.
Published: (2024)
Psychometry: An Omnifit Model for Image Reconstruction from Human Brain Activity
by: Quan, Ruijie, et al.
Published: (2024)
3D Gaussian Map with Open-Set Semantic Grouping for Vision-Language Navigation
by: Gao, Jianzhe, et al.
Published: (2026)
A Survey on 3D Gaussian Splatting
by: Chen, Guikun, et al.
Published: (2024)
Volumetric Environment Representation for Vision-Language Navigation
by: Liu, Rui, et al.
Published: (2024)
Vision-Language Navigation with Energy-Based Policy
by: Liu, Rui, et al.
Published: (2024)
Towards Data-and Knowledge-Driven Artificial Intelligence: A Survey on Neuro-Symbolic Computing
by: Wang, Wenguan, et al.
Published: (2022)
Int3DNet: Scene-Motion Cross Attention Network for 3D Intention Prediction in Mixed Reality
by: Ha, Taewook, et al.
Published: (2026)
SparseFusion: Efficient Sparse Multi-Modal Fusion Framework for Long-Range 3D Perception
by: Li, Yiheng, et al.
Published: (2024)
Poly Kernel Inception Network for Remote Sensing Detection
by: Cai, Xinhao, et al.
Published: (2024)
Mutual Learning for Acoustic Matching and Dereverberation via Visual Scene-driven Diffusion
by: Ma, Jian, et al.
Published: (2024)
Neural Clustering based Visual Representation Learning
by: Chen, Guikun, et al.
Published: (2024)
Clustering Propagation for Universal Medical Image Segmentation
by: Ding, Yuhang, et al.
Published: (2024)
Hydra-SGG: Hybrid Relation Assignment for One-stage Scene Graph Generation
by: Chen, Minghan, et al.
Published: (2024)
DIFFVSGG: Diffusion-Driven Online Video Scene Graph Generation
by: Chen, Mu, et al.
Published: (2025)
AGA3DNet: Anatomy-Guided Gaussian Priors with Multi-view xLSTM for 3D Brain MRI Subtype Classification
by: Duan, Peiyu, et al.
Published: (2026)
SinkTrack: Attention Sink based Context Anchoring for Large Language Models
by: Liu, Xu, et al.
Published: (2026)
Sculpt3D: Multi-View Consistent Text-to-3D Generation with Sparse 3D Prior
by: Chen, Cheng, et al.
Published: (2024)
Long-SCOPE: Fully Sparse Long-Range Cooperative 3D Perception
by: Wang, Jiahao, et al.
Published: (2026)
IS-Fusion: Instance-Scene Collaborative Fusion for Multimodal 3D Object Detection
by: Yin, Junbo, et al.
Published: (2024)
Scene Graph Generation with Role-Playing Large Language Models
by: Chen, Guikun, et al.
Published: (2024)
PE3R: Perception-Efficient 3D Reconstruction
by: Hu, Jie, et al.
Published: (2025)
MoCoLSK: Modality Conditioned High-Resolution Downscaling for Land Surface Temperature
by: Dai, Qun, et al.
Published: (2024)
SparseDiT: Token Sparsification for Efficient Diffusion Transformer
by: Chang, Shuning, et al.
Published: (2024)
LawDNet: Enhanced Audio-Driven Lip Synthesis via Local Affine Warping Deformation
by: Junli, Deng, et al.
Published: (2024)
MT3DNet: Multi-Task learning Network for 3D Surgical Scene Reconstruction
by: Parab, Mithun, et al.
Published: (2024)
SlimComm: Doppler-Guided Sparse Queries for Bandwidth-Efficient Cooperative 3-D Perception
by: Yazgan, Melih, et al.
Published: (2025)
Learning 3D Representations for Spatial Intelligence from Unposed Multi-View Images
by: Zhou, Bo, et al.
Published: (2026)
Refine3DNet: Scaling Precision in 3D Object Reconstruction from Multi-View RGB Images using Attention
by: Balakrishnan, Ajith, et al.
Published: (2024)
Visual Knowledge in the Big Model Era: Retrospect and Prospect
by: Wang, Wenguan, et al.
Published: (2024)
ZFusion: An Effective Fuser of Camera and 4D Radar for 3D Object Perception in Autonomous Driving
by: Yang, Sheng, et al.
Published: (2025)
Fully Sparse Fusion for 3D Object Detection
by: Li, Yingyan, et al.
Published: (2023)