Library Catalog

Bibliographic Details
Main Authors:	Feng, Tuo, Wang, Wenguan, Ma, Fan, Yang, Yi
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2403.15173

Similar Items

Shape2Scene: 3D Scene Representation Learning Through Pre-training on Shape Data
by: Feng, Tuo, et al.
Published: (2024)

A Survey of World Models for Autonomous Driving
by: Feng, Tuo, et al.
Published: (2025)

Navigation Instruction Generation with BEV Perception and Large Language Models
by: Fan, Sheng, et al.
Published: (2024)

T3DNet: Compressing Point Cloud Models for Lightweight 3D Recognition
by: Yang, Zhiyuan, et al.
Published: (2024)

FC3DNet: A Fully Connected Encoder-Decoder for Efficient Demoiréing
by: Du, Zhibo, et al.
Published: (2024)

Human-Object Interaction Detection Collaborated with Large Relation-driven Diffusion Models
by: Li, Liulei, et al.
Published: (2024)

PKINet-v2: Towards Powerful and Efficient Poly-Kernel Remote Sensing Object Detection
by: Cai, Xinhao, et al.
Published: (2026)

CAM3DNet: Comprehensively mining the multi-scale features for 3D Object Detection with Multi-View Cameras
by: Pang, Mingxi, et al.
Published: (2026)

DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models (Exemplified as A Video Agent)
by: Yang, Zongxin, et al.
Published: (2024)

Psychometry: An Omnifit Model for Image Reconstruction from Human Brain Activity
by: Quan, Ruijie, et al.
Published: (2024)

3D Gaussian Map with Open-Set Semantic Grouping for Vision-Language Navigation
by: Gao, Jianzhe, et al.
Published: (2026)

A Survey on 3D Gaussian Splatting
by: Chen, Guikun, et al.
Published: (2024)

Volumetric Environment Representation for Vision-Language Navigation
by: Liu, Rui, et al.
Published: (2024)

Vision-Language Navigation with Energy-Based Policy
by: Liu, Rui, et al.
Published: (2024)

Towards Data-and Knowledge-Driven Artificial Intelligence: A Survey on Neuro-Symbolic Computing
by: Wang, Wenguan, et al.
Published: (2022)

Int3DNet: Scene-Motion Cross Attention Network for 3D Intention Prediction in Mixed Reality
by: Ha, Taewook, et al.
Published: (2026)

SparseFusion: Efficient Sparse Multi-Modal Fusion Framework for Long-Range 3D Perception
by: Li, Yiheng, et al.
Published: (2024)

Poly Kernel Inception Network for Remote Sensing Detection
by: Cai, Xinhao, et al.
Published: (2024)

Mutual Learning for Acoustic Matching and Dereverberation via Visual Scene-driven Diffusion
by: Ma, Jian, et al.
Published: (2024)

Neural Clustering based Visual Representation Learning
by: Chen, Guikun, et al.
Published: (2024)

Clustering Propagation for Universal Medical Image Segmentation
by: Ding, Yuhang, et al.
Published: (2024)

Hydra-SGG: Hybrid Relation Assignment for One-stage Scene Graph Generation
by: Chen, Minghan, et al.
Published: (2024)

DIFFVSGG: Diffusion-Driven Online Video Scene Graph Generation
by: Chen, Mu, et al.
Published: (2025)

AGA3DNet: Anatomy-Guided Gaussian Priors with Multi-view xLSTM for 3D Brain MRI Subtype Classification
by: Duan, Peiyu, et al.
Published: (2026)

SinkTrack: Attention Sink based Context Anchoring for Large Language Models
by: Liu, Xu, et al.
Published: (2026)

Sculpt3D: Multi-View Consistent Text-to-3D Generation with Sparse 3D Prior
by: Chen, Cheng, et al.
Published: (2024)

Long-SCOPE: Fully Sparse Long-Range Cooperative 3D Perception
by: Wang, Jiahao, et al.
Published: (2026)

IS-Fusion: Instance-Scene Collaborative Fusion for Multimodal 3D Object Detection
by: Yin, Junbo, et al.
Published: (2024)

Scene Graph Generation with Role-Playing Large Language Models
by: Chen, Guikun, et al.
Published: (2024)

PE3R: Perception-Efficient 3D Reconstruction
by: Hu, Jie, et al.
Published: (2025)

MoCoLSK: Modality Conditioned High-Resolution Downscaling for Land Surface Temperature
by: Dai, Qun, et al.
Published: (2024)

SparseDiT: Token Sparsification for Efficient Diffusion Transformer
by: Chang, Shuning, et al.
Published: (2024)

LawDNet: Enhanced Audio-Driven Lip Synthesis via Local Affine Warping Deformation
by: Junli, Deng, et al.
Published: (2024)

MT3DNet: Multi-Task learning Network for 3D Surgical Scene Reconstruction
by: Parab, Mithun, et al.
Published: (2024)

SlimComm: Doppler-Guided Sparse Queries for Bandwidth-Efficient Cooperative 3-D Perception
by: Yazgan, Melih, et al.
Published: (2025)

Learning 3D Representations for Spatial Intelligence from Unposed Multi-View Images
by: Zhou, Bo, et al.
Published: (2026)

Refine3DNet: Scaling Precision in 3D Object Reconstruction from Multi-View RGB Images using Attention
by: Balakrishnan, Ajith, et al.
Published: (2024)

Visual Knowledge in the Big Model Era: Retrospect and Prospect
by: Wang, Wenguan, et al.
Published: (2024)

ZFusion: An Effective Fuser of Camera and 4D Radar for 3D Object Perception in Autonomous Driving
by: Yang, Sheng, et al.
Published: (2025)

Fully Sparse Fusion for 3D Object Detection
by: Li, Yingyan, et al.
Published: (2023)