Bibliographic Details
Main Authors:	Wang, Dingrui; Lai, Zheyuan; Li, Yuda; Wu, Yi; Ma, Yuexin; Betz, Johannes; Yang, Ruigang; Li, Wei
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition; Machine Learning
Online Access:	https://arxiv.org/abs/2405.04100

Similar Items

DualAD: Dual-Layer Planning for Reasoning in Autonomous Driving
by: Wang, Dingrui, et al.
Published: (2024)

SocialMirror: Reconstructing 3D Human Interaction Behaviors from Monocular Videos with Semantic and Geometric Guidance
by: Xia, Qi, et al.
Published: (2026)

DRIP: Discriminative Rotation-Invariant Pole Landmark Descriptor for 3D LiDAR Localization
by: Li, Dingrui, et al.
Published: (2024)

EgoDyn-Bench: Evaluating Ego-Motion Understanding in Vision-Centric Foundation Models for Autonomous Driving
by: Schäfer, Finn Rasmus, et al.
Published: (2026)

NPC: Neural Predictive Control for Fuel-Efficient Autonomous Trucks
by: Ren, Jiaping, et al.
Published: (2024)

One Model, Two Minds: Task-Conditioned Reasoning for Unified Image Quality and Aesthetic Assessment
by: Yin, Wen, et al.
Published: (2026)

State-space Decomposition Model for Video Prediction Considering Long-term Motion Trend
by: Cui, Fei, et al.
Published: (2024)

Fusion of Short-term and Long-term Attention for Video Mirror Detection
by: Xu, Mingchen, et al.
Published: (2024)

GaussianFusionOcc: A Seamless Sensor Fusion Approach for 3D Occupancy Prediction Using 3D Gaussians
by: Pavković, Tomislav, et al.
Published: (2025)

Long-term Traffic Simulation with Interleaved Autoregressive Motion and Scenario Generation
by: Yang, Xiuyu, et al.
Published: (2025)

Beyond Flat Unknown Labels in Open-World Object Detection
by: Zhang, Yuchen, et al.
Published: (2025)

Beyond Known Objects: A Novel Framework for Open-Set Object Detection using Negative-Aware Norm
by: Zhang, Yuchen, et al.
Published: (2026)

Coherent Online Road Topology Estimation and Reasoning with Standard-Definition Maps
by: Pham, Khanh Son, et al.
Published: (2025)

From Shadows to Safety: Occlusion Tracking and Risk Mitigation for Urban Autonomous Driving
by: Moller, Korbinian, et al.
Published: (2025)

Target-Bench: Can Video World Models Achieve Mapless Path Planning with Semantic Targets?
by: Wang, Dingrui, et al.
Published: (2025)

OccMamba: Semantic Occupancy Prediction with State Space Models
by: Li, Heng, et al.
Published: (2024)

OctreeOcc: Efficient and Multi-Granularity Occupancy Prediction Using Octree Queries
by: Lu, Yuhang, et al.
Published: (2023)

Learned Ranking Function: From Short-term Behavior Predictions to Long-term User Satisfaction
by: Wu, Yi, et al.
Published: (2024)

Can VLMs Unlock Semantic Anomaly Detection? A Framework for Structured Reasoning
by: Brusnicki, Roberto, et al.
Published: (2025)

IS-Fusion: Instance-Scene Collaborative Fusion for Multimodal 3D Object Detection
by: Yin, Junbo, et al.
Published: (2024)

LongTail Driving Scenarios with Reasoning Traces: The KITScenes LongTail Dataset
by: Wagner, Royden, et al.
Published: (2026)

How Well Do Vision-Language Models Understand Sequential Driving Scenes? A Sensitivity Study
by: Brusnicki, Roberto, et al.
Published: (2026)

OccLE: Label-Efficient 3D Semantic Occupancy Prediction
by: Fang, Naiyu, et al.
Published: (2025)

GM-DF: Generalized Multi-Scenario Deepfake Detection
by: Lai, Yingxin, et al.
Published: (2024)

Towards Practical Human Motion Prediction with LiDAR Point Clouds
by: Han, Xiao, et al.
Published: (2024)

Long-RVOS: A Comprehensive Benchmark for Long-term Referring Video Object Segmentation
by: Liang, Tianming, et al.
Published: (2025)

AffordGrasp: Cross-Modal Diffusion for Affordance-Aware Grasp Synthesis
by: Wu, Xiaofei, et al.
Published: (2026)

SALI: Short-term Alignment and Long-term Interaction Network for Colonoscopy Video Polyp Segmentation
by: Hu, Qiang, et al.
Published: (2024)

SceneTracker: Long-term Scene Flow Estimation Network
by: Wang, Bo, et al.
Published: (2024)

VideoZoomer: Reinforcement-Learned Temporal Focusing for Long Video Reasoning
by: Ding, Yang, et al.
Published: (2025)

STAGE: A Stream-Centric Generative World Model for Long-Horizon Driving-Scene Simulation
by: Wang, Jiamin, et al.
Published: (2025)

FastGrasp: Efficient Grasp Synthesis with Diffusion
by: Wu, Xiaofei, et al.
Published: (2024)

Foundation Models in Autonomous Driving: A Survey on Scenario Generation and Scenario Analysis
by: Gao, Yuan, et al.
Published: (2025)

EXPLORE-Bench: Egocentric Scene Prediction with Long-Horizon Reasoning
by: Yu, Chengjun, et al.
Published: (2026)

SWAG: Long-term Surgical Workflow Prediction with Generative-based Anticipation
by: Boels, Maxence, et al.
Published: (2024)

MeanFlow Transformers with Representation Autoencoders
by: Hu, Zheyuan, et al.
Published: (2025)

Registration between Point Cloud Streams and Sequential Bounding Boxes via Gradient Descent
by: Li, Xuesong, et al.
Published: (2024)

TAE: Target-aware enhancer for nighttime UAV tracking
by: Chen, Yanyan, et al.
Published: (2026)

BehaviorVLM: Unified Finetuning-Free Behavioral Understanding with Vision-Language Reasoning
by: Ke, Jingyang, et al.
Published: (2026)

Lighten CARAFE: Dynamic Lightweight Upsampling with Guided Reassemble Kernels
by: Fu, Ruigang, et al.
Published: (2024)