:: Library Catalog

Bibliographic Details
Main Authors:	Zhao, Zelin; Fan, Fenglei; Liao, Wenlong; Yan, Junchi
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2403.20002

Similar Items

AMP: Autoregressive Motion Prediction Revisited with Next Token Prediction for Autonomous Driving
by: Jia, Xiaosong, et al.
Published: (2024)

ActiveAD: Planning-Oriented Active Learning for End-to-End Autonomous Driving
by: Lu, Han, et al.
Published: (2024)

M-Tuning: Prompt Tuning with Mitigated Label Bias in Open-Set Scenarios
by: Liao, Ning, et al.
Published: (2023)

HG3-NeRF: Hierarchical Geometric, Semantic, and Photometric Guided Neural Radiance Fields for Sparse View Inputs
by: Gao, Zelin, et al.
Published: (2024)

Repulsor: Accelerating Generative Modeling with a Contrastive Memory Bank
by: Zhang, Shaofeng, et al.
Published: (2025)

Mip-Grid: Anti-aliased Grid Representations for Neural Radiance Fields
by: Nam, Seungtae, et al.
Published: (2024)

Boosting Order-Preserving and Transferability for Neural Architecture Search: a Joint Architecture Refined Search and Fine-tuning Approach
by: Zhang, Beichen, et al.
Published: (2024)

FineRMoE: Dimension Expansion for Finer-Grained Expert with Its Upcycling Approach
by: Liao, Ning, et al.
Published: (2026)

EvoTok: A Unified Image Tokenizer via Residual Latent Evolution for Visual Understanding and Generation
by: Li, Yan, et al.
Published: (2026)

VideoREPA: Learning Physics for Video Generation through Relational Alignment with Foundation Models
by: Zhang, Xiangdong, et al.
Published: (2025)

FlatFusion: Delving into Details of Sparse Transformer-based Camera-LiDAR Fusion for Autonomous Driving
by: Zhu, Yutao, et al.
Published: (2024)

One CT Unified Model Training Framework to Rule All Scanning Protocols
by: Xu, Fengzhi, et al.
Published: (2026)

Continuous-Multiple Image Outpainting in One-Step via Positional Query and A Diffusion-based Approach
by: Zhang, Shaofeng, et al.
Published: (2024)

Large Vision Model-Guided Masked Low-Rank Approximation for Ground-Roll Attenuation
by: Liao, Jiacheng, et al.
Published: (2026)

On the Evaluation and Refinement of Vision-Language Instruction Tuning Datasets
by: Liao, Ning, et al.
Published: (2023)

Joint Generative Modeling of Grounded Scene Graphs and Images via Diffusion Models
by: Xu, Bicheng, et al.
Published: (2024)

PCP-MAE: Learning to Predict Centers for Point Masked Autoencoders
by: Zhang, Xiangdong, et al.
Published: (2024)

Towards More Diverse and Challenging Pre-training for Point Cloud Learning: Self-Supervised Cross Reconstruction with Decoupled Views
by: Zhang, Xiangdong, et al.
Published: (2025)

Q-Ground: Image Quality Grounding with Large Multi-modality Models
by: Chen, Chaofeng, et al.
Published: (2024)

QuEST: Low-bit Diffusion Model Quantization via Efficient Selective Finetuning
by: Wang, Haoxuan, et al.
Published: (2024)

Foresee-to-Ground: From Predictive Temporal Perception to Evidence-Driven Reasoning for Video Temporal Grounding
by: Zheng, Zelin, et al.
Published: (2026)

A Simple and Better Baseline for Visual Grounding
by: Wang, Jingchao, et al.
Published: (2025)

Towards Robust Infrared Small Target Detection: A Feature-Enhanced and Sensitivity-Tunable Framework
by: Zhao, Jinmiao, et al.
Published: (2024)

Exploiting Unlabeled Data with Multiple Expert Teachers for Open Vocabulary Aerial Object Detection and Its Orientation Adaptation
by: Li, Yan, et al.
Published: (2024)

GISE-TTT: A Framework for Global Information Segmentation and Enhancement
by: Hao, Fenglei, et al.
Published: (2025)

Mask-Based Modeling for Neural Radiance Fields
by: Yang, Ganlin, et al.
Published: (2023)

3D Reconstruction and New View Synthesis of Indoor Environments based on a Dual Neural Radiance Field
by: Bao, Zhenyu, et al.
Published: (2024)

GeoMix: Towards Geometry-Aware Data Augmentation
by: Zhao, Wentao, et al.
Published: (2024)

GroundVTS: Visual Token Sampling in Multimodal Large Language Models for Video Temporal Grounding
by: Fan, Rong, et al.
Published: (2026)

Factorized Multi-Resolution HashGrid for Efficient Neural Radiance Fields: Execution on Edge-Devices
by: Jun-Seong, Kim, et al.
Published: (2026)

ViTree: Single-path Neural Tree for Step-wise Interpretable Fine-grained Visual Categorization
by: Lao, Danning, et al.
Published: (2024)

Leveraging Hallucinations to Reduce Manual Prompt Dependency in Promptable Segmentation
by: Hu, Jian, et al.
Published: (2024)

K-Buffers: A Plug-in Method for Enhancing Neural Fields with Multiple Buffers
by: Ren, Haofan, et al.
Published: (2025)

GroundGrid: LiDAR Point Cloud Ground Segmentation and Terrain Estimation
by: Steinke, Nicolai, et al.
Published: (2024)

Beyond a Single Frame: Multi-Frame Spatially Grounded Reasoning Across Volumetric MRI
by: Moukheiber, Lama, et al.
Published: (2026)

Bi-Grid Reconstruction for Image Anomaly Detection
by: Huang, Huichuan, et al.
Published: (2025)

LLM-Seg: Bridging Image Segmentation and Large Language Model Reasoning
by: Wang, Junchi, et al.
Published: (2024)

VLM-Loc: Localization in Point Cloud Maps via Vision-Language Models
by: Kang, Shuhao, et al.
Published: (2026)

SaENeRF: Suppressing Artifacts in Event-based Neural Radiance Fields
by: Wang, Yuanjian, et al.
Published: (2025)

Grid-Centric Traffic Scenario Perception for Autonomous Driving: A Comprehensive Review
by: Shi, Yining, et al.
Published: (2023)