:: Library Catalog

Bibliographic Details
Main Authors:	Jiang, Haoyi; Liu, Liu; Wang, Xinjie; He, Yonghao; Sui, Wei; Su, Zhizhong; Liu, Wenyu; Wang, Xinggang
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2602.21186

Similar Items

GaussTR: Foundation Model-Aligned Gaussian Transformer for Self-Supervised 3D Spatial Understanding
by: Jiang, Haoyi, et al.
Published: (2024)

Uni3R: Unified 3D Reconstruction and Semantic Understanding via Generalizable Gaussian Splatting from Unposed Multi-View Images
by: Sun, Xiangyu, et al.
Published: (2025)

GLS: Geometry-aware 3D Language Gaussian Splatting
by: Qiu, Jiaxiong, et al.
Published: (2024)

3D-Fixer: Coarse-to-Fine In-place Completion for 3D Scenes from a Single Image
by: Yin, Ze-Xin, et al.
Published: (2026)

EmbodiedGen: Towards a Generative 3D World Engine for Embodied Intelligence
by: Wang, Xinjie, et al.
Published: (2025)

DreamLifting: A Plug-in Module Lifting MV Diffusion Models for 3D Asset Generation
by: Yin, Ze-Xin, et al.
Published: (2025)

AlphaDrive: Unleashing the Power of VLMs in Autonomous Driving via Reinforcement Learning and Reasoning
by: Jiang, Bo, et al.
Published: (2025)

PersonViT: Large-scale Self-supervised Vision Transformer for Person Re-Identification
by: Hu, Bin, et al.
Published: (2024)

TabletopGen: Instance-Level Interactive 3D Tabletop Scene Generation from Text or Single Image
by: Wang, Ziqian, et al.
Published: (2025)

STP4D: Spatio-Temporal-Prompt Consistent Modeling for Text-to-4D Gaussian Splatting
by: Deng, Yunze, et al.
Published: (2025)

Dynamic 2D Gaussians: Geometrically Accurate Radiance Fields for Dynamic Objects
by: Zhang, Shuai, et al.
Published: (2024)

MaTVLM: Hybrid Mamba-Transformer for Efficient Vision-Language Modeling
by: Li, Yingyue, et al.
Published: (2025)

Matte Anything: Interactive Natural Image Matting with Segment Anything Models
by: Yao, Jingfeng, et al.
Published: (2023)

Polar Parametrization for Vision-based Surround-View 3D Detection
by: Chen, Shaoyu, et al.
Published: (2022)

MIM4D: Masked Modeling with Multi-View Video for Autonomous Driving Representation Learning
by: Zou, Jialv, et al.
Published: (2024)

GaraMoSt: Parallel Multi-Granularity Motion and Structural Modeling for Efficient Multi-Frame Interpolation in DSA Images
by: Xu, Ziyang, et al.
Published: (2024)

2D Gaussians Meet Visual Tokenizer
by: Shi, Yiang, et al.
Published: (2025)

Fast High Dynamic Range Radiance Fields for Dynamic Scenes
by: Wu, Guanjun, et al.
Published: (2024)

FasterDiT: Towards Faster Diffusion Transformers Training without Architecture Modification
by: Yao, Jingfeng, et al.
Published: (2024)

Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
by: Zhu, Lianghui, et al.
Published: (2024)

Gait Recognition via Collaborating Discriminative and Generative Diffusion Models
by: Xiong, Haijun, et al.
Published: (2025)

GaussianDreamer: Fast Generation from Text to 3D Gaussians by Bridging 2D and 3D Diffusion Models
by: Yi, Taoran, et al.
Published: (2023)

MolSight: Optical Chemical Structure Recognition with SMILES Pretraining, Multi-Granularity Learning and Reinforcement Learning
by: Zhang, Wenrui, et al.
Published: (2025)

Causality-inspired Discriminative Feature Learning in Triple Domains for Gait Recognition
by: Xiong, Haijun, et al.
Published: (2024)

Cross-Layer Attentive Feature Upsampling for Low-latency Semantic Segmentation
by: Cheng, Tianheng, et al.
Published: (2026)

Gaussian Object Carver: Object-Compositional Gaussian Splatting with surfaces completion
by: Liu, Liu, et al.
Published: (2024)

OmniMamba: Efficient and Unified Multimodal Understanding and Generation via State Space Models
by: Zou, Jialv, et al.
Published: (2025)

SceneVTG++: Controllable Multilingual Visual Text Generation in the Wild
by: Liu, Jiawei, et al.
Published: (2025)

4DLangVGGT: 4D Language-Visual Geometry Grounded Transformer
by: Wu, Xianfeng, et al.
Published: (2025)

Visual Text Generation in the Wild
by: Zhu, Yuanzhi, et al.
Published: (2024)

DiffusionVL: Translating Any Autoregressive Models into Diffusion Vision Language Models
by: Zeng, Lunbin, et al.
Published: (2025)

Snap-Snap: Taking Two Images to Reconstruct 3D Human Gaussians in Milliseconds
by: Lu, Jia, et al.
Published: (2025)

GaitGS: Temporal Feature Learning in Granularity and Span Dimension for Gait Recognition
by: Xiong, Haijun, et al.
Published: (2023)

DeltaMIL: Gated Memory Integration for Efficient and Discriminative Whole Slide Image Analysis
by: Zhu, Yueting, et al.
Published: (2025)

Mask-Adapter: The Devil is in the Masks for Open-Vocabulary Segmentation
by: Li, Yongkang, et al.
Published: (2024)

SpaRP: Fast 3D Object Reconstruction and Pose Estimation from Sparse Views
by: Xu, Chao, et al.
Published: (2024)

Speeding Up the Learning of 3D Gaussians with Much Shorter Gaussian Lists
by: Liu, Jiaqi, et al.
Published: (2026)

Occupancy as Set of Points
by: Shi, Yiang, et al.
Published: (2024)

MoSt-DSA: Modeling Motion and Structural Interactions for Direct Multi-Frame Interpolation in DSA Images
by: Xu, Ziyang, et al.
Published: (2024)

A Light-Weight Framework for Open-Set Object Detection with Decoupled Feature Alignment in Joint Space
by: He, Yonghao, et al.
Published: (2024)