| Main Authors: | Jiang, Haoyi; Liu, Liu; Wang, Xinjie; He, Yonghao; Sui, Wei; Su, Zhizhong; Liu, Wenyu; Wang, Xinggang |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.21186 |
Similar Items
GaussTR: Foundation Model-Aligned Gaussian Transformer for Self-Supervised 3D Spatial Understanding
by: Jiang, Haoyi, et al.
Published: (2024)
Uni3R: Unified 3D Reconstruction and Semantic Understanding via Generalizable Gaussian Splatting from Unposed Multi-View Images
by: Sun, Xiangyu, et al.
Published: (2025)
GLS: Geometry-aware 3D Language Gaussian Splatting
by: Qiu, Jiaxiong, et al.
Published: (2024)
3D-Fixer: Coarse-to-Fine In-place Completion for 3D Scenes from a Single Image
by: Yin, Ze-Xin, et al.
Published: (2026)
EmbodiedGen: Towards a Generative 3D World Engine for Embodied Intelligence
by: Wang, Xinjie, et al.
Published: (2025)
DreamLifting: A Plug-in Module Lifting MV Diffusion Models for 3D Asset Generation
by: Yin, Ze-Xin, et al.
Published: (2025)
AlphaDrive: Unleashing the Power of VLMs in Autonomous Driving via Reinforcement Learning and Reasoning
by: Jiang, Bo, et al.
Published: (2025)
PersonViT: Large-scale Self-supervised Vision Transformer for Person Re-Identification
by: Hu, Bin, et al.
Published: (2024)
TabletopGen: Instance-Level Interactive 3D Tabletop Scene Generation from Text or Single Image
by: Wang, Ziqian, et al.
Published: (2025)
STP4D: Spatio-Temporal-Prompt Consistent Modeling for Text-to-4D Gaussian Splatting
by: Deng, Yunze, et al.
Published: (2025)
Dynamic 2D Gaussians: Geometrically Accurate Radiance Fields for Dynamic Objects
by: Zhang, Shuai, et al.
Published: (2024)
MaTVLM: Hybrid Mamba-Transformer for Efficient Vision-Language Modeling
by: Li, Yingyue, et al.
Published: (2025)
Matte Anything: Interactive Natural Image Matting with Segment Anything Models
by: Yao, Jingfeng, et al.
Published: (2023)
Polar Parametrization for Vision-based Surround-View 3D Detection
by: Chen, Shaoyu, et al.
Published: (2022)
MIM4D: Masked Modeling with Multi-View Video for Autonomous Driving Representation Learning
by: Zou, Jialv, et al.
Published: (2024)
GaraMoSt: Parallel Multi-Granularity Motion and Structural Modeling for Efficient Multi-Frame Interpolation in DSA Images
by: Xu, Ziyang, et al.
Published: (2024)
2D Gaussians Meet Visual Tokenizer
by: Shi, Yiang, et al.
Published: (2025)
Fast High Dynamic Range Radiance Fields for Dynamic Scenes
by: Wu, Guanjun, et al.
Published: (2024)
FasterDiT: Towards Faster Diffusion Transformers Training without Architecture Modification
by: Yao, Jingfeng, et al.
Published: (2024)
Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
by: Zhu, Lianghui, et al.
Published: (2024)
Gait Recognition via Collaborating Discriminative and Generative Diffusion Models
by: Xiong, Haijun, et al.
Published: (2025)
GaussianDreamer: Fast Generation from Text to 3D Gaussians by Bridging 2D and 3D Diffusion Models
by: Yi, Taoran, et al.
Published: (2023)
MolSight: Optical Chemical Structure Recognition with SMILES Pretraining, Multi-Granularity Learning and Reinforcement Learning
by: Zhang, Wenrui, et al.
Published: (2025)
Causality-inspired Discriminative Feature Learning in Triple Domains for Gait Recognition
by: Xiong, Haijun, et al.
Published: (2024)
Cross-Layer Attentive Feature Upsampling for Low-latency Semantic Segmentation
by: Cheng, Tianheng, et al.
Published: (2026)
Gaussian Object Carver: Object-Compositional Gaussian Splatting with surfaces completion
by: Liu, Liu, et al.
Published: (2024)
OmniMamba: Efficient and Unified Multimodal Understanding and Generation via State Space Models
by: Zou, Jialv, et al.
Published: (2025)
SceneVTG++: Controllable Multilingual Visual Text Generation in the Wild
by: Liu, Jiawei, et al.
Published: (2025)
4DLangVGGT: 4D Language-Visual Geometry Grounded Transformer
by: Wu, Xianfeng, et al.
Published: (2025)
Visual Text Generation in the Wild
by: Zhu, Yuanzhi, et al.
Published: (2024)
DiffusionVL: Translating Any Autoregressive Models into Diffusion Vision Language Models
by: Zeng, Lunbin, et al.
Published: (2025)
Snap-Snap: Taking Two Images to Reconstruct 3D Human Gaussians in Milliseconds
by: Lu, Jia, et al.
Published: (2025)
GaitGS: Temporal Feature Learning in Granularity and Span Dimension for Gait Recognition
by: Xiong, Haijun, et al.
Published: (2023)
DeltaMIL: Gated Memory Integration for Efficient and Discriminative Whole Slide Image Analysis
by: Zhu, Yueting, et al.
Published: (2025)
Mask-Adapter: The Devil is in the Masks for Open-Vocabulary Segmentation
by: Li, Yongkang, et al.
Published: (2024)
SpaRP: Fast 3D Object Reconstruction and Pose Estimation from Sparse Views
by: Xu, Chao, et al.
Published: (2024)
Speeding Up the Learning of 3D Gaussians with Much Shorter Gaussian Lists
by: Liu, Jiaqi, et al.
Published: (2026)
Occupancy as Set of Points
by: Shi, Yiang, et al.
Published: (2024)
MoSt-DSA: Modeling Motion and Structural Interactions for Direct Multi-Frame Interpolation in DSA Images
by: Xu, Ziyang, et al.
Published: (2024)
A Light-Weight Framework for Open-Set Object Detection with Decoupled Feature Alignment in Joint Space
by: He, Yonghao, et al.
Published: (2024)