| Main Authors: | Lin, Yuhui, Yu, Siyue, Yang, Yuxing, Cheng, Guangliang, Xiao, Jimin |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2604.02689 |
Similar Items
Hunyuan3D 1.0: A Unified Framework for Text-to-3D and Image-to-3D Generation
by: Yang, Xianghui, et al.
Published: (2024)
Hunyuan3D-Omni: A Unified Framework for Controllable Generation of 3D Assets
by: Hunyuan3D, Team, et al.
Published: (2025)
Learning from Videos for 3D World: Enhancing MLLMs with 3D Vision Geometry Priors
by: Zheng, Duo, et al.
Published: (2025)
TriCLIP-3D: A Unified Parameter-Efficient Framework for Tri-Modal 3D Visual Grounding based on CLIP
by: Li, Fan, et al.
Published: (2025)
LTM3D: Bridging Token Spaces for Conditional 3D Generation with Auto-Regressive Diffusion Framework
by: Kang, Xin, et al.
Published: (2025)
On the Limits of Token Reduction for Efficient Unified Vision Language Training
by: Chen, Siyi, et al.
Published: (2026)
Mamba-3D as Masked Autoencoders for Accurate and Data-Efficient Analysis of Medical Ultrasound Videos
by: Zhou, Jiaheng, et al.
Published: (2025)
IDPruner: Harmonizing Importance and Diversity in Visual Token Pruning for MLLMs
by: Tan, Yifan, et al.
Published: (2026)
Event USKT : U-State Space Model in Knowledge Transfer for Event Cameras
by: Lin, Yuhui, et al.
Published: (2024)
SSGaussian: Semantic-Aware and Structure-Preserving 3D Style Transfer
by: Xu, Jimin, et al.
Published: (2025)
Omni123: Exploring 3D Native Foundation Models with Limited 3D Data by Unifying Text to 2D and 3D Generation
by: Ye, Chongjie, et al.
Published: (2026)
Frequency-Aware Token Reduction for Efficient Vision Transformer
by: Lee, Dong-Jae, et al.
Published: (2025)
Multi-modal Relation Distillation for Unified 3D Representation Learning
by: Wang, Huiqun, et al.
Published: (2024)
Wonder3D++: Cross-domain Diffusion for High-fidelity 3D Generation from a Single Image
by: Yang, Yuxiao, et al.
Published: (2025)
SC3D: Label-Efficient Outdoor 3D Object Detection via Single Click Annotation
by: Xia, Qiming, et al.
Published: (2024)
Artifact Reduction in Undersampled 3D Cone-Beam CTs using a Hybrid 2D-3D CNN Framework
by: Thalhammer, Johannes, et al.
Published: (2026)
Fre-Res: Frequency-Residual Video Token Compression for Efficient Video MLLMs
by: Feng, Yigui, et al.
Published: (2026)
Discrete Diffusion Models with MLLMs for Unified Medical Multimodal Generation
by: Mao, Jiawei, et al.
Published: (2025)
VISA: Group-wise Visual Token Selection and Aggregation via Graph Summarization for Efficient MLLMs Inference
by: Jiang, Pengfei, et al.
Published: (2025)
Ultra3D: Efficient and High-Fidelity 3D Generation with Part Attention
by: Chen, Yiwen, et al.
Published: (2025)
InfoTok: Information-Theoretic Regularization for Capacity-Constrained Shared Visual Tokenization in Unified MLLMs
by: Tang, Lv, et al.
Published: (2026)
ForestFormer3D: A Unified Framework for End-to-End Segmentation of Forest LiDAR 3D Point Clouds
by: Xiang, Binbin, et al.
Published: (2025)
Faster Parameter-Efficient Tuning with Token Redundancy Reduction
by: Kim, Kwonyoung, et al.
Published: (2025)
Rethinking 3D Dense Caption and Visual Grounding in A Unified Framework through Prompt-based Localization
by: Luo, Yongdong, et al.
Published: (2024)
S$^2$-MLLM: Boosting Spatial Reasoning Capability of MLLMs for 3D Visual Grounding with Structural Guidance
by: Xu, Beining, et al.
Published: (2025)
MedPruner: Training-Free Hierarchical Token Pruning for Efficient 3D Medical Image Understanding in Vision-Language Models
by: Liu, Shengyuan, et al.
Published: (2026)
2D-3D Interlaced Transformer for Point Cloud Segmentation with Scene-Level Supervision
by: Yang, Cheng-Kun, et al.
Published: (2023)
OccamToken: Efficient VLM Inference with Training-Free and Budget-Adaptive Token Pruning
by: Li, Geng, et al.
Published: (2026)
Beyond Attention or Similarity: Maximizing Conditional Diversity for Token Pruning in MLLMs
by: Zhang, Qizhe, et al.
Published: (2025)
MergeVQ: A Unified Framework for Visual Generation and Representation with Disentangled Token Merging and Quantization
by: Li, Siyuan, et al.
Published: (2025)
EvoPrune: Early-Stage Visual Token Pruning for Efficient MLLMs
by: Chen, Yuhao, et al.
Published: (2026)
AdaTP: Attention-Debiased Token Pruning for Video Large Language Models
by: Sun, Fengyuan, et al.
Published: (2025)
UniToken: Harmonizing Multimodal Understanding and Generation through Unified Visual Encoding
by: Jiao, Yang, et al.
Published: (2025)
Hourglass Tokenizer for Efficient Transformer-Based 3D Human Pose Estimation
by: Li, Wenhao, et al.
Published: (2023)
Pandora3D: A Comprehensive Framework for High-Quality 3D Shape and Texture Generation
by: Yang, Jiayu, et al.
Published: (2025)
Towards Fair Medical AI: Adversarial Debiasing of 3D CT Foundation Embeddings
by: Zheng, Guangyao, et al.
Published: (2025)
LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models
by: Shang, Yuzhang, et al.
Published: (2024)
From 2D Alignment to 3D Plausibility: Unifying Heterogeneous 2D Priors and Penetration-Free Diffusion for Occlusion-Robust Two-Hand Reconstruction
by: Han, Gaoge, et al.
Published: (2025)
HY3D-Bench: Generation of 3D Assets
by: Hunyuan3D, Team, et al.
Published: (2026)
Unifying 2D and 3D Vision-Language Understanding
by: Jain, Ayush, et al.
Published: (2025)