:: Library Catalog

Bibliographic Details
Main Authors:	Wang, Chongyu, Huang, Ting, Sun, Chunyu, Ning, Xinyu, Wang, Di, Tang, Hao
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2604.05695

Similar Items

PromptTea: Let Prompts Tell TeaCache the Optimal Threshold
by: Huang, Zishen, et al.
Published: (2025)

Efficiently Expanding Receptive Fields: Local Split Attention and Parallel Aggregation for Enhanced Large-scale Point Cloud Semantic Segmentation
by: Wang, Haodong, et al.
Published: (2024)

OpenUrban3D: Annotation-Free Open-Vocabulary Semantic Segmentation of Large-Scale Urban Point Clouds
by: Wang, Chongyu, et al.
Published: (2025)

Context Unrolling in Omni Models
by: Yang, Ceyuan, et al.
Published: (2026)

DSPFusion: Image Fusion via Degradation and Semantic Dual-Prior Guidance
by: Tang, Linfeng, et al.
Published: (2025)

Slow Perception: Let's Perceive Geometric Figures Step-by-step
by: Wei, Haoran, et al.
Published: (2024)

Straightforward Layer-wise Pruning for More Efficient Visual Adaptation
by: Han, Ruizi, et al.
Published: (2024)

Geometric Prior Based Deep Human Point Cloud Geometry Compression
by: Wu, Xinju, et al.
Published: (2023)

Multimodal Industrial Anomaly Detection via Geometric Prior
by: Li, Min, et al.
Published: (2026)

Action-Geometry Prediction with 3D Geometric Prior for Bimanual Manipulation
by: Xu, Chongyang, et al.
Published: (2026)

Efficient Portrait Matte Creation With Layer Diffusion and Connectivity Priors
by: Lu, Zhiyuan, et al.
Published: (2025)

Advancing the Understanding of Fine-Grained 3D Forest Structures using Digital Cousins and Simulation-to-Reality: Methods and Datasets
by: Liu, Jing, et al.
Published: (2025)

TELA: Text to Layer-wise 3D Clothed Human Generation
by: Dong, Junting, et al.
Published: (2024)

Let Language Constrain Geometry: Vision-Language Models as Semantic and Spatial Critics for 3D Generation
by: Bai, Weimin, et al.
Published: (2025)

FaithFusion: Harmonizing Reconstruction and Generation via Pixel-wise Information Gain
by: Wang, YuAn, et al.
Published: (2025)

Segmentation-guided Layer-wise Image Vectorization with Gradient Fills
by: Zhou, Hengyu, et al.
Published: (2024)

LARV: Data-Free Layer-wise Adaptive Rescaling Veneer for Model Merging
by: Wang, Xinyu, et al.
Published: (2026)

PGAHum: Prior-Guided Geometry and Appearance Learning for High-Fidelity Animatable Human Reconstruction
by: Wang, Hao, et al.
Published: (2024)

Enhancing Image Aesthetics with Dual-Conditioned Diffusion Models Guided by Multimodal Perception
by: Nan, Xinyu, et al.
Published: (2026)

Learning to Synergize Semantic and Geometric Priors for Limited-Data Wheat Disease Segmentation
by: Wang, Shijie, et al.
Published: (2026)

Let Storytelling Tell Vivid Stories: An Expressive and Fluent Multimodal Storyteller
by: Zang, Chuanqi, et al.
Published: (2024)

SwimVG: Step-wise Multimodal Fusion and Adaption for Visual Grounding
by: Shi, Liangtao, et al.
Published: (2025)

Vision Function Layer in Multimodal LLMs
by: Shi, Cheng, et al.
Published: (2025)

LLMs as Bridges: Reformulating Grounded Multimodal Named Entity Recognition
by: Li, Jinyuan, et al.
Published: (2024)

LaCo: Efficient Layer-wise Compression of Visual Tokens for Multimodal Large Language Models
by: Liu, Juntao, et al.
Published: (2025)

ELSA: Exploiting Layer-wise N:M Sparsity for Vision Transformer Acceleration
by: Huang, Ning-Chi, et al.
Published: (2024)

Layer-wise Instance Binding for Regional and Occlusion Control in Text-to-Image Diffusion Transformers
by: Chen, Ruidong, et al.
Published: (2026)

AHAP: Reconstructing Arbitrary Humans from Arbitrary Perspectives with Geometric Priors
by: Qiao, Xiaozhen, et al.
Published: (2026)

Sparse Gain Radio Map Reconstruction With Geometry Priors and Uncertainty-Guided Measurement Selection
by: Zeng, Zhihan, et al.
Published: (2026)

Dragging with Geometry: From Pixels to Geometry-Guided Image Editing
by: Pu, Xinyu, et al.
Published: (2025)

RePer-360: Releasing Perspective Priors for 360$^\circ$ Depth Estimation via Self-Modulation
by: Guan, Cheng, et al.
Published: (2026)

Unlocking the Potential of Difficulty Prior in RL-based Multimodal Reasoning
by: Chen, Mingrui, et al.
Published: (2025)

SpatialGeo: Boosting Spatial Reasoning in Multimodal LLMs via Geometry-Semantics Fusion
by: Guo, Jiajie, et al.
Published: (2025)

LISA: A Layer-wise Integration and Suppression Approach for Hallucination Mitigation in Multimodal Large Language Models
by: Guo, Zhihui, et al.
Published: (2025)

GUIDE: A Guideline-Guided Dataset for Instructional Video Comprehension
by: Liang, Jiafeng, et al.
Published: (2024)

VidSplat: Gaussian Splatting Reconstruction with Geometry-Guided Video Diffusion Priors
by: Tang, Jimin, et al.
Published: (2026)

Advancing Structured Priors for Sparse-Voxel Surface Reconstruction
by: Chi, Ting-Hsun, et al.
Published: (2026)

Unrolled Decomposed Unpaired Learning for Controllable Low-Light Video Enhancement
by: Zhu, Lingyu, et al.
Published: (2024)

Unrolled Reconstruction with Integrated Super-Resolution for Accelerated 3D LGE MRI
by: Hisham, Md Hasibul Husain, et al.
Published: (2026)

3D CoCa: Contrastive Learners are 3D Captioners
by: Huang, Ting, et al.
Published: (2025)