:: Library Catalog

Bibliographic Details
Main Authors:	Luo, Wang; Wu, Di; Na, Hengyuan; Zhu, Yinlin; Hu, Miao; Quan, Guocong
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence; I.2.10
Online Access:	https://arxiv.org/abs/2511.12170

Similar Items

PPC-MT: Parallel Point Cloud Completion with Mamba-Transformer Hybrid Architecture
by: Li, Jie, et al.
Published: (2026)

Flexible-weighted Chamfer Distance: Enhanced Objective Function for Point Cloud Completion
by: Li, Jie, et al.
Published: (2025)

Rethinking Multimodal Few-Shot 3D Point Cloud Segmentation: From Fused Refinement to Decoupled Arbitration
by: Bian, Wentao, et al.
Published: (2026)

Sora as a World Model? A Complete Survey on Text-to-Video Generation
by: Puspitasari, Fachrina Dewi, et al.
Published: (2024)

ChartComplete: A Taxonomy-based Inclusive Chart Dataset
by: Mustapha, Ahmad, et al.
Published: (2026)

Context-Aware Indoor Point Cloud Object Generation through User Instructions
by: Luo, Yiyang, et al.
Published: (2023)

3D Adaptive Structural Convolution Network for Domain-Invariant Point Cloud Recognition
by: Kim, Younggun, et al.
Published: (2024)

A Persistent Homology Design Space for 3D Point Cloud Deep Learning
by: Kudeshia, Prachi, et al.
Published: (2026)

SPARK: Scalable Real-Time Point Cloud Aggregation with Multi-View Self-Calibration
by: Sun, Chentian
Published: (2026)

FUSE-Flow: Scalable Real-Time Multi-View Point Cloud Reconstruction Using Confidence
by: Sun, Chentian
Published: (2026)

Rotation-Adaptive Point Cloud Domain Generalization via Intricate Orientation Learning
by: Liu, Bangzhen, et al.
Published: (2025)

SITransformer: Shared Information-Guided Transformer for Extreme Multimodal Summarization
by: Liu, Sicheng, et al.
Published: (2024)

GeoHeight-Bench: Towards Height-Aware Multimodal Reasoning in Remote Sensing
by: Hu, Xuran, et al.
Published: (2026)

treeX: Unsupervised Tree Instance Segmentation in Dense Forest Point Clouds
by: Burmeister, Josafat-Mattias, et al.
Published: (2025)

3DCity-LLM: Empowering Multi-modality Large Language Models for 3D City-scale Perception and Understanding
by: Chen, Yiping, et al.
Published: (2026)

Hierarchical Image-Guided 3D Point Cloud Segmentation in Industrial Scenes via Multi-View Bayesian Fusion
by: Zhu, Yu, et al.
Published: (2025)

SIFThinker: Spatially-Aware Image Focus for Visual Reasoning
by: Chen, Zhangquan, et al.
Published: (2025)

From Latent to Engine Manifolds: Analyzing ImageBind's Multimodal Embedding Space
by: Hamara, Andrew, et al.
Published: (2024)

VSI: Visual Subtitle Integration for Keyframe Selection to enhance Long Video Understanding
by: He, Jianxiang, et al.
Published: (2025)

ESCAPE: Energy-based Selective Adaptive Correction for Out-of-distribution 3D Human Pose Estimation
by: Bidulka, Luke, et al.
Published: (2024)

Rethinking Uncertainty in Segmentation: From Estimation to Decision
by: Maganti, Saket
Published: (2026)

Spot-Compose: A Framework for Open-Vocabulary Object Retrieval and Drawer Manipulation in Point Clouds
by: Lemke, Oliver, et al.
Published: (2024)

Think with 3D: Geometric Imagination Grounded Spatial Reasoning from Limited Views
by: Chen, Zhangquan, et al.
Published: (2025)

Vectra: A New Metric, Dataset, and Model for Visual Quality Assessment in E-Commerce In-Image Machine Translation
by: Wu, Qingyu, et al.
Published: (2026)

MPCC: A Novel Benchmark for Multimodal Planning with Complex Constraints in Multimodal Large Language Models
by: Ji, Yiyan, et al.
Published: (2025)

Labels or Input? Rethinking Augmentation in Multimodal Hate Detection
by: Singh, Sahajpreet, et al.
Published: (2025)

U-Net-Like Spiking Neural Networks for Single Image Dehazing
by: Li, Huibin, et al.
Published: (2025)

Disrupting Diffusion: Token-Level Attention Erasure Attack against Diffusion-based Customization
by: Liu, Yisu, et al.
Published: (2024)

ERNet: Efficient Non-Rigid Registration Network for Point Sequences
by: He, Guangzhao, et al.
Published: (2025)

OmniVideo-R1: Reinforcing Audio-visual Reasoning with Query Intention and Modality Attention
by: Chen, Zhangquan, et al.
Published: (2026)

Visual Enhanced Depth Scaling for Multimodal Latent Reasoning
by: Han, Yudong, et al.
Published: (2026)

Multimodal Action Quality Assessment
by: Zeng, Ling-An, et al.
Published: (2024)

Interpretable Tau-PET Synthesis from Multimodal T1-Weighted and FLAIR MRI Using Partial Information Decomposition Guided Disentangled Quantized Half-UNet
by: Chopra, Agamdeep S., et al.
Published: (2026)

Supervised Contrastive Learning for Few-Shot AI-Generated Image Detection and Attribution
by: Urueña, Jaime Álvarez, et al.
Published: (2025)

Pointing-Based Object Recognition
by: Hajdúch, Lukáš, et al.
Published: (2026)

Enhancing Sports Strategy with Video Analytics and Data Mining: Assessing the effectiveness of Multimodal LLMs in tennis video analysis
by: Teo, Charlton
Published: (2025)

Scalable Face Security Vision Foundation Model for Deepfake, Diffusion, and Spoofing Detection
by: Wang, Gaojian, et al.
Published: (2025)

EventFormer: A Node-graph Hierarchical Attention Transformer for Action-centric Video Event Prediction
by: Su, Qile, et al.
Published: (2025)

VisRL: Intention-Driven Visual Perception via Reinforced Reasoning
by: Chen, Zhangquan, et al.
Published: (2025)

Topology-Aware Latent Diffusion for 3D Shape Generation
by: Hu, Jiangbei, et al.
Published: (2024)