| Main Authors: | Luo, Wang; Wu, Di; Na, Hengyuan; Zhu, Yinlin; Hu, Miao; Quan, Guocong |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2511.12170 |
Similar Items
PPC-MT: Parallel Point Cloud Completion with Mamba-Transformer Hybrid Architecture
by: Li, Jie, et al.
Published: (2026)
Flexible-weighted Chamfer Distance: Enhanced Objective Function for Point Cloud Completion
by: Li, Jie, et al.
Published: (2025)
Rethinking Multimodal Few-Shot 3D Point Cloud Segmentation: From Fused Refinement to Decoupled Arbitration
by: Bian, Wentao, et al.
Published: (2026)
Sora as a World Model? A Complete Survey on Text-to-Video Generation
by: Puspitasari, Fachrina Dewi, et al.
Published: (2024)
ChartComplete: A Taxonomy-based Inclusive Chart Dataset
by: Mustapha, Ahmad, et al.
Published: (2026)
Context-Aware Indoor Point Cloud Object Generation through User Instructions
by: Luo, Yiyang, et al.
Published: (2023)
3D Adaptive Structural Convolution Network for Domain-Invariant Point Cloud Recognition
by: Kim, Younggun, et al.
Published: (2024)
A Persistent Homology Design Space for 3D Point Cloud Deep Learning
by: Kudeshia, Prachi, et al.
Published: (2026)
SPARK: Scalable Real-Time Point Cloud Aggregation with Multi-View Self-Calibration
by: Sun, Chentian
Published: (2026)
FUSE-Flow: Scalable Real-Time Multi-View Point Cloud Reconstruction Using Confidence
by: Sun, Chentian
Published: (2026)
Rotation-Adaptive Point Cloud Domain Generalization via Intricate Orientation Learning
by: Liu, Bangzhen, et al.
Published: (2025)
SITransformer: Shared Information-Guided Transformer for Extreme Multimodal Summarization
by: Liu, Sicheng, et al.
Published: (2024)
GeoHeight-Bench: Towards Height-Aware Multimodal Reasoning in Remote Sensing
by: Hu, Xuran, et al.
Published: (2026)
treeX: Unsupervised Tree Instance Segmentation in Dense Forest Point Clouds
by: Burmeister, Josafat-Mattias, et al.
Published: (2025)
3DCity-LLM: Empowering Multi-modality Large Language Models for 3D City-scale Perception and Understanding
by: Chen, Yiping, et al.
Published: (2026)
Hierarchical Image-Guided 3D Point Cloud Segmentation in Industrial Scenes via Multi-View Bayesian Fusion
by: Zhu, Yu, et al.
Published: (2025)
SIFThinker: Spatially-Aware Image Focus for Visual Reasoning
by: Chen, Zhangquan, et al.
Published: (2025)
From Latent to Engine Manifolds: Analyzing ImageBind's Multimodal Embedding Space
by: Hamara, Andrew, et al.
Published: (2024)
VSI: Visual Subtitle Integration for Keyframe Selection to enhance Long Video Understanding
by: He, Jianxiang, et al.
Published: (2025)
ESCAPE: Energy-based Selective Adaptive Correction for Out-of-distribution 3D Human Pose Estimation
by: Bidulka, Luke, et al.
Published: (2024)
Rethinking Uncertainty in Segmentation: From Estimation to Decision
by: Maganti, Saket
Published: (2026)
Spot-Compose: A Framework for Open-Vocabulary Object Retrieval and Drawer Manipulation in Point Clouds
by: Lemke, Oliver, et al.
Published: (2024)
Think with 3D: Geometric Imagination Grounded Spatial Reasoning from Limited Views
by: Chen, Zhangquan, et al.
Published: (2025)
Vectra: A New Metric, Dataset, and Model for Visual Quality Assessment in E-Commerce In-Image Machine Translation
by: Wu, Qingyu, et al.
Published: (2026)
MPCC: A Novel Benchmark for Multimodal Planning with Complex Constraints in Multimodal Large Language Models
by: Ji, Yiyan, et al.
Published: (2025)
Labels or Input? Rethinking Augmentation in Multimodal Hate Detection
by: Singh, Sahajpreet, et al.
Published: (2025)
U-Net-Like Spiking Neural Networks for Single Image Dehazing
by: Li, Huibin, et al.
Published: (2025)
Disrupting Diffusion: Token-Level Attention Erasure Attack against Diffusion-based Customization
by: Liu, Yisu, et al.
Published: (2024)
ERNet: Efficient Non-Rigid Registration Network for Point Sequences
by: He, Guangzhao, et al.
Published: (2025)
OmniVideo-R1: Reinforcing Audio-visual Reasoning with Query Intention and Modality Attention
by: Chen, Zhangquan, et al.
Published: (2026)
Visual Enhanced Depth Scaling for Multimodal Latent Reasoning
by: Han, Yudong, et al.
Published: (2026)
Multimodal Action Quality Assessment
by: Zeng, Ling-An, et al.
Published: (2024)
Interpretable Tau-PET Synthesis from Multimodal T1-Weighted and FLAIR MRI Using Partial Information Decomposition Guided Disentangled Quantized Half-UNet
by: Chopra, Agamdeep S., et al.
Published: (2026)
Supervised Contrastive Learning for Few-Shot AI-Generated Image Detection and Attribution
by: Urueña, Jaime Álvarez, et al.
Published: (2025)
Pointing-Based Object Recognition
by: Hajdúch, Lukáš, et al.
Published: (2026)
Enhancing Sports Strategy with Video Analytics and Data Mining: Assessing the effectiveness of Multimodal LLMs in tennis video analysis
by: Teo, Charlton
Published: (2025)
Scalable Face Security Vision Foundation Model for Deepfake, Diffusion, and Spoofing Detection
by: Wang, Gaojian, et al.
Published: (2025)
EventFormer: A Node-graph Hierarchical Attention Transformer for Action-centric Video Event Prediction
by: Su, Qile, et al.
Published: (2025)
VisRL: Intention-Driven Visual Perception via Reinforced Reasoning
by: Chen, Zhangquan, et al.
Published: (2025)
Topology-Aware Latent Diffusion for 3D Shape Generation
by: Hu, Jiangbei, et al.
Published: (2024)