| Main Authors: | Tang, Xuejiao, Zhang, Wenbin |
|---|---|
| Format: | Preprint |
| Published: | 2022 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2204.08027 |
Similar Items
Curvature-Aware Captioning: Leveraging Geodesic Attention for 3D Scene Understanding
by: He, Ziyao, et al.
Published: (2026)
SceneRAG: Scene-level Retrieval-Augmented Generation for Video Understanding
by: Zeng, Nianbo, et al.
Published: (2025)
Lifting Unlabeled Internet-level Data for 3D Scene Understanding
by: Chen, Yixin, et al.
Published: (2026)
HAMF: A Hybrid Attention-Mamba Framework for Joint Scene Context Understanding and Future Motion Representation Learning
by: Mei, Xiaodong, et al.
Published: (2025)
TopoLogic: An Interpretable Pipeline for Lane Topology Reasoning on Driving Scenes
by: Fu, Yanping, et al.
Published: (2024)
Towards Holistic Surgical Scene Understanding
by: Valderrama, Natalia, et al.
Published: (2022)
Fine-Grained Scene Graph Generation via Sample-Level Bias Prediction
by: Li, Yansheng, et al.
Published: (2024)
Heat Diffusion Models -- Interpixel Attention Mechanism
by: Zhang, Pengfei, et al.
Published: (2025)
RieMind: Geometry-Grounded Spatial Agent for Scene Understanding
by: Ropero, Fernando, et al.
Published: (2026)
Toward Robust Multimodal Learning using Multimodal Foundational Models
by: Zhao, Xianbing, et al.
Published: (2024)
3D-RFT: Reinforcement Fine-Tuning for Video-based 3D Scene Understanding
by: Linghu, Xiongkun, et al.
Published: (2026)
Response-G1: Explicit Scene Graph Modeling for Proactive Streaming Video Understanding
by: Ma, Ke, et al.
Published: (2026)
Text-Scene: A Scene-to-Language Parsing Framework for 3D Scene Understanding
by: Li, Haoyuan, et al.
Published: (2025)
Can Large Multimodal Models Understand Agricultural Scenes? Benchmarking with AgroMind
by: Li, Qingmei, et al.
Published: (2025)
Evaluating Compositional Scene Understanding in Multimodal Generative Models
by: Fu, Shuhao, et al.
Published: (2025)
Solving Scene Understanding for Autonomous Navigation in Unstructured Environments
by: Renji, Naveen Mathews, et al.
Published: (2025)
DenseScan: Advancing 3D Scene Understanding with 2D Dense Annotation
by: Wang, Zirui, et al.
Published: (2025)
MASR: Self-Reflective Reasoning through Multimodal Hierarchical Attention Focusing for Agent-based Video Understanding
by: Cao, Shiwen, et al.
Published: (2025)
MTMamba: Enhancing Multi-Task Dense Scene Understanding by Mamba-Based Decoders
by: Lin, Baijiong, et al.
Published: (2024)
Trustworthy Automated Driving through Qualitative Scene Understanding and Explanations
by: Belmecheri, Nassim, et al.
Published: (2024)
HexPlane Representation for 3D Semantic Scene Understanding
by: Chen, Zeren, et al.
Published: (2025)
PerspectiveNet: Multi-View Perception for Dynamic Scene Understanding
by: Nguyen, Vinh
Published: (2024)
DOCTR: Disentangled Object-Centric Transformer for Point Scene Understanding
by: Yu, Xiaoxuan, et al.
Published: (2024)
Masked Scene Modeling: Narrowing the Gap Between Supervised and Self-Supervised Learning in 3D Scene Understanding
by: Hermosilla, Pedro, et al.
Published: (2025)
Seeing is Understanding: Unlocking Causal Attention into Modality-Mutual Attention for Multimodal LLMs
by: Wang, Wei-Yao, et al.
Published: (2025)
Learning to Look: Cognitive Attention Alignment with Vision-Language Models
by: Yang, Ryan L., et al.
Published: (2025)
An Efficient Aerial Image Detection with Variable Receptive Fields
by: Wenbin, Liu
Published: (2025)
RE-VLM: Event-Augmented Vision-Language Model for Scene Understanding
by: Liu, Hanqing, et al.
Published: (2026)
Cognitive Mismatch in Multimodal Large Language Models for Discrete Symbol Understanding
by: Li, Yinghui, et al.
Published: (2026)
Argus: Leveraging Multiview Images for Improved 3-D Scene Understanding With Large Language Models
by: Xu, Yifan, et al.
Published: (2025)
Enhancing Human-Centered Dynamic Scene Understanding via Multiple LLMs Collaborated Reasoning
by: Zhang, Hang, et al.
Published: (2024)
Towards Unified Surgical Scene Understanding: Bridging Reasoning and Grounding via MLLMs
by: Huang, Jincai, et al.
Published: (2026)
Hierarchical Question-Answering for Driving Scene Understanding Using Vision-Language Models
by: Mohamud, Safaa Abdullahi Moallim, et al.
Published: (2025)
Towards Comprehensive Scene Understanding: Integrating First and Third-Person Views for LVLMs
by: Lee, Insu, et al.
Published: (2025)
Scene-LLM: Extending Language Model for 3D Visual Understanding and Reasoning
by: Fu, Rao, et al.
Published: (2024)
CogME: A Cognition-Inspired Multi-Dimensional Evaluation Metric for Story Understanding
by: Shin, Minjung, et al.
Published: (2021)
Attention over Scene Graphs: Indoor Scene Representations Toward CSAI Classification
by: Barros, Artur, et al.
Published: (2025)
EventSTR: A Benchmark Dataset and Baselines for Event Stream based Scene Text Recognition
by: Wang, Xiao, et al.
Published: (2025)
NuScenes-SpatialQA: A Spatial Understanding and Reasoning Benchmark for Vision-Language Models in Autonomous Driving
by: Tian, Kexin, et al.
Published: (2025)
Attention at Rest Stays at Rest: Breaking Visual Inertia for Cognitive Hallucination Mitigation
by: Gong, Boyang, et al.
Published: (2026)