:: Library Catalog

Bibliographic Details
Main Authors:	Tang, Xuejiao, Zhang, Wenbin
Format:	Preprint
Published:	2022
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2204.08027

Similar Items

Curvature-Aware Captioning: Leveraging Geodesic Attention for 3D Scene Understanding
by: He, Ziyao, et al.
Published: (2026)

SceneRAG: Scene-level Retrieval-Augmented Generation for Video Understanding
by: Zeng, Nianbo, et al.
Published: (2025)

Lifting Unlabeled Internet-level Data for 3D Scene Understanding
by: Chen, Yixin, et al.
Published: (2026)

HAMF: A Hybrid Attention-Mamba Framework for Joint Scene Context Understanding and Future Motion Representation Learning
by: Mei, Xiaodong, et al.
Published: (2025)

TopoLogic: An Interpretable Pipeline for Lane Topology Reasoning on Driving Scenes
by: Fu, Yanping, et al.
Published: (2024)

Towards Holistic Surgical Scene Understanding
by: Valderrama, Natalia, et al.
Published: (2022)

Fine-Grained Scene Graph Generation via Sample-Level Bias Prediction
by: Li, Yansheng, et al.
Published: (2024)

Heat Diffusion Models -- Interpixel Attention Mechanism
by: Zhang, Pengfei, et al.
Published: (2025)

RieMind: Geometry-Grounded Spatial Agent for Scene Understanding
by: Ropero, Fernando, et al.
Published: (2026)

Toward Robust Multimodal Learning using Multimodal Foundational Models
by: Zhao, Xianbing, et al.
Published: (2024)

3D-RFT: Reinforcement Fine-Tuning for Video-based 3D Scene Understanding
by: Linghu, Xiongkun, et al.
Published: (2026)

Response-G1: Explicit Scene Graph Modeling for Proactive Streaming Video Understanding
by: Ma, Ke, et al.
Published: (2026)

Text-Scene: A Scene-to-Language Parsing Framework for 3D Scene Understanding
by: Li, Haoyuan, et al.
Published: (2025)

Can Large Multimodal Models Understand Agricultural Scenes? Benchmarking with AgroMind
by: Li, Qingmei, et al.
Published: (2025)

Evaluating Compositional Scene Understanding in Multimodal Generative Models
by: Fu, Shuhao, et al.
Published: (2025)

Solving Scene Understanding for Autonomous Navigation in Unstructured Environments
by: Renji, Naveen Mathews, et al.
Published: (2025)

DenseScan: Advancing 3D Scene Understanding with 2D Dense Annotation
by: Wang, Zirui, et al.
Published: (2025)

MASR: Self-Reflective Reasoning through Multimodal Hierarchical Attention Focusing for Agent-based Video Understanding
by: Cao, Shiwen, et al.
Published: (2025)

MTMamba: Enhancing Multi-Task Dense Scene Understanding by Mamba-Based Decoders
by: Lin, Baijiong, et al.
Published: (2024)

Trustworthy Automated Driving through Qualitative Scene Understanding and Explanations
by: Belmecheri, Nassim, et al.
Published: (2024)

HexPlane Representation for 3D Semantic Scene Understanding
by: Chen, Zeren, et al.
Published: (2025)

PerspectiveNet: Multi-View Perception for Dynamic Scene Understanding
by: Nguyen, Vinh
Published: (2024)

DOCTR: Disentangled Object-Centric Transformer for Point Scene Understanding
by: Yu, Xiaoxuan, et al.
Published: (2024)

Masked Scene Modeling: Narrowing the Gap Between Supervised and Self-Supervised Learning in 3D Scene Understanding
by: Hermosilla, Pedro, et al.
Published: (2025)

Seeing is Understanding: Unlocking Causal Attention into Modality-Mutual Attention for Multimodal LLMs
by: Wang, Wei-Yao, et al.
Published: (2025)

Learning to Look: Cognitive Attention Alignment with Vision-Language Models
by: Yang, Ryan L., et al.
Published: (2025)

An Efficient Aerial Image Detection with Variable Receptive Fields
by: Wenbin, Liu
Published: (2025)

RE-VLM: Event-Augmented Vision-Language Model for Scene Understanding
by: Liu, Hanqing, et al.
Published: (2026)

Cognitive Mismatch in Multimodal Large Language Models for Discrete Symbol Understanding
by: Li, Yinghui, et al.
Published: (2026)

Argus: Leveraging Multiview Images for Improved 3-D Scene Understanding With Large Language Models
by: Xu, Yifan, et al.
Published: (2025)

Enhancing Human-Centered Dynamic Scene Understanding via Multiple LLMs Collaborated Reasoning
by: Zhang, Hang, et al.
Published: (2024)

Towards Unified Surgical Scene Understanding: Bridging Reasoning and Grounding via MLLMs
by: Huang, Jincai, et al.
Published: (2026)

Hierarchical Question-Answering for Driving Scene Understanding Using Vision-Language Models
by: Mohamud, Safaa Abdullahi Moallim, et al.
Published: (2025)

Towards Comprehensive Scene Understanding: Integrating First and Third-Person Views for LVLMs
by: Lee, Insu, et al.
Published: (2025)

Scene-LLM: Extending Language Model for 3D Visual Understanding and Reasoning
by: Fu, Rao, et al.
Published: (2024)

CogME: A Cognition-Inspired Multi-Dimensional Evaluation Metric for Story Understanding
by: Shin, Minjung, et al.
Published: (2021)

Attention over Scene Graphs: Indoor Scene Representations Toward CSAI Classification
by: Barros, Artur, et al.
Published: (2025)

EventSTR: A Benchmark Dataset and Baselines for Event Stream based Scene Text Recognition
by: Wang, Xiao, et al.
Published: (2025)

NuScenes-SpatialQA: A Spatial Understanding and Reasoning Benchmark for Vision-Language Models in Autonomous Driving
by: Tian, Kexin, et al.
Published: (2025)

Attention at Rest Stays at Rest: Breaking Visual Inertia for Cognitive Hallucination Mitigation
by: Gong, Boyang, et al.
Published: (2026)