| Main Authors: | He, Qingrong, Lin, Kejun, Chen, Shizhe, Hu, Anwen, Jin, Qin |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2404.14705 |
Similar Items
Situat3DChange: Situated 3D Change Understanding Dataset for Multimodal Large Language Model
by: Liu, Ruiping, et al.
Published: (2025)
TinyChart: Efficient Chart Understanding with Visual Token Merging and Program-of-Thoughts Learning
by: Zhang, Liang, et al.
Published: (2024)
Empowering Large Language Models with 3D Situation Awareness
by: Yuan, Zhihao, et al.
Published: (2025)
Learning High-Fidelity Robot Self-Model with Articulated 3D Gaussian Splatting
by: Hu, Kejun, et al.
Published: (2025)
SPARTUN3D: Situated Spatial Understanding of 3D World in Large Language Models
by: Zhang, Yue, et al.
Published: (2024)
Towards Generalizable Vision-Language Robotic Manipulation: A Benchmark and LLM-guided 3D Policy
by: Garcia, Ricardo, et al.
Published: (2024)
ThinkFake: Reasoning in Multimodal Large Language Models for AI-Generated Image Detection
by: Huang, Tai-Ming, et al.
Published: (2025)
MLLM-For3D: Adapting Multimodal Large Language Model for 3D Reasoning Segmentation
by: Huang, Jiaxin, et al.
Published: (2025)
Think-as-You-See: Streaming Chain-of-Thought Reasoning for Large Vision-Language Models
by: Zhang, Jialiang, et al.
Published: (2026)
Scaling Cross-Environment Failure Reasoning Data for Vision-Language Robotic Manipulation
by: Pacaud, Paul, et al.
Published: (2025)
Think3D: Thinking with Space for Spatial Reasoning
by: Zhang, Zaibin, et al.
Published: (2026)
From Perception to Reasoning: Deep Thinking Empowers Multimodal Large Language Models
by: Zhu, Wenxin, et al.
Published: (2025)
Thinking in Dynamics: How Multimodal Large Language Models Perceive, Track, and Reason Dynamics in Physical 4D World
by: Huang, Yuzhi, et al.
Published: (2026)
From Semantics, Scene to Instance-awareness: Distilling Foundation Model for Grounded Open-vocabulary Situation Recognition
by: Cai, Chen, et al.
Published: (2025)
PointACT: Vision-Language-Action Models with Multi-Scale Point-Action Interaction
by: Chen, Shizhe, et al.
Published: (2026)
GTA-Net: An IoT-Integrated 3D Human Pose Estimation System for Real-Time Adolescent Sports Posture Correction
by: Yuan, Shizhe, et al.
Published: (2024)
SUGAR: Pre-training 3D Visual Representations for Robotics
by: Chen, Shizhe, et al.
Published: (2024)
Reason3D: Searching and Reasoning 3D Segmentation via Large Language Model
by: Huang, Kuan-Chih, et al.
Published: (2024)
SOE: SO(3)-Equivariant 3D MRI Encoding
by: He, Shizhe, et al.
Published: (2024)
Enhancing Video Large Language Models with Structured Multi-Video Collaborative Reasoning
by: He, Zhihao, et al.
Published: (2025)
Rethinking the State Update Gate for Long-Sequence Recurrent 3D Reconstruction
by: Ren, Kejun, et al.
Published: (2026)
What if? Emulative Simulation with World Models for Situated Reasoning
by: Liu, Ruiping, et al.
Published: (2026)
Situational Awareness Matters in 3D Vision Language Reasoning
by: Man, Yunze, et al.
Published: (2024)
Multi-modal Situated Reasoning in 3D Scenes
by: Linghu, Xiongkun, et al.
Published: (2024)
Online 3D Scene Reconstruction Using Neural Object Priors
by: Chabal, Thomas, et al.
Published: (2025)
Can Vision-Language Models Think from the Sky? Unifying UAV Reasoning and Generation
by: Sun, Jintao, et al.
Published: (2026)
mPLUG-DocOwl2: High-resolution Compressing for OCR-free Multi-page Document Understanding
by: Hu, Anwen, et al.
Published: (2024)
Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models
by: Li, Yunxin, et al.
Published: (2025)
ST-Think: How Multimodal Large Language Models Reason About 4D Worlds from Ego-Centric Videos
by: Wu, Peiran, et al.
Published: (2025)
Think or Not? Selective Reasoning via Reinforcement Learning for Vision-Language Models
by: Wang, Jiaqi, et al.
Published: (2025)
Enhancing Vision Language Models with Logic Reasoning for Situational Awareness
by: Pradeep, Pavana, et al.
Published: (2026)
Dual-Anchoring: Addressing State Drift in Vision-Language Navigation
by: Wu, Kangyi, et al.
Published: (2026)
Proactive Reasoning-with-Retrieval Framework for Medical Multimodal Large Language Models
by: Wang, Lehan, et al.
Published: (2025)
NextBestPath: Efficient 3D Mapping of Unseen Environments
by: Li, Shiyao, et al.
Published: (2025)
mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding
by: Hu, Anwen, et al.
Published: (2024)
Making Large Language Models Better Planners with Reasoning-Decision Alignment
by: Huang, Zhijian, et al.
Published: (2024)
Reinforcing Video Reasoning with Focused Thinking
by: Dang, Jisheng, et al.
Published: (2025)
OpenMaskDINO3D: Reasoning 3D Segmentation via Large Language Model
by: Zhang, Kunshen
Published: (2025)
Beyond Medical Diagnostics: How Medical Multimodal Large Language Models Think in Space
by: Trinh, Quoc-Huy, et al.
Published: (2026)
mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models
by: Ye, Jiabo, et al.
Published: (2024)