| Main Authors: | Tang, Jinzhou, Liu, Sidi, Xiu, Waikit, Chen, Weixing, Wang, Keze |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2605.16899 |
Similar Items
Beyond Pixels: Introducing Geometric-Semantic World Priors for Video-based Embodied Models via Spatio-temporal Alignment
by: Tang, Jinzhou, et al.
Published: (2025)
Traffic-MLLM: Curiosity-Regularized Supervised Learning for Traffic Scenario Case-Based Reasoning
by: Xiu, Waikit, et al.
Published: (2025)
From Motion to Behavior: Hierarchical Modeling of Humanoid Generative Behavior Control
by: Zhang, Jusheng, et al.
Published: (2025)
Contrastive Learning-Driven Traffic Sign Perception: Multi-Modal Fusion of Text and Vision
by: Lu, Qiang, et al.
Published: (2025)
DreamSAC: Learning Hamiltonian World Models via Symmetry Exploration
by: Tang, Jinzhou, et al.
Published: (2026)
Self-Rewarded Multimodal Coherent Reasoning Across Diverse Visual Domains
by: Zhang, Jesen, et al.
Published: (2025)
Towards Top-Down Reasoning: An Explainable Multi-Agent Approach for Visual Question Answering
by: Wang, Zeqing, et al.
Published: (2023)
ResAgent: Entropy-based Prior Point Discovery and Visual Reasoning for Referring Expression Segmentation
by: Wang, Yihao, et al.
Published: (2026)
Training-Free Spatio-temporal Decoupled Reasoning Video Segmentation with Adaptive Object Memory
by: Zhu, Zhengtong, et al.
Published: (2026)
Frozen LLMs as Map-Aware Spatio-Temporal Reasoners for Vehicle Trajectory Prediction
by: Liu, Yanjiao, et al.
Published: (2026)
Adaptive-VoCo: Complexity-Aware Visual Token Compression for Vision-Language Models
by: Guo, Xiaoyang, et al.
Published: (2025)
Spatio-temporal Decoupled Knowledge Compensator for Few-Shot Action Recognition
by: Qu, Hongyu, et al.
Published: (2026)
Weather-R1: Logically Consistent Reinforcement Fine-Tuning for Multimodal Reasoning in Meteorology
by: Wu, Kaiyu, et al.
Published: (2026)
Enhancing Visual Programming for Visual Reasoning via Probabilistic Graphs
by: Wan, Wentao, et al.
Published: (2025)
MM-CoT: A Benchmark for Probing Visual Chain-of-Thought Reasoning in Multimodal Models
by: Zhang, Jusheng, et al.
Published: (2025)
Towards Explainable Industrial Anomaly Detection via Knowledge-Guided Latent Reasoning
by: Chen, Peng, et al.
Published: (2026)
Latent Visual Reasoning
by: Li, Bangzheng, et al.
Published: (2025)
Towards Long-Horizon Vision-Language Navigation: Platform, Benchmark and Method
by: Song, Xinshuai, et al.
Published: (2024)
GTMA: Dynamic Representation Optimization for OOD Vision-Language Models
by: Zhang, Jensen, et al.
Published: (2025)
OST-Bench: Evaluating the Capabilities of MLLMs in Online Spatio-temporal Scene Understanding
by: Lin, Jingli, et al.
Published: (2025)
Process-of-Thought Reasoning for Videos
by: Zhang, Jusheng, et al.
Published: (2026)
TimeCausality: Evaluating the Causal Ability in Time Dimension for Vision Language Models
by: Wang, Zeqing, et al.
Published: (2025)
SQLNet: Scale-Modulated Query and Localization Network for Few-Shot Class-Agnostic Counting
by: Wu, Hefeng, et al.
Published: (2023)
Reasoning Path and Latent State Analysis for Multi-view Visual Spatial Reasoning: A Cognitive Science Perspective
by: Xue, Qiyao, et al.
Published: (2025)
3DAffordSplat: Efficient Affordance Reasoning with 3D Gaussians
by: Wei, Zeming, et al.
Published: (2025)
ColaVLA: Leveraging Cognitive Latent Reasoning for Hierarchical Parallel Trajectory Planning in Autonomous Driving
by: Peng, Qihang, et al.
Published: (2025)
Spatio-temporal Sign Language Representation and Translation
by: Hamidullah, Yasser, et al.
Published: (2025)
Semantic-Enriched Latent Visual Reasoning
by: Xu, Tianrun, et al.
Published: (2026)
UniSTD: Towards Unified Spatio-Temporal Learning across Diverse Disciplines
by: Tang, Chen, et al.
Published: (2025)
An Efficient System for Automatic Map Storytelling -- A Case Study on Historical Maps
by: Liu, Ziyi, et al.
Published: (2024)
Improving Network Interpretability via Explanation Consistency Evaluation
by: Wu, Hefeng, et al.
Published: (2024)
OmniTransfer: All-in-one Framework for Spatio-temporal Video Transfer
by: Zhang, Pengze, et al.
Published: (2026)
PSTTS: A Plug-and-Play Token Selector for Efficient Event-based Spatio-temporal Representation Learning
by: Zhao, Xiangmo, et al.
Published: (2025)
Generative AI in Map-Making: A Technical Exploration and Its Implications for Cartographers
by: Affolter, Claudio, et al.
Published: (2025)
SpaceMind++: Toward Allocentric Cognitive Maps for Spatially Grounded Video MLLMs
by: Gu, Bo, et al.
Published: (2026)
PhyDetEx: Detecting and Explaining the Physical Plausibility of T2V Models
by: Wang, Zeqing, et al.
Published: (2025)
ReasonMap: Towards Fine-Grained Visual Reasoning from Transit Maps
by: Feng, Sicheng, et al.
Published: (2025)
DART: Dual Adaptive Refinement Transfer for Open-Vocabulary Multi-Label Recognition
by: Liu, Haijing, et al.
Published: (2025)
Category-Adaptive Cross-Modal Semantic Refinement and Transfer for Open-Vocabulary Multi-Label Recognition
by: Liu, Haijing, et al.
Published: (2024)
Hierarchical Spatio-temporal Segmentation Network for Ejection Fraction Estimation in Echocardiography Videos
by: Wang, Dongfang, et al.
Published: (2025)