:: Library Catalog

Bibliographic Details
Main Authors:	Zhang, Jiahui; Chen, Yurui; Zhou, Yanpeng; Xu, Yueming; Huang, Ze; Mei, Jilin; Chen, Junhui; Yuan, Yu-Jie; Cai, Xinyue; Huang, Guowei; Quan, Xingyue; Xu, Hang; Zhang, Li
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2503.22976

Similar Items

4D-VLA: Spatiotemporal Vision-Language-Action Pretraining with Cross-Scene Calibration
by: Zhang, Jiahui, et al.
Published: (2025)

UniUGG: Unified 3D Understanding and Generation via Geometric-Semantic Encoding
by: Xu, Yueming, et al.
Published: (2025)

GraphCoT-VLA: A 3D Spatial-Aware Reasoning Vision-Language-Action Model for Robotic Manipulation with Ambiguous Instructions
by: Huang, Helong, et al.
Published: (2025)

Whole-Body Inverse Kinematics with Graph Diffusion
by: Huang, Helong, et al.
Published: (2026)

RADAR: Revealing Asymmetric Development of Abilities in MLLM Pre-training
by: Nie, Yunshuang, et al.
Published: (2026)

Beyond Flatlands: Unlocking Spatial Intelligence by Decoupling 3D Reasoning from Numerical Regression
by: Guo, Zhongbin, et al.
Published: (2025)

A Computer Vision Problem in Flatland
by: Agarwal, Sameer, et al.
Published: (2025)

Self-CriTeach: LLM Self-Teaching and Self-Critiquing for Improving Robotic Planning via Automated Domain Generation
by: Huang, Jinbang, et al.
Published: (2025)

Scale‐Invariant Waveguiding in Flatland
by: Zhixia Xu, et al.
Published: (2026)

Circular Isoptics in Flatland
by: Thomas, Alexander
Published: (2025)

Reason2Drive: Towards Interpretable and Chain-based Reasoning for Autonomous Driving
by: Nie, Ming, et al.
Published: (2023)

LaneCorrect: Self-supervised Lane Detection
by: Nie, Ming, et al.
Published: (2024)

Can Vision-Language Models be a Good Guesser? Exploring VLMs for Times and Location Reasoning
by: Zhang, Gengyuan, et al.
Published: (2023)

ForceFlow: Learning to Feel and Act via Contact-Driven Flow Matching
by: Zhang, Shuoheng, et al.
Published: (2026)

Ru Single Atoms Integrated into Cobalt Oxide Spinel Structure with Interstitial Carbon for Enhanced Electrocatalytic Water Oxidation
by: Guowei Wang, et al.
Published: (2024)

Observation of Analog Flatland Cherenkov Radiations on Metasurfaces (Laser Photonics Rev. 18(2)/2024)
by: Zhixia Xu, et al.
Published: (2024)

SpatialCoT: Advancing Spatial Reasoning through Coordinate Alignment and Chain-of-Thought for Embodied Task Planning
by: Liu, Yuecheng, et al.
Published: (2025)

Prompting CO₂ Electroreduction to Ethanol by Iron Group Metal Ion Dopants Induced Multi‐sites at the Interface of SnSe/SnSe₂ p–n Heterojunction
by: Xinyue Zheng, et al.
Published: (2024)

ANCoEF: Asynchronous Neuromorphic Algorithm/Hardware Co-Exploration Framework with a Fully Asynchronous Simulator
by: Zhang, Jian, et al.
Published: (2024)

Frontispiece: Scale‐Invariant Waveguiding in Flatland (EXP2 1/2026)
by: Zhixia Xu, et al.
Published: (2026)

TradeTrap: Are LLM-based Trading Agents Truly Reliable and Faithful?
by: Yan, Lewen, et al.
Published: (2025)

Copy-Move Forgery Detection and Question Answering for Remote Sensing Image
by: Zhang, Ze, et al.
Published: (2024)

DeflareMamba: Hierarchical Vision Mamba for Contextually Consistent Lens Flare Removal
by: Huang, Yihang, et al.
Published: (2025)

Learning Global Hypothesis Space for Enhancing Synergistic Reasoning Chain
by: Zhang, Jiaquan, et al.
Published: (2026)

Think in Safety: Unveiling and Mitigating Safety Alignment Collapse in Multimodal Large Reasoning Model
by: Lou, Xinyue, et al.
Published: (2025)

Reinforcing Action Policies by Prophesying
by: Zhang, Jiahui, et al.
Published: (2025)

WoVoGen: World Volume-aware Diffusion for Controllable Multi-camera Driving Scene Generation
by: Lu, Jiachen, et al.
Published: (2023)

OpenOcc: Open Vocabulary 3D Scene Reconstruction via Occupancy Representation
by: Jiang, Haochen, et al.
Published: (2024)

Mem2Ego: Empowering Vision-Language Models with Global-to-Ego Memory for Long-Horizon Embodied Navigation
by: Zhang, Lingfeng, et al.
Published: (2025)

Case Law Grounding: Using Precedents to Align Decision-Making for Humans and AI
by: Chen, Quan Ze, et al.
Published: (2023)

UNIT: Unifying Image and Text Recognition in One Vision Encoder
by: Zhu, Yi, et al.
Published: (2024)

Enhanced DSP Architecture for Small Floating‐Point Based Deep Learning Accelerators on FPGAs
by: Kuiming Ma, et al.
Published: (2026)

The Relationships Between Contingent Reward Leadership, Perceived Organizational Support, Knowledge‐Sharing Intention, Innovative Culture, and Kindergarten Teachers' Creative Teaching Performance: A Mediated Moderation Model
by: Maoyong Huang, et al.
Published: (2025)

EVLP: Learning Unified Embodied Vision-Language Planner with Reinforced Supervised Fine-Tuning
by: Cai, Xinyan, et al.
Published: (2025)

Purely Quadratic Non-Gaussianity from Tachyonic Instability: Primordial Black Holes and Scalar-Induced Gravitational Waves
by: Zhang, He-Xu, et al.
Published: (2026)

DianJin-R1: Evaluating and Enhancing Financial Reasoning in Large Language Models
by: Zhu, Jie, et al.
Published: (2025)

Document Haystacks: Vision-Language Reasoning Over Piles of 1000+ Documents
by: Chen, Jun, et al.
Published: (2024)

A Multimodal Fusion Framework for Early Non‐Invasive Screening of Cognitive Impairment Using Language Digital Biomarkers
by: Jiahui Xu, et al.
Published: (2025)

Theoretical Insights into Line Graph Transformation on Graph Learning
by: Yang, Fan, et al.
Published: (2024)

SeePhys: Does Seeing Help Thinking? -- Benchmarking Vision-Based Physics Reasoning
by: Xiang, Kun, et al.
Published: (2025)