Library Catalog

Bibliographic Details
Main Authors:	Xiang, Tong, Zhao, Hongxia, Zhu, Fenghua, Chen, Yuanyuan, Lv, Yisheng
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2508.13823

Similar Items

OpenCOOD-Air: Prompting Heterogeneous Ground-Air Collaborative Perception with Spatial Conversion and Offset Prediction
by: Wu, Xianke, et al.
Published: (2026)

CogRail: Benchmarking VLMs in Cognitive Intrusion Perception for Intelligent Railway Transportation Systems
by: Tian, Yonglin, et al.
Published: (2026)

RoadSceneVQA: Benchmarking Visual Question Answering in Roadside Perception Systems for Intelligent Transportation System
by: Guan, Runwei, et al.
Published: (2025)

Manipulation as in Simulation: Enabling Accurate Geometry Perception in Robots
by: Liu, Minghuan, et al.
Published: (2025)

CATNet: Collaborative Alignment and Transformation Network for Cooperative Perception
by: Chen, Gong, et al.
Published: (2026)

Med-Scout: Curing MLLMs' Geometric Blindness in Medical Perception via Geometry-Aware RL Post-Training
by: Liu, Anglin, et al.
Published: (2026)

MambaOcc: Visual State Space Model for BEV-based Occupancy Prediction with Local Adaptive Reordering
by: Tian, Yonglin, et al.
Published: (2024)

SC-Tune: Unleashing Self-Consistent Referential Comprehension in Large Vision Language Models
by: Yue, Tongtian, et al.
Published: (2024)

DeepSORT-Driven Visual Tracking Approach for Gesture Recognition in Interactive Systems
by: Zhang, Tong, et al.
Published: (2025)

MoGe-2: Accurate Monocular Geometry with Metric Scale and Sharp Details
by: Wang, Ruicheng, et al.
Published: (2025)

Edge Computing Enabled Real-Time Video Analysis via Adaptive Spatial-Temporal Semantic Filtering
by: Chen, Xiang, et al.
Published: (2024)

Facial Dynamics in Video: Instruction Tuning for Improved Facial Expression Perception and Contextual Awareness
by: Zhao, Jiaxing, et al.
Published: (2025)

Split-Fuse-Transport: Annotation-Free Saliency via Dual Clustering and Optimal Transport Alignment
by: Ramzan, Muhammad Umer, et al.
Published: (2025)

Computer Vision-Driven Gesture Recognition: Toward Natural and Intuitive Human-Computer
by: Shao, Fenghua, et al.
Published: (2024)

Gating Syn-to-Real Knowledge for Pedestrian Crossing Prediction in Safe Driving
by: Bai, Jie, et al.
Published: (2024)

MiniDrive: More Efficient Vision-Language Models with Multi-Level 2D Features as Text Tokens for Autonomous Driving
by: Zhang, Enming, et al.
Published: (2024)

Self-Supervised Event Representations: Towards Accurate, Real-Time Perception on SoC FPGAs
by: Jeziorek, Kamil, et al.
Published: (2025)

Think-Reflect-Revise: A Policy-Guided Reflective Framework for Safety Alignment in Large Vision Language Models
by: Weng, Fenghua, et al.
Published: (2025)

Enabling Intelligent Traffic Systems: A Deep Learning Method for Accurate Arabic License Plate Recognition
by: Sayedelahl, M. A.
Published: (2024)

Self-Supervised Alignment Learning for Medical Image Segmentation
by: Li, Haofeng, et al.
Published: (2024)

Hierarchical Self-Prompting SAM: A Prompt-Free Medical Image Segmentation Framework
by: Zhang, Mengmeng, et al.
Published: (2025)

AdaDrive: Self-Adaptive Slow-Fast System for Language-Grounded Autonomous Driving
by: Zhang, Ruifei, et al.
Published: (2025)

KPLM-STA: Physically-Accurate Shadow Synthesis for Human Relighting via Keypoint-Based Light Modeling
by: Yin, Xinhui, et al.
Published: (2025)

GRAM-MAMBA: Holistic Feature Alignment for Wireless Perception with Adaptive Low-Rank Compensation
by: Yang, Weiqi, et al.
Published: (2025)

All-in-One Transferring Image Compression from Human Perception to Multi-Machine Perception
by: Zhao, Jiancheng, et al.
Published: (2025)

Reconstructing Building Height from Spaceborne TomoSAR Point Clouds Using a Dual-Topology Network
by: Chen, Zhaiyu, et al.
Published: (2026)

Facial Action Unit Detection by Adaptively Constraining Self-Attention and Causally Deconfounding Sample
by: Shao, Zhiwen, et al.
Published: (2024)

RT-DATR: Real-time Unsupervised Domain Adaptive Detection Transformer with Adversarial Feature Alignment
by: Lv, Feng, et al.
Published: (2025)

MeshLAM: Feed-Forward One-Shot Animatable Textured Mesh Avatar Reconstruction
by: He, Yisheng, et al.
Published: (2026)

UniAlignment: Semantic Alignment for Unified Image Generation, Understanding, Manipulation and Perception
by: Song, Xinyang, et al.
Published: (2025)

GVGS: Gaussian Visibility-Aware Multi-View Geometry for Accurate Surface Reconstruction
by: Su, Mai, et al.
Published: (2026)

Self-Localized Collaborative Perception
by: Ni, Zhenyang, et al.
Published: (2024)

EdgePoint2: Compact Descriptors for Superior Efficiency and Accuracy
by: Yao, Haodi, et al.
Published: (2025)

Pre-training CLIP against Data Poisoning with Optimal Transport-based Matching and Alignment
by: Zhang, Tong, et al.
Published: (2025)

Degradation-Aware Residual-Conditioned Optimal Transport for Unified Image Restoration
by: Tang, Xiaole, et al.
Published: (2024)

SARL: Spatially-Aware Self-Supervised Representation Learning for Visuo-Tactile Perception
by: Khurana, Gurmeher, et al.
Published: (2025)

Enabling Fast and Accurate Crowdsourced Annotation for Elevation-Aware Flood Extent Mapping
by: Dyken, Landon, et al.
Published: (2024)

FeaKM: Robust Collaborative Perception under Noisy Pose Conditions
by: Hao, Jiuwu, et al.
Published: (2025)

DA-Mamba: Learning Domain-Aware State Space Model for Global-Local Alignment in Domain Adaptive Object Detection
by: Li, Haochen, et al.
Published: (2026)

MOGeo: Beyond One-to-One Cross-View Object Geo-localization
by: Lv, Bo, et al.
Published: (2026)