Library Catalog

Bibliographic Details
Main Authors:	Zhang, Xingjian; Duan, Yutong; Chen, Zaishu
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2507.17304

Similar Items

Manual-PA: Learning 3D Part Assembly from Instruction Diagrams
by: Zhang, Jiahao, et al.
Published: (2024)

PostureObjectstitch: Anomaly Image Generation Considering Assembly Relationships in Industrial Scenarios
by: Tong, Zebei, et al.
Published: (2026)

UniSync: Towards Generalizable and High-Fidelity Lip Synchronization for Challenging Scenarios
by: Fan, Ruidi, et al.
Published: (2026)

Where, Not What: Compelling Video LLMs to Learn Geometric Causality for 3D-Grounding
by: Zhong, Yutong
Published: (2025)

Anomaly Triplet-Net: Progress Recognition Model Using Deep Metric Learning Considering Occlusion for Manual Assembly Work
by: Kitsukawa, Takumi, et al.
Published: (2025)

Proc-GS: Procedural Building Generation for City Assembly with 3D Gaussians
by: Li, Yixuan, et al.
Published: (2024)

Dataset Ownership Verification for Pre-trained Masked Models
by: Xie, Yuechen, et al.
Published: (2025)

Two-Stage Adaptive Network for Semi-Supervised Cross-Domain Crater Detection under Varying Scenario Distributions
by: Liu, Yifan, et al.
Published: (2023)

Pair2Scene: Learning Local Object Relations for Procedural Scene Generation
by: Ran, Xingjian, et al.
Published: (2026)

CheckManual: A New Challenge and Benchmark for Manual-based Appliance Manipulation
by: Long, Yuxing, et al.
Published: (2025)

SURDS: Benchmarking Spatial Understanding and Reasoning in Driving Scenarios with Vision Language Models
by: Guo, Xianda, et al.
Published: (2024)

CityLLaVA: Efficient Fine-Tuning for VLMs in City Scenario
by: Duan, Zhizhao, et al.
Published: (2024)

GOReloc: Graph-based Object-Level Relocalization for Visual SLAM
by: Wang, Yutong, et al.
Published: (2024)

Transforming Vision Transformer: Towards Efficient Multi-Task Asynchronous Learning
by: Zhong, Hanwen, et al.
Published: (2025)

WCCNet: Wavelet-context Cooperative Network for Efficient Multispectral Pedestrian Detection
by: Wang, Xingjian, et al.
Published: (2023)

Towards Accurate One-Stage Object Detection with AP-Loss
by: Chen, Kean, et al.
Published: (2019)

SegHist: A General Segmentation-based Framework for Chinese Historical Document Text Line Detection
by: Hu, Xingjian, et al.
Published: (2024)

Deterministic World Models for Verification of Closed-loop Vision-based Systems
by: Geng, Yuang, et al.
Published: (2025)

TinyLLaVA-Video-R1: Towards Smaller LMMs for Video Reasoning
by: Zhang, Xingjian, et al.
Published: (2025)

Two-Stage Human Verification using HandCAPTCHA and Anti-Spoofed Finger Biometrics with Feature Selection
by: Bera, Asish, et al.
Published: (2024)

A Siamese-based Verification System for Open-set Architecture Attribution of Synthetic Images
by: Abady, Lydia, et al.
Published: (2023)

Expressive Speech-driven Facial Animation with controllable emotions
by: Chen, Yutong, et al.
Published: (2023)

Learning Spatial-Preserving Hierarchical Representations for Digital Pathology
by: Wu, Weiyi, et al.
Published: (2024)

Vision-based Vehicle Re-identification in Bridge Scenario using Flock Similarity
by: Zhang, Chunfeng, et al.
Published: (2024)

TemPose-TF-ASF: Two-Stage Bidirectional Stroke Context Fusion for Badminton Stroke Classification
by: Liu, Tzu-Yu, et al.
Published: (2026)

Diffusion-based Light Field Synthesis
by: Gao, Ruisheng, et al.
Published: (2024)

Color-Pair Guided Robust Zero-Shot 6D Pose Estimation and Tracking of Cluttered Objects on Edge Devices
by: Yang, Xingjian, et al.
Published: (2025)

Causality-based Transfer of Driving Scenarios to Unseen Intersections
by: Glasmacher, Christoph, et al.
Published: (2024)

Calibration & Reconstruction: Deep Integrated Language for Referring Image Segmentation
by: Yan, Yichen, et al.
Published: (2024)

Dynamic Risk Assessment Methodology with an LDM-based System for Parking Scenarios
by: Cañas, Paola Natalia, et al.
Published: (2024)

CoReVLA: A Dual-Stage End-to-End Autonomous Driving Framework for Long-Tail Scenarios via Collect-and-Refine
by: Fang, Shiyu, et al.
Published: (2025)

EAVL: Explicitly Align Vision and Language for Referring Image Segmentation
by: Yan, Yichen, et al.
Published: (2023)

Fuse & Calibrate: A bi-directional Vision-Language Guided Framework for Referring Image Segmentation
by: Yan, Yichen, et al.
Published: (2024)

Verification Mirage: Mapping the Reliability Boundary of Self-Verification in Medical VQA
by: Jin, Ruinan, et al.
Published: (2026)

PanAdapter: Two-Stage Fine-Tuning with Spatial-Spectral Priors Injecting for Pansharpening
by: Wu, RuoCheng, et al.
Published: (2024)

SSPFusion: A Semantic Structure-Preserving Approach for Infrared and Visible Image Fusion
by: Yang, Qiao, et al.
Published: (2023)

Add-SD: Rational Generation without Manual Reference
by: Yang, Lingfeng, et al.
Published: (2024)

Lifting Scheme-Based Implicit Disentanglement of Emotion-Related Facial Dynamics in the Wild
by: Wang, Xingjian, et al.
Published: (2024)

Graph Domain Adaptation with Dual-branch Encoder and Two-level Alignment for Whole Slide Image-based Survival Prediction
by: Shou, Yuntao, et al.
Published: (2024)

CM-MaskSD: Cross-Modality Masked Self-Distillation for Referring Image Segmentation
by: Wang, Wenxuan, et al.
Published: (2023)