| Main Authors: | Zhang, Xingjian, Duan, Yutong, Chen, Zaishu |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2507.17304 |
Similar Items
Manual-PA: Learning 3D Part Assembly from Instruction Diagrams
by: Zhang, Jiahao, et al.
Published: (2024)
PostureObjectstitch: Anomaly Image Generation Considering Assembly Relationships in Industrial Scenarios
by: Tong, Zebei, et al.
Published: (2026)
UniSync: Towards Generalizable and High-Fidelity Lip Synchronization for Challenging Scenarios
by: Fan, Ruidi, et al.
Published: (2026)
Where, Not What: Compelling Video LLMs to Learn Geometric Causality for 3D-Grounding
by: Zhong, Yutong
Published: (2025)
Anomaly Triplet-Net: Progress Recognition Model Using Deep Metric Learning Considering Occlusion for Manual Assembly Work
by: Kitsukawa, Takumi, et al.
Published: (2025)
Proc-GS: Procedural Building Generation for City Assembly with 3D Gaussians
by: Li, Yixuan, et al.
Published: (2024)
Dataset Ownership Verification for Pre-trained Masked Models
by: Xie, Yuechen, et al.
Published: (2025)
Two-Stage Adaptive Network for Semi-Supervised Cross-Domain Crater Detection under Varying Scenario Distributions
by: Liu, Yifan, et al.
Published: (2023)
Pair2Scene: Learning Local Object Relations for Procedural Scene Generation
by: Ran, Xingjian, et al.
Published: (2026)
CheckManual: A New Challenge and Benchmark for Manual-based Appliance Manipulation
by: Long, Yuxing, et al.
Published: (2025)
SURDS: Benchmarking Spatial Understanding and Reasoning in Driving Scenarios with Vision Language Models
by: Guo, Xianda, et al.
Published: (2024)
CityLLaVA: Efficient Fine-Tuning for VLMs in City Scenario
by: Duan, Zhizhao, et al.
Published: (2024)
GOReloc: Graph-based Object-Level Relocalization for Visual SLAM
by: Wang, Yutong, et al.
Published: (2024)
Transforming Vision Transformer: Towards Efficient Multi-Task Asynchronous Learning
by: Zhong, Hanwen, et al.
Published: (2025)
WCCNet: Wavelet-context Cooperative Network for Efficient Multispectral Pedestrian Detection
by: Wang, Xingjian, et al.
Published: (2023)
Towards Accurate One-Stage Object Detection with AP-Loss
by: Chen, Kean, et al.
Published: (2019)
SegHist: A General Segmentation-based Framework for Chinese Historical Document Text Line Detection
by: Hu, Xingjian, et al.
Published: (2024)
Deterministic World Models for Verification of Closed-loop Vision-based Systems
by: Geng, Yuang, et al.
Published: (2025)
TinyLLaVA-Video-R1: Towards Smaller LMMs for Video Reasoning
by: Zhang, Xingjian, et al.
Published: (2025)
Two-Stage Human Verification using HandCAPTCHA and Anti-Spoofed Finger Biometrics with Feature Selection
by: Bera, Asish, et al.
Published: (2024)
A Siamese-based Verification System for Open-set Architecture Attribution of Synthetic Images
by: Abady, Lydia, et al.
Published: (2023)
Expressive Speech-driven Facial Animation with controllable emotions
by: Chen, Yutong, et al.
Published: (2023)
Learning Spatial-Preserving Hierarchical Representations for Digital Pathology
by: Wu, Weiyi, et al.
Published: (2024)
Vision-based Vehicle Re-identification in Bridge Scenario using Flock Similarity
by: Zhang, Chunfeng, et al.
Published: (2024)
TemPose-TF-ASF: Two-Stage Bidirectional Stroke Context Fusion for Badminton Stroke Classification
by: Liu, Tzu-Yu, et al.
Published: (2026)
Diffusion-based Light Field Synthesis
by: Gao, Ruisheng, et al.
Published: (2024)
Color-Pair Guided Robust Zero-Shot 6D Pose Estimation and Tracking of Cluttered Objects on Edge Devices
by: Yang, Xingjian, et al.
Published: (2025)
Causality-based Transfer of Driving Scenarios to Unseen Intersections
by: Glasmacher, Christoph, et al.
Published: (2024)
Calibration & Reconstruction: Deep Integrated Language for Referring Image Segmentation
by: Yan, Yichen, et al.
Published: (2024)
Dynamic Risk Assessment Methodology with an LDM-based System for Parking Scenarios
by: Cañas, Paola Natalia, et al.
Published: (2024)
CoReVLA: A Dual-Stage End-to-End Autonomous Driving Framework for Long-Tail Scenarios via Collect-and-Refine
by: Fang, Shiyu, et al.
Published: (2025)
EAVL: Explicitly Align Vision and Language for Referring Image Segmentation
by: Yan, Yichen, et al.
Published: (2023)
Fuse & Calibrate: A bi-directional Vision-Language Guided Framework for Referring Image Segmentation
by: Yan, Yichen, et al.
Published: (2024)
Verification Mirage: Mapping the Reliability Boundary of Self-Verification in Medical VQA
by: Jin, Ruinan, et al.
Published: (2026)
PanAdapter: Two-Stage Fine-Tuning with Spatial-Spectral Priors Injecting for Pansharpening
by: Wu, RuoCheng, et al.
Published: (2024)
SSPFusion: A Semantic Structure-Preserving Approach for Infrared and Visible Image Fusion
by: Yang, Qiao, et al.
Published: (2023)
Add-SD: Rational Generation without Manual Reference
by: Yang, Lingfeng, et al.
Published: (2024)
Lifting Scheme-Based Implicit Disentanglement of Emotion-Related Facial Dynamics in the Wild
by: Wang, Xingjian, et al.
Published: (2024)
Graph Domain Adaptation with Dual-branch Encoder and Two-level Alignment for Whole Slide Image-based Survival Prediction
by: Shou, Yuntao, et al.
Published: (2024)
CM-MaskSD: Cross-Modality Masked Self-Distillation for Referring Image Segmentation
by: Wang, Wenxuan, et al.
Published: (2023)