:: Library Catalog

Bibliographic Details
Main Authors:	Yu, Peilin, Wu, Yuwei, Gao, Zhi, Fan, Xiaomeng, Yang, Shuo, Jia, Yunde
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2506.08906

Similar Items

Large-Scale Riemannian Meta-Optimization via Subspace Adaptation
by: Yu, Peilin, et al.
Published: (2025)

Curvature Learning for Generalization of Hyperbolic Neural Networks
by: Fan, Xiaomeng, et al.
Published: (2025)

A Set-to-Set Distance Measure in Hyperbolic Space
by: Li, Pengxiang, et al.
Published: (2025)

Modality Alignment across Trees on Heterogeneous Hyperbolic Manifolds
by: Wu, Wei, et al.
Published: (2025)

Geometry-aware Distance Measure for Diverse Hierarchical Structures in Hyperbolic Spaces
by: Li, Pengxiang, et al.
Published: (2025)

Beyond the Seen: Bounded Distribution Estimation for Open-Vocabulary Learning
by: Fan, Xiaomeng, et al.
Published: (2025)

Adaptive Model Ensemble for Continual Learning
by: Mao, Yuchuan, et al.
Published: (2025)

Consistency of Compositional Generalization across Multiple Levels
by: Li, Chuanhao, et al.
Published: (2024)

Temporally Consistent Stereo Matching
by: Zeng, Jiaxi, et al.
Published: (2024)

Facial Expression Generation Aligned with Human Preference for Natural Dyadic Interaction
by: Chen, Xu, et al.
Published: (2026)

Fine-Grained 3D Facial Reconstruction for Micro-Expressions
by: Sun, Che, et al.
Published: (2026)

Multi-Label Stereo Matching for Transparent Scene Depth Estimation
by: Liu, Zhidan, et al.
Published: (2025)

MIRROR: Multimodal Iterative Reasoning via Reflection on Visual Regions
by: Zhang, Haoyu, et al.
Published: (2026)

Diving into the Fusion of Monocular Priors for Generalized Stereo Matching
by: Yao, Chengtang, et al.
Published: (2025)

3D Visual Illusion Depth Estimation
by: Yao, Chengtang, et al.
Published: (2025)

LongSplat: Online Generalizable 3D Gaussian Splatting from Long Sequence Images
by: Huang, Guichen, et al.
Published: (2025)

Multi-Sourced Compositional Generalization in Visual Question Answering
by: Li, Chuanhao, et al.
Published: (2025)

Composition-Incremental Learning for Compositional Generalization
by: Li, Zhen, et al.
Published: (2025)

FIRE: A Dataset for Feedback Integration and Refinement Evaluation of Multimodal Models
by: Li, Pengxiang, et al.
Published: (2024)

AdaptMMBench: Benchmarking Adaptive Multimodal Reasoning for Mode Selection and Reasoning Process
by: Zhang, Xintong, et al.
Published: (2026)

Multi-modal Agent Tuning: Building a VLM-Driven Agent for Efficient Tool Usage
by: Gao, Zhi, et al.
Published: (2024)

Adaptive Chain-of-Focus Reasoning via Dynamic Visual Search and Zooming for Efficient VLMs
by: Zhang, Xintong, et al.
Published: (2025)

Iterative Tool Usage Exploration for Multimodal Agents via Step-wise Preference Tuning
by: Li, Pengxiang, et al.
Published: (2025)

World knowledge-enhanced Reasoning Using Instruction-guided Interactor in Autonomous Driving
by: Zhai, Mingliang, et al.
Published: (2024)

IPAD: Iterative, Parallel, and Diffusion-based Network for Scene Text Recognition
by: Yang, Xiaomeng, et al.
Published: (2023)

Rethinking the Encoding and Annotating of 3D Bounding Box: Corner-Aware 3D Object Detection from Point Clouds
by: Meng, Qinghao, et al.
Published: (2025)

Revealing the Two Sides of Data Augmentation: An Asymmetric Distillation-based Win-Win Solution for Open-Set Recognition
by: Jia, Yunbing, et al.
Published: (2024)

Dual-stream Feature Augmentation for Domain Generalization
by: Wang, Shanshan, et al.
Published: (2024)

A Single-step Accurate Fingerprint Registration Method Based on Local Feature Matching
by: Jia, Yuwei, et al.
Published: (2025)

Dual-IPO: Dual-Iterative Preference Optimization for Text-to-Video Generation
by: Yang, Xiaomeng, et al.
Published: (2025)

Masked and Permuted Implicit Context Learning for Scene Text Recognition
by: Yang, Xiaomeng, et al.
Published: (2023)

PiercingEye: Dual-Space Video Violence Detection with Hyperbolic Vision-Language Guidance
by: Leng, Jiaxu, et al.
Published: (2025)

Leveraging Segment Anything Model for Source-Free Domain Adaptation via Dual Feature Guided Auto-Prompting
by: Huai, Zheang, et al.
Published: (2025)

Open-RGBT: Open-vocabulary RGB-T Zero-shot Semantic Segmentation in Open-world Environments
by: Yu, Meng, et al.
Published: (2024)

Nonlinear Bipolar Compensation: Handling Outliers in Post-Training Quantization
by: Sun, Peilin, et al.
Published: (2026)

Improving Contactless Fingerprint Recognition with Robust 3D Feature Extraction and Graph Embedding
by: Jia, Yuwei, et al.
Published: (2024)

METOR: A Unified Framework for Mutual Enhancement of Objects and Relationships in Open-vocabulary Video Visual Relationship Detection
by: Wang, Yongqi, et al.
Published: (2025)

OpenHype: Hyperbolic Embeddings for Hierarchical Open-Vocabulary Radiance Fields
by: Weijler, Lisa, et al.
Published: (2025)

Benchmarking and Improving GUI Agents in High-Dynamic Environments
by: Liu, Enqi, et al.
Published: (2026)

An Illumination-Robust Feature Extractor Augmented by Relightable 3D Reconstruction
by: Zhao, Shunyi, et al.
Published: (2024)