:: Library Catalog

Bibliographic Details
Main Authors:	Li, Junhang; Guo, Yu; Xian, Chuhua; He, Shengfeng
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2505.17649

Similar Items

AtomicMotion: Learning Human Motion From Different Human Parts
by: Liu, Runzhen, et al.
Published: (2026)

InstructSAM: Segment Any Instance with Any Instructions
by: Yuan, Yuqian, et al.
Published: (2026)

Seeing 3D Through 2D Lenses: 3D Few-Shot Class-Incremental Learning via Cross-Modal Geometric Rectification
by: Xiang, Tuo, et al.
Published: (2025)

InstructAny2Pix: Flexible Visual Editing via Multimodal Instruction Following
by: Li, Shufan, et al.
Published: (2023)

FocalCount: Towards Class-Count Imbalance in Class-Agnostic Counting
by: Zhu, Huilin, et al.
Published: (2025)

Zero-shot Object Counting with Good Exemplars
by: Zhu, Huilin, et al.
Published: (2024)

Identity-Preserving Video Dubbing Using Motion Warping
by: Liu, Runzhen, et al.
Published: (2025)

Expanding Zero-Shot Object Counting with Rich Prompts
by: Zhu, Huilin, et al.
Published: (2025)

Connecting Giants: Synergistic Knowledge Transfer of Large Multimodal Models for Few-Shot Learning
by: Tang, Hao, et al.
Published: (2025)

DenseTrack: Drone-based Crowd Tracking via Density-aware Motion-appearance Synergy
by: Lei, Yi, et al.
Published: (2024)

Zero-Shot Video Translation via Token Warping
by: Zhu, Haiming, et al.
Published: (2024)

DA$^{2}$: Depth Anything in Any Direction
by: Li, Haodong, et al.
Published: (2025)

MixSA: Training-free Reference-based Sketch Extraction via Mixture-of-Self-Attention
by: Yang, Rui, et al.
Published: (2025)

Stroke2Sketch: Harnessing Stroke Attributes for Training-Free Sketch Generation
by: Yang, Rui, et al.
Published: (2025)

Rethinking Multi-view Representation Learning via Distilled Disentangling
by: Ke, Guanzhou, et al.
Published: (2024)

Any2Any: Unified Arbitrary Modality Translation for Remote Sensing
by: Chen, Haoyang, et al.
Published: (2026)

OneRestore: A Universal Restoration Framework for Composite Degradation
by: Guo, Yu, et al.
Published: (2024)

InstructPix2NeRF: Instructed 3D Portrait Editing from a Single Image
by: Li, Jianhui, et al.
Published: (2023)

AnyFit: Controllable Virtual Try-on for Any Combination of Attire Across Any Scenario
by: Li, Yuhan, et al.
Published: (2024)

SAMCT: Segment Any CT Allowing Labor-Free Task-Indicator Prompts
by: Lin, Xian, et al.
Published: (2024)

Judge Anything: MLLM as a Judge Across Any Modality
by: Pu, Shu, et al.
Published: (2025)

Teacher-Student Diffusion Model for Text-Driven 3D Hand Motion Generation
by: Cheng, Ching-Lam, et al.
Published: (2026)

Learning with Unreliability: Fast Few-shot Voxel Radiance Fields with Relative Geometric Consistency
by: Xu, Yingjie, et al.
Published: (2024)

Seeing through Unclear Glass: Occlusion Removal with One Shot
by: Li, Qiang, et al.
Published: (2025)

Segment Any-Quality Images with Generative Latent Space Enhancement
by: Guo, Guangqian, et al.
Published: (2025)

Seeing Across Views: Benchmarking Spatial Reasoning of Vision-Language Models in Robotic Scenes
by: Feng, Zhiyuan, et al.
Published: (2025)

InstructTable: Improving Table Structure Recognition Through Instructions
by: Chen, Boming, et al.
Published: (2026)

Lagrangian Motion Fields for Long-term Motion Generation
by: Yang, Yifei, et al.
Published: (2024)

Distill Any Depth: Distillation Creates a Stronger Monocular Depth Estimator
by: He, Xiankang, et al.
Published: (2025)

Unfolding 3D Gaussian Splatting via Iterative Gaussian Synopsis
by: Lu, Yuqin, et al.
Published: (2026)

Any2AnyTryon: Leveraging Adaptive Position Embeddings for Versatile Virtual Clothing Tasks
by: Guo, Hailong, et al.
Published: (2025)

Neptune-X: Active X-to-Maritime Generation for Universal Maritime Object Detection
by: Guo, Yu, et al.
Published: (2025)

InstructAvatar: Text-Guided Emotion and Motion Control for Avatar Generation
by: Wang, Yuchi, et al.
Published: (2024)

Any-to-Any Learning in Computational Pathology via Triplet Multimodal Pretraining
by: Sun, Qichen, et al.
Published: (2025)

InstructX: Towards Unified Visual Editing with MLLM Guidance
by: Mou, Chong, et al.
Published: (2025)

Seeing is Believing? Mitigating OCR Hallucinations in Multimodal Large Language Models
by: He, Zhentao, et al.
Published: (2025)

IPAdapter-Instruct: Resolving Ambiguity in Image-based Conditioning using Instruct Prompts
by: Rowles, Ciara, et al.
Published: (2024)

Any-Shift Prompting for Generalization over Distributions
by: Xiao, Zehao, et al.
Published: (2024)

UniM: A Unified Any-to-Any Interleaved Multimodal Benchmark
by: Li, Yanlin, et al.
Published: (2026)

AnyI2V: Animating Any Conditional Image with Motion Control
by: Li, Ziye, et al.
Published: (2025)