| Main Authors: | Li, Junhang, Guo, Yu, Xian, Chuhua, He, Shengfeng |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2505.17649 |
Similar Items
AtomicMotion: Learning Human Motion From Different Human Parts
by: Liu, Runzhen, et al.
Published: (2026)
InstructSAM: Segment Any Instance with Any Instructions
by: Yuan, Yuqian, et al.
Published: (2026)
Seeing 3D Through 2D Lenses: 3D Few-Shot Class-Incremental Learning via Cross-Modal Geometric Rectification
by: Xiang, Tuo, et al.
Published: (2025)
InstructAny2Pix: Flexible Visual Editing via Multimodal Instruction Following
by: Li, Shufan, et al.
Published: (2023)
FocalCount: Towards Class-Count Imbalance in Class-Agnostic Counting
by: Zhu, Huilin, et al.
Published: (2025)
Zero-shot Object Counting with Good Exemplars
by: Zhu, Huilin, et al.
Published: (2024)
Identity-Preserving Video Dubbing Using Motion Warping
by: Liu, Runzhen, et al.
Published: (2025)
Expanding Zero-Shot Object Counting with Rich Prompts
by: Zhu, Huilin, et al.
Published: (2025)
Connecting Giants: Synergistic Knowledge Transfer of Large Multimodal Models for Few-Shot Learning
by: Tang, Hao, et al.
Published: (2025)
DenseTrack: Drone-based Crowd Tracking via Density-aware Motion-appearance Synergy
by: Lei, Yi, et al.
Published: (2024)
Zero-Shot Video Translation via Token Warping
by: Zhu, Haiming, et al.
Published: (2024)
DA$^{2}$: Depth Anything in Any Direction
by: Li, Haodong, et al.
Published: (2025)
MixSA: Training-free Reference-based Sketch Extraction via Mixture-of-Self-Attention
by: Yang, Rui, et al.
Published: (2025)
Stroke2Sketch: Harnessing Stroke Attributes for Training-Free Sketch Generation
by: Yang, Rui, et al.
Published: (2025)
Rethinking Multi-view Representation Learning via Distilled Disentangling
by: Ke, Guanzhou, et al.
Published: (2024)
Any2Any: Unified Arbitrary Modality Translation for Remote Sensing
by: Chen, Haoyang, et al.
Published: (2026)
OneRestore: A Universal Restoration Framework for Composite Degradation
by: Guo, Yu, et al.
Published: (2024)
InstructPix2NeRF: Instructed 3D Portrait Editing from a Single Image
by: Li, Jianhui, et al.
Published: (2023)
AnyFit: Controllable Virtual Try-on for Any Combination of Attire Across Any Scenario
by: Li, Yuhan, et al.
Published: (2024)
SAMCT: Segment Any CT Allowing Labor-Free Task-Indicator Prompts
by: Lin, Xian, et al.
Published: (2024)
Judge Anything: MLLM as a Judge Across Any Modality
by: Pu, Shu, et al.
Published: (2025)
Teacher-Student Diffusion Model for Text-Driven 3D Hand Motion Generation
by: Cheng, Ching-Lam, et al.
Published: (2026)
Learning with Unreliability: Fast Few-shot Voxel Radiance Fields with Relative Geometric Consistency
by: Xu, Yingjie, et al.
Published: (2024)
Seeing through Unclear Glass: Occlusion Removal with One Shot
by: Li, Qiang, et al.
Published: (2025)
Segment Any-Quality Images with Generative Latent Space Enhancement
by: Guo, Guangqian, et al.
Published: (2025)
Seeing Across Views: Benchmarking Spatial Reasoning of Vision-Language Models in Robotic Scenes
by: Feng, Zhiyuan, et al.
Published: (2025)
InstructTable: Improving Table Structure Recognition Through Instructions
by: Chen, Boming, et al.
Published: (2026)
Lagrangian Motion Fields for Long-term Motion Generation
by: Yang, Yifei, et al.
Published: (2024)
Distill Any Depth: Distillation Creates a Stronger Monocular Depth Estimator
by: He, Xiankang, et al.
Published: (2025)
Unfolding 3D Gaussian Splatting via Iterative Gaussian Synopsis
by: Lu, Yuqin, et al.
Published: (2026)
Any2AnyTryon: Leveraging Adaptive Position Embeddings for Versatile Virtual Clothing Tasks
by: Guo, Hailong, et al.
Published: (2025)
Neptune-X: Active X-to-Maritime Generation for Universal Maritime Object Detection
by: Guo, Yu, et al.
Published: (2025)
InstructAvatar: Text-Guided Emotion and Motion Control for Avatar Generation
by: Wang, Yuchi, et al.
Published: (2024)
Any-to-Any Learning in Computational Pathology via Triplet Multimodal Pretraining
by: Sun, Qichen, et al.
Published: (2025)
InstructX: Towards Unified Visual Editing with MLLM Guidance
by: Mou, Chong, et al.
Published: (2025)
Seeing is Believing? Mitigating OCR Hallucinations in Multimodal Large Language Models
by: He, Zhentao, et al.
Published: (2025)
IPAdapter-Instruct: Resolving Ambiguity in Image-based Conditioning using Instruct Prompts
by: Rowles, Ciara, et al.
Published: (2024)
Any-Shift Prompting for Generalization over Distributions
by: Xiao, Zehao, et al.
Published: (2024)
UniM: A Unified Any-to-Any Interleaved Multimodal Benchmark
by: Li, Yanlin, et al.
Published: (2026)
AnyI2V: Animating Any Conditional Image with Motion Control
by: Li, Ziye, et al.
Published: (2025)