| Main Authors: | Jiang, Hongbo, Li, Jie, Shen, Yunhang, Dai, Pingyang, Sun, Xing, Cao, Haoyu, Cao, Liujuan |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2602.23711 |
Similar Items
Unleashing MLLMs on the Edge: A Unified Framework for Cross-Modal ReID via Adaptive SVD Distillation
by: Jiang, Hongbo, et al.
Published: (2026)
FlexiReID: Adaptive Mixture of Expert for Multi-Modal Person Re-Identification
by: Sun, Zhen, et al.
Published: (2025)
Omni-Diffusion: Unified Multimodal Understanding and Generation with Masked Discrete Diffusion
by: Li, Lijiang, et al.
Published: (2026)
Understanding What Is Not Said: Referring Remote Sensing Image Segmentation with Scarce Expressions
by: Ye, Kai, et al.
Published: (2025)
DPM++: Dynamic Masked Metric Learning for Occluded Person Re-identification
by: Tan, Lei, et al.
Published: (2026)
Knowing Where to Focus: Attention-Guided Alignment for Text-based Person Search
by: Tan, Lei, et al.
Published: (2024)
PartFormer: Awakening Latent Diverse Representation from Vision Transformer for Object Re-Identification
by: Tan, Lei, et al.
Published: (2024)
HUWSOD: Holistic Self-training for Unified Weakly Supervised Object Detection
by: Cao, Liujuan, et al.
Published: (2024)
RIS-LAD: A Benchmark and Model for Referring Low-Altitude Drone Image Segmentation
by: Ye, Kai, et al.
Published: (2025)
Attention Disturbance and Dual-Path Constraint Network for Occluded Person Re-identification
by: Xia, Jiaer, et al.
Published: (2023)
More Clear, More Flexible, More Precise: A Comprehensive Oriented Object Detection benchmark for UAV
by: Ye, Kai, et al.
Published: (2025)
Evolving, Not Training: Zero-Shot Reasoning Segmentation via Evolutionary Prompting
by: Ye, Kai, et al.
Published: (2025)
What You Perceive Is What You Conceive: A Cognition-Inspired Framework for Open Vocabulary Image Segmentation
by: Lin, Jianghang, et al.
Published: (2025)
Pseudo-Label Quality Decoupling and Correction for Semi-Supervised Instance Segmentation
by: Lin, Jianghang, et al.
Published: (2025)
Few-Shot Image Quality Assessment via Adaptation of Vision-Language Models
by: Li, Xudong, et al.
Published: (2024)
Multi-Modal Prompt Learning on Blind Image Quality Assessment
by: Pan, Wensheng, et al.
Published: (2024)
SEAM: Semantically Equivalent Across Modalities Benchmark for Vision-Language Models
by: Tang, Zhenwei, et al.
Published: (2025)
Cantor: Inspiring Multimodal Chain-of-Thought of MLLM
by: Gao, Timin, et al.
Published: (2024)
Training-Free Hierarchical Scene Understanding for Gaussian Splatting with Superpoint Graphs
by: Dai, Shaohui, et al.
Published: (2025)
Adaptive Feature Selection for No-Reference Image Quality Assessment by Mitigating Semantic Noise Sensitivity
by: Li, Xudong, et al.
Published: (2023)
WildSeg3D: Segment Any 3D Objects in the Wild from 2D Images
by: Guo, Yansong, et al.
Published: (2025)
GOI: Find 3D Gaussians of Interest with an Optimizable Open-vocabulary Semantic-space Hyperplane
by: Qu, Yansong, et al.
Published: (2024)
Feature Denoising Diffusion Model for Blind Image Quality Assessment
by: Li, Xudong, et al.
Published: (2024)
Enhancing Visual Document Understanding with Contrastive Learning in Large Visual-Language Models
by: Li, Xin, et al.
Published: (2024)
Evolving High-Quality Rendering and Reconstruction in a Unified Framework with Contribution-Adaptive Regularization
by: Shen, You, et al.
Published: (2025)
UniVST: A Unified Framework for Training-free Localized Video Style Transfer
by: Song, Quanjian, et al.
Published: (2024)
StereoVGGT: A Training-Free Visual Geometry Transformer for Stereo Vision
by: Chen, Ziyang, et al.
Published: (2026)
UniPTS: A Unified Framework for Proficient Post-Training Sparsity
by: Xie, Jingjing, et al.
Published: (2024)
DeOcc-1-to-3: 3D De-Occlusion from a Single Image via Self-Supervised Multi-View Diffusion
by: Qu, Yansong, et al.
Published: (2025)
VITA-VLA: Efficiently Teaching Vision-Language Models to Act via Action Expert Distillation
by: Dong, Shaoqi, et al.
Published: (2025)
Discover, Segment, and Select: A Progressive Mechanism for Zero-shot Camouflaged Object Segmentation
by: Yang, Yilong, et al.
Published: (2026)
Purifying, Labeling, and Utilizing: A High-Quality Pipeline for Small Object Detection
by: Wang, Siwei, et al.
Published: (2025)
Are Unified Vision-Language Models Necessary: Generalization Across Understanding and Generation
by: Zhang, Jihai, et al.
Published: (2025)
DREAM: Document Reconstruction via End-to-end Autoregressive Model
by: Li, Xin, et al.
Published: (2025)
Depth-Guided Semi-Supervised Instance Segmentation
by: Chen, Xin, et al.
Published: (2024)
RLE: A Unified Perspective of Data Augmentation for Cross-Spectral Re-identification
by: Tan, Lei, et al.
Published: (2024)
Solving the Catastrophic Forgetting Problem in Generalized Category Discovery
by: Cao, Xinzi, et al.
Published: (2025)
TextRefiner: Internal Visual Feature as Efficient Refiner for Vision-Language Models Prompt Tuning
by: Xie, Jingjing, et al.
Published: (2024)
Unified Personalized Understanding, Generating and Editing
by: Zhong, Yu, et al.
Published: (2026)
Director3D: Real-world Camera Trajectory and 3D Scene Generation from Text
by: Li, Xinyang, et al.
Published: (2024)