| Main Authors: | Ke, Shuyan, Mei, Yifan, Wu, Changli, Zheng, Yonghan, Ji, Jiayi, Cao, Liujuan, Ji, Rongrong |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2604.15670 |
Similar Items
3D-DRES: Detailed 3D Referring Expression Segmentation
by: Chen, Qi, et al.
Published: (2026)
HRSeg: High-Resolution Visual Perception and Enhancement for Reasoning Segmentation
by: Lin, Weihuang, et al.
Published: (2025)
MIHBench: Benchmarking and Mitigating Multi-Image Hallucinations in Multimodal Large Language Models
by: Li, Jiale, et al.
Published: (2025)
Test-Time Computing for Referring Multimodal Large Language Models
by: Wu, Mingrui, et al.
Published: (2026)
MVGGT: Multimodal Visual Geometry Grounded Transformer for Multiview 3D Referring Expression Segmentation
by: Wu, Changli, et al.
Published: (2026)
HRSAM: Efficient Interactive Segmentation in High-Resolution Images
by: Huang, You, et al.
Published: (2024)
Advancing Multimodal Large Language Models with Quantization-Aware Scale Learning for Efficient Adaptation
by: Xie, Jingjing, et al.
Published: (2024)
Inter2Former: Dynamic Hybrid Attention for Efficient High-Precision Interactive
by: Huang, You, et al.
Published: (2025)
Depth-Guided Semi-Supervised Instance Segmentation
by: Chen, Xin, et al.
Published: (2024)
An Efficient and Mixed Heterogeneous Model for Image Restoration
by: Gu, Yubin, et al.
Published: (2025)
JM3D & JM3D-LLM: Elevating 3D Understanding with Joint Multi-modal Cues
by: Ji, Jiayi, et al.
Published: (2023)
Evolving, Not Training: Zero-Shot Reasoning Segmentation via Evolutionary Prompting
by: Ye, Kai, et al.
Published: (2025)
More Clear, More Flexible, More Precise: A Comprehensive Oriented Object Detection benchmark for UAV
by: Ye, Kai, et al.
Published: (2025)
Few-Shot Image Quality Assessment via Adaptation of Vision-Language Models
by: Li, Xudong, et al.
Published: (2024)
Purifying, Labeling, and Utilizing: A High-Quality Pipeline for Small Object Detection
by: Wang, Siwei, et al.
Published: (2025)
3D-GRES: Generalized 3D Referring Expression Segmentation
by: Wu, Changli, et al.
Published: (2024)
INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model
by: Ma, Yiwei, et al.
Published: (2024)
HieraVid: Hierarchical Token Pruning for Fast Video Large Language Models
by: Guo, Yansong, et al.
Published: (2026)
I2EBench: A Comprehensive Benchmark for Instruction-based Image Editing
by: Ma, Yiwei, et al.
Published: (2024)
MICON-Bench: Benchmarking and Enhancing Multi-Image Context Image Generation in Unified Multimodal Models
by: Wu, Mingrui, et al.
Published: (2026)
AIGI-Holmes: Towards Explainable and Generalizable AI-Generated Image Detection via Multimodal Large Language Models
by: Zhou, Ziyin, et al.
Published: (2025)
FocSAM: Delving Deeply into Focused Objects in Segmenting Anything
by: Huang, You, et al.
Published: (2024)
Pseudo-Label Quality Decoupling and Correction for Semi-Supervised Instance Segmentation
by: Lin, Jianghang, et al.
Published: (2025)
RG-SAN: Rule-Guided Spatial Awareness Network for End-to-End 3D Referring Expression Segmentation
by: Wu, Changli, et al.
Published: (2024)
CutDiffusion: A Simple, Fast, Cheap, and Strong Diffusion Extrapolation Method
by: Lin, Mingbao, et al.
Published: (2024)
PartFormer: Awakening Latent Diverse Representation from Vision Transformer for Object Re-Identification
by: Tan, Lei, et al.
Published: (2024)
What You Perceive Is What You Conceive: A Cognition-Inspired Framework for Open Vocabulary Image Segmentation
by: Lin, Jianghang, et al.
Published: (2025)
Understanding What Is Not Said: Referring Remote Sensing Image Segmentation with Scarce Expressions
by: Ye, Kai, et al.
Published: (2025)
Dual3D: Efficient and Consistent Text-to-3D Generation with Dual-mode Multi-view Latent Diffusion
by: Li, Xinyang, et al.
Published: (2024)
DiffusionFace: Towards a Comprehensive Dataset for Diffusion-Based Face Forgery Analysis
by: Chen, Zhongxi, et al.
Published: (2024)
M4-BLIP: Advancing Multi-Modal Media Manipulation Detection through Face-Enhanced Local Analysis
by: Wu, Hang, et al.
Published: (2025)
Image Captioning via Dynamic Path Customization
by: Ma, Yiwei, et al.
Published: (2024)
γ-MoD: Exploring Mixture-of-Depth Adaptation for Multimodal Large Language Models
by: Luo, Yaxin, et al.
Published: (2024)
IPDN: Image-enhanced Prompt Decoding Network for 3D Referring Expression Segmentation
by: Chen, Qi, et al.
Published: (2025)
UniVST: A Unified Framework for Training-free Localized Video Style Transfer
by: Song, Quanjian, et al.
Published: (2024)
Knowing Where to Focus: Attention-Guided Alignment for Text-based Person Search
by: Tan, Lei, et al.
Published: (2024)
Evaluating and Analyzing Relationship Hallucinations in Large Vision-Language Models
by: Wu, Mingrui, et al.
Published: (2024)
LightMotion: A Light and Tuning-free Method for Simulating Camera Motion in Video Generation
by: Song, Quanjian, et al.
Published: (2025)
UniPTS: A Unified Framework for Proficient Post-Training Sparsity
by: Xie, Jingjing, et al.
Published: (2024)
CIR-CoT: Towards Interpretable Composed Image Retrieval via End-to-End Chain-of-Thought Reasoning
by: Lin, Weihuang, et al.
Published: (2025)