| Main Authors: | Tang, Song, Jie, Guangquan, Ding, Henghui, Jiang, Yu-Gang |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2604.14147 |
Similar Items
Towards Omnimodal Expressions and Reasoning in Referring Audio-Visual Segmentation
by: Ying, Kaining, et al.
Published: (2025)
Multimodal Referring Segmentation: A Survey
by: Ding, Henghui, et al.
Published: (2025)
ReferSplat: Referring Segmentation in 3D Gaussian Splatting
by: He, Shuting, et al.
Published: (2025)
Ref-SAM3D: Bridging SAM3D with Text for Reference 3D Reconstruction
by: Zhou, Yun, et al.
Published: (2025)
GREx: Generalized Referring Expression Segmentation, Comprehension, and Generation
by: Ding, Henghui, et al.
Published: (2026)
Decoupling Static and Hierarchical Motion Perception for Referring Video Segmentation
by: He, Shuting, et al.
Published: (2024)
MOSEv2: A More Challenging Dataset for Video Object Segmentation in Complex Scenes
by: Ding, Henghui, et al.
Published: (2025)
RefMask3D: Language-Guided Transformer for 3D Referring Segmentation
by: He, Shuting, et al.
Published: (2024)
MeViS: A Multi-Modal Dataset for Referring Motion Expression Video Segmentation
by: Ding, Henghui, et al.
Published: (2025)
SegPoint: Segment Any Point Cloud via Large Language Model
by: He, Shuting, et al.
Published: (2024)
Open-set Anomaly Segmentation in Complex Scenarios
by: Xia, Song, et al.
Published: (2025)
Segment Anything Across Shots: A Method and Benchmark
by: Hu, Hengrui, et al.
Published: (2025)
MOVE: Motion-Guided Few-Shot Video Object Segmentation
by: Ying, Kaining, et al.
Published: (2025)
SAM3-DMS: Decoupled Memory Selection for Multi-target Video Segmentation of SAM3
by: Shen, Ruiqi, et al.
Published: (2026)
Mitigating the Curse of Dimensionality for Certified Robustness via Dual Randomized Smoothing
by: Xia, Song, et al.
Published: (2024)
SAMA: Towards Multi-Turn Referential Grounded Video Chat with Large Language Models
by: Sun, Ye, et al.
Published: (2025)
A Survey of Multimodal-Guided Image Editing with Text-to-Image Diffusion Models
by: Shuai, Xincheng, et al.
Published: (2024)
PSDesigner: Automated Graphic Design with a Human-Like Creative Workflow
by: Shuai, Xincheng, et al.
Published: (2026)
Few-Shot Segmentation with Global and Local Contrastive Learning
by: Liu, Weide, et al.
Published: (2021)
PECTP: Parameter-Efficient Cross-Task Prompts for Incremental Vision Transformer
by: Feng, Qian, et al.
Published: (2024)
Evaluating SAM2 for Video Semantic Segmentation
by: Ariff, Syed Hesham Syed, et al.
Published: (2025)
SceneDesigner: Controllable Multi-Object Image Generation with 9-DoF Pose Manipulation
by: Qin, Zhenyuan, et al.
Published: (2025)
SemFlow: Binding Semantic Segmentation and Image Synthesis via Rectified Flow
by: Wang, Chaoyang, et al.
Published: (2024)
Learning Local and Global Temporal Contexts for Video Semantic Segmentation
by: Sun, Guolei, et al.
Published: (2022)
A Survey on 3D Gaussian Splatting Applications: Segmentation, Editing, and Generation
by: He, Shuting, et al.
Published: (2025)
On the Provable Importance of Gradients for Language-Assisted Image Clustering
by: Peng, Bo, et al.
Published: (2025)
ROSE: Revolutionizing Open-Set Dense Segmentation with Patch-Wise Perceptual Large Multimodal Model
by: Han, Kunyang, et al.
Published: (2024)
Explore In-Context Segmentation via Latent Diffusion Models
by: Wang, Chaoyang, et al.
Published: (2024)
GlyphPrinter: Region-Grouped Direct Preference Optimization for Glyph-Accurate Visual Text Rendering
by: Shuai, Xincheng, et al.
Published: (2026)
EffectErase: Joint Video Object Removal and Insertion for High-Quality Effect Erasing
by: Fu, Yang, et al.
Published: (2026)
AnyI2V: Animating Any Conditional Image with Motion Control
by: Li, Ziye, et al.
Published: (2025)
Free-Form Scene Editor: Enabling Multi-Round Object Manipulation like in a 3D Engine
by: Shuai, Xincheng, et al.
Published: (2025)
Hierarchical Visual Prompt Learning for Continual Video Instance Segmentation
by: Dong, Jiahua, et al.
Published: (2025)
Hierarchical Alignment-enhanced Adaptive Grounding Network for Generalized Referring Expression Comprehension
by: Wang, Yaxian, et al.
Published: (2025)
Transformer-Based Visual Segmentation: A Survey
by: Li, Xiangtai, et al.
Published: (2023)
Attention Itself Could Retrieve. RetrieveVGGT: Training-Free Long Context Streaming 3D Reconstruction via Query-Key Similarity Retrieval
by: Zou, Zichen, et al.
Published: (2026)
3D-GRES: Generalized 3D Referring Expression Segmentation
by: Wu, Changli, et al.
Published: (2024)
Learning Accurate Segmentation Purely from Self-Supervision
by: You, Zuyao, et al.
Published: (2026)
OMG-Seg: Is One Model Good Enough For All Segmentation?
by: Li, Xiangtai, et al.
Published: (2024)
Transferable Adversarial Attacks on SAM and Its Downstream Models
by: Xia, Song, et al.
Published: (2024)