| Main Authors: | Luo, Jingnan, Gao, Mingqi, Liu, Jun, Gao, Bin-Bin, Zheng, Feng |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2603.21488 |
Similar Items
1st Place Solution for MeViS Track in CVPR 2024 PVUW Workshop: Motion Expression guided Video Segmentation
by: Gao, Mingqi, et al.
Published: (2024)
Show Me When and Where: Towards Referring Video Object Segmentation in the Wild
by: Gao, Mingqi, et al.
Published: (2026)
Point Linguist Model: Segment Any Object via Bridged Large 3D-Language Model
by: Huang, Zhuoxu, et al.
Published: (2025)
Place Anything into Any Video
by: Liu, Ziling, et al.
Published: (2024)
Few-Shot Referring Video Single- and Multi-Object Segmentation via Cross-Modal Affinity with Instance Sequence Matching
by: Liu, Heng, et al.
Published: (2025)
BoxSeg: Quality-Aware and Peer-Assisted Learning for Box-supervised Instance Segmentation
by: Lai, Jinxiang, et al.
Published: (2025)
MetaUAS: Universal Anomaly Segmentation with One-Prompt Meta-Learning
by: Gao, Bin-Bin
Published: (2025)
MMAD: A Comprehensive Benchmark for Multimodal Large Language Models in Industrial Anomaly Detection
by: Jiang, Xi, et al.
Published: (2024)
THU-Warwick Submission for EPIC-KITCHEN Challenge 2025: Semi-Supervised Video Object Segmentation
by: Gao, Mingqi, et al.
Published: (2025)
MLLM-For3D: Adapting Multimodal Large Language Model for 3D Reasoning Segmentation
by: Huang, Jiaxin, et al.
Published: (2025)
Beyond the Visible: Benchmarking Occlusion Perception in Multimodal Large Language Models
by: Liu, Zhaochen, et al.
Published: (2025)
SoftPatch+: Fully Unsupervised Anomaly Classification and Segmentation
by: Wang, Chengjie, et al.
Published: (2024)
ViLLa: Video Reasoning Segmentation with Large Language Model
by: Zheng, Rongkun, et al.
Published: (2024)
One Language-Free Foundation Model Is Enough for Universal Vision Anomaly Detection
by: Gao, Bin-Bin, et al.
Published: (2026)
SAM-Body4D: Training-Free 4D Human Body Mesh Recovery from Videos
by: Gao, Mingqi, et al.
Published: (2025)
GSVA: Generalized Segmentation via Multimodal Large Language Models
by: Xia, Zhuofan, et al.
Published: (2023)
Leveraging Geometric Priors for Unaligned Scene Change Detection
by: Liu, Ziling, et al.
Published: (2025)
SCORP: Scene-Consistent Object Refinement via Proxy Generation and Tuning
by: Chen, Ziwei, et al.
Published: (2025)
VideoPerceiver: Enhancing Fine-Grained Temporal Perception in Video Multimodal Large Language Models
by: Zhao, Fufangchen, et al.
Published: (2025)
Bidirectional Uncertainty-Aware Region Learning for Semi-Supervised Medical Image Segmentation
by: Zhou, Shiwei, et al.
Published: (2025)
Few-Shot Anomaly-Driven Generation for Anomaly Classification and Segmentation
by: Gui, Guan, et al.
Published: (2025)
Dual Semantic-Aware Network for Noise Suppressed Ultrasound Video Segmentation
by: Zhou, Ling, et al.
Published: (2025)
Corvid: Improving Multimodal Large Language Models Towards Chain-of-Thought Reasoning
by: Jiang, Jingjing, et al.
Published: (2025)
VideoLLM-online: Online Video Large Language Model for Streaming Video
by: Chen, Joya, et al.
Published: (2024)
RSGMamba: Reliability-Aware Self-Gated State Space Model for Multimodal Semantic Segmentation
by: Xu, Guoan, et al.
Published: (2026)
The 1st Solution for MOSEv2 Challenge 2025: Long-term and Concept-aware Video Segmentation via SeC
by: Gao, Mingqi, et al.
Published: (2025)
Learning to Detect Multi-class Anomalies with Just One Normal Image Prompt
by: Gao, Bin-Bin
Published: (2025)
HATS: Hardness-Aware Trajectory Synthesis for GUI Agents
by: Shao, Rui, et al.
Published: (2026)
Enhancing Spatial Reasoning in Multimodal Large Language Models through Reasoning-based Segmentation
by: Ning, Zhenhua, et al.
Published: (2025)
One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos
by: Bai, Zechen, et al.
Published: (2024)
StreamingCoT: A Dataset for Temporal Dynamics and Multimodal Chain-of-Thought Reasoning in Streaming VideoQA
by: Hu, Yuhang, et al.
Published: (2025)
Think While Watching: Online Streaming Segment-Level Memory for Multi-Turn Video Reasoning in Multimodal Large Language Models
by: Wang, Lu, et al.
Published: (2026)
Large Language Model Guided Progressive Feature Alignment for Multimodal UAV Object Detection
by: Wu, Wentao, et al.
Published: (2025)
MLLM-Tool: A Multimodal Large Language Model For Tool Agent Learning
by: Wang, Chenyu, et al.
Published: (2024)
VISA: Reasoning Video Object Segmentation via Large Language Models
by: Yan, Cilin, et al.
Published: (2024)
Annotation-Free Visual Reasoning for High-Resolution Large Multimodal Models via Reinforcement Learning
by: Yang, Jiacheng, et al.
Published: (2026)
ArtiWorld: LLM-Driven Articulation of 3D Objects in Scenes
by: Yang, Yixuan, et al.
Published: (2025)
POLAR: A Portrait OLAT Dataset and Generative Framework for Illumination-Aware Face Modeling
by: Chen, Zhuo, et al.
Published: (2025)
VLForgery Face Triad: Detection, Localization and Attribution via Multimodal Large Language Models
by: He, Xinan, et al.
Published: (2025)
Can Large Multimodal Models Inspect Buildings? A Hierarchical Benchmark for Structural Pathology Reasoning
by: Zhong, Hui, et al.
Published: (2026)