| Main Authors: | Liang, Tianming, Lin, Kun-Yu, Tan, Chaolei, Zhang, Jianguo, Zheng, Wei-Shi, Hu, Jian-Fang |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2501.14607 |
Similar Items
ReferDINO-Plus: 2nd Solution for 4th PVUW MeViS Challenge at CVPR 2025
by: Liang, Tianming, et al.
Published: (2025)
Long-RVOS: A Comprehensive Benchmark for Long-term Referring Video Object Segmentation
by: Liang, Tianming, et al.
Published: (2025)
Refer-Agent: A Collaborative Multi-Agent System with Reasoning and Reflection for Referring Video Object Segmentation
by: Jiang, Haichao, et al.
Published: (2026)
Ranking Distillation for Open-Ended Video Question Answering with Insufficient Labels
by: Liang, Tianming, et al.
Published: (2024)
Siamese Learning with Joint Alignment and Regression for Weakly-Supervised Video Paragraph Grounding
by: Tan, Chaolei, et al.
Published: (2024)
TubeRMC: Tube-conditioned Reconstruction with Mutual Constraints for Weakly-supervised Spatio-Temporal Video Grounding
by: Li, Jinxuan, et al.
Published: (2025)
Don't Guess, Just Ask: Resolving Ambiguity in Referring Segmentation via Multi-turn Clarification
by: Yang, Yuting, et al.
Published: (2026)
Image-to-Video Transfer Learning based on Image-Language Foundation Models: A Comprehensive Survey
by: Li, Jinxuan, et al.
Published: (2025)
Object-centric Video Question Answering with Visual Grounding and Referring
by: Wang, Haochen, et al.
Published: (2025)
SynopGround: A Large-Scale Dataset for Multi-Paragraph Video Grounding from TV Dramas and Synopses
by: Tan, Chaolei, et al.
Published: (2024)
GroPrompt: Efficient Grounded Prompting and Adaptation for Referring Video Object Segmentation
by: Lin, Ci-Siang, et al.
Published: (2024)
Seg-ReSearch: Segmentation with Interleaved Reasoning and External Search
by: Liang, Tianming, et al.
Published: (2026)
AD-DINO: Attention-Dynamic DINO for Distance-Aware Embodied Reference Understanding
by: Guo, Hao, et al.
Published: (2024)
Temporal Grounding as a Learning Signal for Referring Video Object Segmentation
by: Lee, Seunghun, et al.
Published: (2025)
DINO-Tok: Adapting DINO for Visual Tokenizers
by: Jia, Mingkai, et al.
Published: (2025)
Temporal Prompting Matters: Rethinking Referring Video Object Segmentation
by: Lin, Ci-Siang, et al.
Published: (2025)
Mask Grounding for Referring Image Segmentation
by: Chng, Yong Xien, et al.
Published: (2023)
Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection
by: Liu, Shilong, et al.
Published: (2023)
Weakly-Supervised Referring Video Object Segmentation through Text Supervision
by: Shi, Miaojing, et al.
Published: (2026)
PET-DINO: Unifying Visual Cues into Grounding DINO with Prompt-Enriched Training
by: Fu, Weifu, et al.
Published: (2026)
Robust Egocentric Referring Video Object Segmentation via Dual-Modal Causal Intervention
by: Liu, Haijing, et al.
Published: (2025)
RefCut: Interactive Segmentation with Reference Guidance
by: Lin, Zheng, et al.
Published: (2025)
EventRR: Event Referential Reasoning for Referring Video Object Segmentation
by: Xu, Huihui, et al.
Published: (2025)
Ref-AVS: Refer and Segment Objects in Audio-Visual Scenes
by: Wang, Yaoting, et al.
Published: (2024)
Rethinking Cross-modal Interaction from a Top-down Perspective for Referring Video Object Segmentation
by: Liang, Chen, et al.
Published: (2021)
CoopDiff: Anticipating 3D Human-object Interactions via Contact-consistent Decoupled Diffusion
by: Lin, Xiaotong, et al.
Published: (2025)
GuiDINO: Rethinking Vision Foundation Model in Medical Image Segmentation
by: Liang, Zhuonan, et al.
Published: (2026)
Multimodal Reference Visual Grounding
by: Lu, Yangxiao, et al.
Published: (2025)
Beyond Object Categories: Multi-Attribute Reference Understanding for Visual Grounding
by: Guo, Hao, et al.
Published: (2025)
Show Me When and Where: Towards Referring Video Object Segmentation in the Wild
by: Gao, Mingqi, et al.
Published: (2026)
InterRVOS: Interaction-aware Referring Video Object Segmentation
by: Jin, Woojeong, et al.
Published: (2025)
Temporally Consistent Referring Video Object Segmentation with Hybrid Memory
by: Miao, Bo, et al.
Published: (2024)
Referring Video Object Segmentation with Cross-Modality Proxy Queries
by: Sun, Baoli, et al.
Published: (2025)
Correspondence as Video: Test-Time Adaption on SAM2 for Reference Segmentation in the Wild
by: Wang, Haoran, et al.
Published: (2025)
LoSh: Long-Short Text Joint Prediction Network for Referring Video Object Segmentation
by: Yuan, Linfeng, et al.
Published: (2023)
ViSpeak: Visual Instruction Feedback in Streaming Videos
by: Fu, Shenghao, et al.
Published: (2025)
SVAC: Scaling Is All You Need For Referring Video Object Segmentation
by: Zhang, Li, et al.
Published: (2025)
Referring Video Object Segmentation via Language-aligned Track Selection
by: Kim, Seongchan, et al.
Published: (2024)
Multi-Context Temporal Consistent Modeling for Referring Video Object Segmentation
by: Choi, Sun-Hyuk, et al.
Published: (2025)
Mitigating Query Selection Bias in Referring Video Object Segmentation
by: Zhang, Dingwei, et al.
Published: (2025)