| Main Authors: | Zhou, Yikang, Zhang, Tao, Gong, Dengxian, Wu, Yuanzheng, Tian, Ye, Wang, Haochen, Yuan, Haobo, Wang, Jiacong, Qi, Lu, Fei, Hao, Wang, Anran, Wang, Zhuochen, Wang, Yujing, Chen, Cheng, Ji, Shunping, Li, Xiangtai |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2601.16093 |
Similar Items
SaSaSaSa2VA: 2nd Place of the 5th PVUW MeViS-Text Track
by: Gong, Dengxian, et al.
Published: (2026)
The 1st Solution for 7th LSVOS RVOS Track: SaSaSa2VA
by: Niu, Quanzhu, et al.
Published: (2025)
Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs
by: Wang, Haochen, et al.
Published: (2025)
DeH4R: A Decoupled and Hybrid Method for Road Network Graph Extraction
by: Gong, Dengxian, et al.
Published: (2025)
PairUni: Pairwise Training for Unified Multimodal Language Models
by: Zheng, Jiani, et al.
Published: (2025)
Open-o3-Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence
by: Meng, Jiahao, et al.
Published: (2025)
Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Methodology
by: Wang, Haochen, et al.
Published: (2025)
DVIS-DAQ: Improving Video Segmentation via Dynamic Anchor Queries
by: Zhou, Yikang, et al.
Published: (2024)
Dense360: Dense Understanding from Omnidirectional Panoramas
by: Zhou, Yikang, et al.
Published: (2025)
MMaDA-Parallel: Multimodal Large Diffusion Language Models for Thinking-Aware Editing and Generation
by: Tian, Ye, et al.
Published: (2025)
Point Cloud Mamba: Point Cloud Learning via State Space Model
by: Zhang, Tao, et al.
Published: (2024)
OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding
by: Zhang, Tao, et al.
Published: (2024)
The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer
by: Lei, Weixian, et al.
Published: (2025)
Are They the Same? Exploring Visual Correspondence Shortcomings of Multimodal LLMs
by: Zhou, Yikang, et al.
Published: (2025)
Beyond Appearance: Geometric Cues for Robust Video Instance Segmentation
by: Niu, Quanzhu, et al.
Published: (2025)
Visual Reasoning Tracer: Object-Level Grounded Reasoning Benchmark
by: Yuan, Haobo, et al.
Published: (2025)
P2PFormer: A Primitive-to-polygon Method for Regular Building Contour Extraction from Remote Sensing Images
by: Zhang, Tao, et al.
Published: (2024)
Innovative methods of using information technology in teaching stringed instruments in college
by: Wang, Yuanzheng
Published: (2025)
Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos
by: Yuan, Haobo, et al.
Published: (2025)
Efficiently matching random inhomogeneous graphs via degree profiles
by: Ding, Jian, et al.
Published: (2023)
Conditional Panoramic Image Generation via Masked Autoregressive Modeling
by: Wang, Chaoyang, et al.
Published: (2025)
AMCEN: An Attention Masking-based Contrastive Event Network for Two-stage Temporal Knowledge Graph Reasoning
by: Yang, Jing, et al.
Published: (2024)
A Novel Shape Guided Transformer Network for Instance Segmentation in Remote Sensing Images
by: Yu, Dawen, et al.
Published: (2024)
DriveFine: Refining-Augmented Masked Diffusion VLA for Precise and Robust Driving
by: Dang, Chenxu, et al.
Published: (2026)
Segment Any 4D Gaussians
by: Ji, Shengxiang, et al.
Published: (2024)
Pixel-SAIL: Single Transformer For Pixel-Grounded Understanding
by: Zhang, Tao, et al.
Published: (2025)
Chinese ModernBERT with Whole-Word Masking
by: Zhao, Zeyu, et al.
Published: (2025)
Lysine Acetyltransferase 6 in Health and Disease
by: Yujing Tan, et al.
Published: (2025)
DreamSwapV: Mask-guided Subject Swapping for Any Customized Video Editing
by: Wang, Weitao, et al.
Published: (2025)
Towards Cross-Table Masked Pretraining for Web Data Mining
by: Ye, Chao, et al.
Published: (2023)
Preparation, Characterization and Antioxidant Effects on Processed Sausages of Ultrafine Green Tea Powder Emulsions
by: Xin Tao, et al.
Published: (2026)
Any2Caption:Interpreting Any Condition to Caption for Controllable Video Generation
by: Wu, Shengqiong, et al.
Published: (2025)
You Can't Ignore Either: Unifying Structure and Feature Denoising for Robust Graph Learning
by: Yang, Tianmeng, et al.
Published: (2024)
A note about why deep learning is deep: A discontinuous approximation perspective
by: Yongxin Li, et al.
Published: (2024)
SPICE : Leveraging Soft Probabilistic Causal Intervention for Breast Ultrasound Tumor Segmentation
by: Haobo Chen, et al.
Published: (2026)
Any Labor Union Can Represent Any Unit
Published: (2024)
Deliberative Reasoning Network: An Uncertainty-Driven Paradigm for Belief-Tracked Inference with Pretrained Language Models
by: Xu, Anran, et al.
Published: (2025)
MAGREF: Masked Guidance for Any-Reference Video Generation with Subject Disentanglement
by: Deng, Yufan, et al.
Published: (2025)
Any Model, Any Place, Any Time: Get Remote Sensing Foundation Model Embeddings On Demand
by: Ye, Dingqi, et al.
Published: (2026)
Sharp asymptotics of disconnection time of large cylinders by simple and biased random walks
by: Li, Xinyi, et al.
Published: (2024)