| Main Authors: | Yin, Bo-Wen, Cao, Jiao-Long, Cheng, Ming-Ming, Hou, Qibin |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2504.04701 |
Similar Items
DFormer: Rethinking RGBD Representation Learning for Semantic Segmentation
by: Yin, Bowen, et al.
Published: (2023)
OmniSegmentor: A Flexible Multi-Modal Learning Framework for Semantic Segmentation
by: Yin, Bo-Wen, et al.
Published: (2025)
StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation
by: Zhou, Yupeng, et al.
Published: (2024)
Cascade-CLIP: Cascaded Vision-Language Embeddings Alignment for Zero-Shot Semantic Segmentation
by: Li, Yunheng, et al.
Published: (2024)
Revisiting Efficient Semantic Segmentation: Learning Offsets for Better Spatial and Class Feature Alignment
by: Zhang, Shi-Chen, et al.
Published: (2025)
SRFormerV2: Taking a Closer Look at Permuted Self-Attention for Image Super-Resolution
by: Zhou, Yupeng, et al.
Published: (2023)
Low-Resolution Self-Attention for Semantic Segmentation
by: Wu, Yu-Huan, et al.
Published: (2023)
GeoWorld: Unlocking the Potential of Geometry Models to Facilitate High-Fidelity 3D Scene Generation
by: Wan, Yuhao, et al.
Published: (2025)
High-Quality Mask Tuning Matters for Open-Vocabulary Segmentation
by: Zeng, Quan-Sheng, et al.
Published: (2024)
TempSamp-R1: Effective Temporal Sampling with Reinforcement Fine-Tuning for Video LLMs
by: Li, Yunheng, et al.
Published: (2025)
Referring Camouflaged Object Detection
by: Zhang, Xuying, et al.
Published: (2023)
Traffic Scene Parsing through the TSP6K Dataset
by: Jiang, Peng-Tao, et al.
Published: (2023)
Rethinking Token-Level Policy Optimization for Multimodal Chain-of-Thought
by: Li, Yunheng, et al.
Published: (2026)
Geometry Depth Consistency in RGBD Relative Pose Estimation
by: Kumar, Sourav, et al.
Published: (2024)
Enhancing Representations through Heterogeneous Self-Supervised Learning
by: Li, Zhong-Yu, et al.
Published: (2023)
Revisiting Cross-Modal Knowledge Distillation: A Disentanglement Approach for RGBD Semantic Segmentation
by: Ferrod, Roger, et al.
Published: (2025)
MedSeg-R: Medical Image Segmentation with Clinical Reasoning
by: Shao, Hao, et al.
Published: (2025)
ControlSR: Taming Diffusion Models for Consistent Real-World Image Super Resolution
by: Wan, Yuhao, et al.
Published: (2024)
Multi-Scale Representations by Varying Window Attention for Semantic Segmentation
by: Yan, Haotian, et al.
Published: (2024)
Towards Universal Video MLLMs with Attribute-Structured and Quality-Verified Instructions
by: Li, Yunheng, et al.
Published: (2026)
Unbiased Region-Language Alignment for Open-Vocabulary Dense Prediction
by: Li, Yunheng, et al.
Published: (2024)
Zone Evaluation: Revealing Spatial Bias in Object Detection
by: Zheng, Zhaohui, et al.
Published: (2023)
CrossKD: Cross-Head Knowledge Distillation for Object Detection
by: Wang, Jiabao, et al.
Published: (2023)
Sora Generates Videos with Stunning Geometrical Consistency
by: Li, Xuanyi, et al.
Published: (2024)
Mixture of Style Experts for Diverse Image Stylization
by: Zhu, Shihao, et al.
Published: (2026)
AR-1-to-3: Single Image to Consistent 3D Object Generation via Next-View Prediction
by: Zhang, Xuying, et al.
Published: (2025)
Implicit Event-RGBD Neural SLAM
by: Qu, Delin, et al.
Published: (2023)
MCANet: Medical Image Segmentation with Multi-Scale Cross-Axis Attention
by: Shao, Hao, et al.
Published: (2023)
YOLO-MS: Rethinking Multi-Scale Representation Learning for Real-time Object Detection
by: Chen, Yuming, et al.
Published: (2023)
Towards Stable 3D Object Detection
by: Wang, Jiabao, et al.
Published: (2024)
MS-NeRF: Multi-Space Neural Radiance Fields
by: Yin, Ze-Xin, et al.
Published: (2023)
KAC: Kolmogorov-Arnold Classifier for Continual Learning
by: Hu, Yusong, et al.
Published: (2025)
The Consistency Critic: Correcting Inconsistencies in Generated Images via Reference-Guided Attentive Alignment
by: Ouyang, Ziheng, et al.
Published: (2025)
Multi-Token Enhancing for Vision Representation Learning
by: Li, Zhong-Yu, et al.
Published: (2024)
Mutual Forcing: Dual-Mode Self-Evolution for Fast Autoregressive Audio-Video Character Generation
by: Zhou, Yupeng, et al.
Published: (2026)
Strip R-CNN: Large Strip Convolution for Remote Sensing Object Detection
by: Yuan, Xinbin, et al.
Published: (2025)
A Glimpse to Compress: Dynamic Visual Token Pruning for Large Vision-Language Models
by: Zeng, Quan-Sheng, et al.
Published: (2025)
Contrastive Masked Autoencoders are Stronger Vision Learners
by: Huang, Zhicheng, et al.
Published: (2022)
RGBD GS-ICP SLAM
by: Ha, Seongbo, et al.
Published: (2024)
Make It Up: Fake Images, Real Gains in Generalized Few-shot Semantic Segmentation
by: Xie, Guohuan, et al.
Published: (2026)