| Main Authors: | Wang, Yonghui, Liu, Shaokai, Li, Li, Zhou, Wengang, Li, Houqiang |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2408.03521 |
Similar Items
Revisiting Shadow Detection from a Vision-Language Perspective
by: Wang, Yonghui, et al.
Published: (2026)
RoFIR: Robust Fisheye Image Rectification Framework Impervious to Optical Center Deviation
by: Liao, Zhaokang, et al.
Published: (2024)
AdaptVision: Dynamic Input Scaling in MLLMs for Versatile Scene Understanding
by: Wang, Yonghui, et al.
Published: (2024)
LaneTCA: Enhancing Video Lane Detection with Temporal Context Aggregation
by: Zhou, Keyi, et al.
Published: (2024)
DeepEraser: Deep Iterative Context Mining for Generic Text Eraser
by: Feng, Hao, et al.
Published: (2024)
TextCoT: Zoom In for Enhanced Multimodal Text-Rich Image Understanding
by: Luan, Bozhi, et al.
Published: (2024)
ROOT: VLM based System for Indoor Scene Understanding and Beyond
by: Wang, Yonghui, et al.
Published: (2024)
StepVAR: Structure-Texture Guided Pruning for Visual Autoregressive Models
by: Liu, Keli, et al.
Published: (2026)
DocR1: Evidence Page-Guided GRPO for Multi-Page Document Understanding
by: Xiong, Junyu, et al.
Published: (2025)
MotionRL: Align Text-to-Motion Generation to Human Preferences with Multi-Reward Reinforcement Learning
by: Liu, Xiaoyang, et al.
Published: (2024)
GaussNav: Gaussian Splatting for Visual Navigation
by: Lei, Xiaohan, et al.
Published: (2024)
Motion-aware 3D Gaussian Splatting for Efficient Dynamic Scene Reconstruction
by: Guo, Zhiyang, et al.
Published: (2024)
Forest2Seq: Revitalizing Order Prior for Sequential Indoor Scene Synthesis
by: Sun, Qi, et al.
Published: (2024)
Instance-aware Exploration-Verification-Exploitation for Instance ImageGoal Navigation
by: Lei, Xiaohan, et al.
Published: (2024)
Exploiting GPT-4 Vision for Zero-shot Point Cloud Understanding
by: Sun, Qi, et al.
Published: (2024)
Self-Classification Enhancement and Correction for Weakly Supervised Object Detection
by: Yin, Yufei, et al.
Published: (2025)
Exploiting Spatial-Temporal Context for Interacting Hand Reconstruction on Monocular RGB Video
by: Zhao, Weichao, et al.
Published: (2023)
Optimizing Distributional Geometry Alignment with Optimal Transport for Generative Dataset Distillation
by: Cui, Xiao, et al.
Published: (2025)
Cross-Modal Consistency Learning for Sign Language Recognition
by: Wu, Kepeng, et al.
Published: (2025)
Video-based Sign Language Recognition without Temporal Segmentation
by: Huang, Jie, et al.
Published: (2018)
Learning Generalizable Human Motion Generator with Reinforcement Learning
by: Mao, Yunyao, et al.
Published: (2024)
Self-Supervised Representation Learning with Spatial-Temporal Consistency for Sign Language Recognition
by: Zhao, Weichao, et al.
Published: (2024)
Progressive Multi-modal Conditional Prompt Tuning
by: Qiu, Xiaoyu, et al.
Published: (2024)
Image2Sentence based Asymmetrical Zero-shot Composed Image Retrieval
by: Du, Yongchao, et al.
Published: (2024)
Scaling up Multimodal Pre-training for Sign Language Understanding
by: Zhou, Wengang, et al.
Published: (2024)
Language-Driven Interactive Shadow Detection
by: Wang, Hongqiu, et al.
Published: (2024)
Multi-Scale Invertible Neural Network for Wide-Range Variable-Rate Learned Image Compression
by: Tu, Hanyue, et al.
Published: (2025)
MetaShadow: Object-Centered Shadow Detection, Removal, and Synthesis
by: Wang, Tianyu, et al.
Published: (2024)
Shadow Generation with Decomposed Mask Prediction and Attentive Shadow Filling
by: Tao, Xinhao, et al.
Published: (2023)
Regional Attention for Shadow Removal
by: Liu, Hengxing, et al.
Published: (2024)
Diff-Shadow: Global-guided Diffusion Model for Shadow Removal
by: Luo, Jinting, et al.
Published: (2024)
Robust Multimodal Large Language Models Against Modality Conflict
by: Zhang, Zongmeng, et al.
Published: (2025)
SEDS: Semantically Enhanced Dual-Stream Encoder for Sign Language Retrieval
by: Jiang, Longtao, et al.
Published: (2024)
Multi-Cue Adaptive Visual Token Pruning for Large Vision-Language Models
by: Luan, Bozhi, et al.
Published: (2025)
Structural Action Transformer for 3D Dexterous Manipulation
by: Lei, Xiaohan, et al.
Published: (2026)
ShadowHack: Hacking Shadows via Luminance-Color Divide and Conquer
by: Hu, Jin, et al.
Published: (2024)
ScaleWeaver: Weaving Efficient Controllable T2I Generation with Multi-Scale Reference Attention
by: Liu, Keli, et al.
Published: (2025)
Rethinking Long-tailed Dataset Distillation: A Uni-Level Framework with Unbiased Recovery and Relabeling
by: Cui, Xiao, et al.
Published: (2025)
Test-Time Intensity Consistency Adaptation for Shadow Detection
by: Zhu, Leyi, et al.
Published: (2024)
ShadowMaskFormer: Mask Augmented Patch Embeddings for Shadow Removal
by: Li, Zhuohao, et al.
Published: (2024)