| Main Authors: | Hu, Xiaobo, Lin, Youfang, Liu, Yue, Wang, Jinwen, Wang, Shuo, Fan, Hehe, Lv, Kai |
|---|---|
| Format: | Preprint |
| Published: | 2023 |
| Online Access: | https://arxiv.org/abs/2312.01915 |
Similar Items
DRFormer: A Dual-Regularized Bidirectional Transformer for Person Re-identification
by: Shu, Ying, et al.
Published: (2026)
Incentivizing Generative Zero-Shot Learning via Outcome-Reward Reinforcement Learning with Visual Cues
by: Hou, Wenjin, et al.
Published: (2026)
Prompt-Aware Adapter: Towards Learning Adaptive Visual Tokens for Multimodal Large Language Models
by: Zhang, Yue, et al.
Published: (2024)
Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
by: Zhu, Lianghui, et al.
Published: (2024)
TransNormal: Dense Visual Semantics for Diffusion-based Transparent Object Normal Estimation
by: Li, Mingwei, et al.
Published: (2026)
ZeroMamba: Exploring Visual State Space Model for Zero-Shot Learning
by: Hou, Wenjin, et al.
Published: (2024)
Improving Adversarial Robustness via Decoupled Visual Representation Masking
by: Liu, Decheng, et al.
Published: (2024)
AnchorFlow: Training-Free 3D Editing via Latent Anchor-Aligned Flows
by: Zhou, Zhenglin, et al.
Published: (2025)
InfiniDreamer: Arbitrarily Long Human Motion Generation via Segment Score Distillation
by: Zhuo, Wenjie, et al.
Published: (2024)
Bridging Visual Representation and Reinforcement Learning from Verifiable Rewards in Large Vision-Language Models
by: Han, Yuhang, et al.
Published: (2026)
ITS3D: Inference-Time Scaling for Text-Guided 3D Diffusion Models
by: Zhou, Zhenglin, et al.
Published: (2025)
Seeing Is Believing? A Benchmark for Multimodal Large Language Models on Visual Illusions and Anomalies
by: Hou, Wenjin, et al.
Published: (2026)
Uni3D-MoE: Scalable Multimodal 3D Scene Understanding via Mixture of Experts
by: Zhang, Yue, et al.
Published: (2025)
Robust Multimodal Learning for Ophthalmic Disease Grading via Disentangled Representation
by: Wang, Xinkun, et al.
Published: (2025)
Semore: VLM-guided Enhanced Semantic Motion Representations for Visual Reinforcement Learning
by: Wang, Wentao, et al.
Published: (2025)
Prototype Learning for Micro-gesture Classification
by: Chen, Guoliang, et al.
Published: (2024)
Equivariant Representation Learning for Augmentation-based Self-Supervised Learning via Image Reconstruction
by: Wang, Qin, et al.
Published: (2024)
Constructing and Interpreting Digital Twin Representations for Visual Reasoning via Reinforcement Learning
by: Shen, Yiqing, et al.
Published: (2025)
Calibrated Multimodal Representation Learning with Missing Modalities
by: Liu, Xiaohao, et al.
Published: (2025)
Reasoning Physical Video Generation with Diffusion Timestep Tokens via Reinforcement Learning
by: Lin, Wang, et al.
Published: (2025)
Enhancing Environmental Robustness in Few-shot Learning via Conditional Representation Learning
by: Guo, Qianyu, et al.
Published: (2025)
Robust Multimodal Learning via Representation Decoupling
by: Wei, Shicai, et al.
Published: (2024)
TV-Dialogue: Crafting Theme-Aware Video Dialogues with Immersive Interaction
by: Wang, Sai, et al.
Published: (2025)
Dual Advancement of Representation Learning and Clustering for Sparse and Noisy Images
by: Li, Wenlin, et al.
Published: (2024)
Scaling Video Understanding via Compact Latent Multi-Agent Collaboration
by: Chen, Kerui, et al.
Published: (2026)
DeMoGen: Towards Decompositional Human Motion Generation with Energy-Based Diffusion Models
by: Zhang, Jianrong, et al.
Published: (2025)
EnergyMoGen: Compositional Human Motion Generation with Energy-Based Diffusion Model in Latent Space
by: Zhang, Jianrong, et al.
Published: (2024)
Toward Robust Incomplete Multimodal Sentiment Analysis via Hierarchical Representation Learning
by: Li, Mingcheng, et al.
Published: (2024)
Visual Imitation Learning with Calibrated Contrastive Representation
by: Wang, Yunke, et al.
Published: (2024)
Annotation-Free Visual Reasoning for High-Resolution Large Multimodal Models via Reinforcement Learning
by: Yang, Jiacheng, et al.
Published: (2026)
Closed-Loop Bidirectional Prompting for Adversarial Robustness of Vision Language Models
by: Liu, Xiao, et al.
Published: (2026)
Mimicking Human Visual Development for Learning Robust Image Representations
by: Raj, Ankita, et al.
Published: (2025)
Visual Superordinate Abstraction for Robust Concept Learning
by: Zheng, Qi, et al.
Published: (2022)
Toward Robust Early Detection of Alzheimer's Disease via an Integrated Multimodal Learning Approach
by: Chen, Yifei, et al.
Published: (2024)
Scaling Language-Free Visual Representation Learning
by: Fan, David, et al.
Published: (2025)
Grounding Bodily Awareness in Visual Representations for Efficient Policy Learning
by: Wang, Junlin, et al.
Published: (2025)
Principled Multimodal Representation Learning
by: Liu, Xiaohao, et al.
Published: (2025)
VividDreamer: Invariant Score Distillation For Hyper-Realistic Text-to-3D Generation
by: Zhuo, Wenjie, et al.
Published: (2024)
GraphTARIF: Linear Graph Transformer with Augmented Rank and Improved Focus
by: Hu, Zhaolin, et al.
Published: (2025)
Neural Clustering based Visual Representation Learning
by: Chen, Guikun, et al.
Published: (2024)