:: Library Catalog

Bibliographic Details
Main Authors:	Hu, Xiaobo; Lin, Youfang; Liu, Yue; Wang, Jinwen; Wang, Shuo; Fan, Hehe; Lv, Kai
Format:	Preprint
Published:	2023
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2312.01915

Similar Items

DRFormer: A Dual-Regularized Bidirectional Transformer for Person Re-identification
by: Shu, Ying, et al.
Published: (2026)

Incentivizing Generative Zero-Shot Learning via Outcome-Reward Reinforcement Learning with Visual Cues
by: Hou, Wenjin, et al.
Published: (2026)

Prompt-Aware Adapter: Towards Learning Adaptive Visual Tokens for Multimodal Large Language Models
by: Zhang, Yue, et al.
Published: (2024)

Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
by: Zhu, Lianghui, et al.
Published: (2024)

TransNormal: Dense Visual Semantics for Diffusion-based Transparent Object Normal Estimation
by: Li, Mingwei, et al.
Published: (2026)

ZeroMamba: Exploring Visual State Space Model for Zero-Shot Learning
by: Hou, Wenjin, et al.
Published: (2024)

Improving Adversarial Robustness via Decoupled Visual Representation Masking
by: Liu, Decheng, et al.
Published: (2024)

AnchorFlow: Training-Free 3D Editing via Latent Anchor-Aligned Flows
by: Zhou, Zhenglin, et al.
Published: (2025)

InfiniDreamer: Arbitrarily Long Human Motion Generation via Segment Score Distillation
by: Zhuo, Wenjie, et al.
Published: (2024)

Bridging Visual Representation and Reinforcement Learning from Verifiable Rewards in Large Vision-Language Models
by: Han, Yuhang, et al.
Published: (2026)

ITS3D: Inference-Time Scaling for Text-Guided 3D Diffusion Models
by: Zhou, Zhenglin, et al.
Published: (2025)

Seeing Is Believing? A Benchmark for Multimodal Large Language Models on Visual Illusions and Anomalies
by: Hou, Wenjin, et al.
Published: (2026)

Uni3D-MoE: Scalable Multimodal 3D Scene Understanding via Mixture of Experts
by: Zhang, Yue, et al.
Published: (2025)

Robust Multimodal Learning for Ophthalmic Disease Grading via Disentangled Representation
by: Wang, Xinkun, et al.
Published: (2025)

Semore: VLM-guided Enhanced Semantic Motion Representations for Visual Reinforcement Learning
by: Wang, Wentao, et al.
Published: (2025)

Prototype Learning for Micro-gesture Classification
by: Chen, Guoliang, et al.
Published: (2024)

Equivariant Representation Learning for Augmentation-based Self-Supervised Learning via Image Reconstruction
by: Wang, Qin, et al.
Published: (2024)

Constructing and Interpreting Digital Twin Representations for Visual Reasoning via Reinforcement Learning
by: Shen, Yiqing, et al.
Published: (2025)

Calibrated Multimodal Representation Learning with Missing Modalities
by: Liu, Xiaohao, et al.
Published: (2025)

Reasoning Physical Video Generation with Diffusion Timestep Tokens via Reinforcement Learning
by: Lin, Wang, et al.
Published: (2025)

Enhancing Environmental Robustness in Few-shot Learning via Conditional Representation Learning
by: Guo, Qianyu, et al.
Published: (2025)

Robust Multimodal Learning via Representation Decoupling
by: Wei, Shicai, et al.
Published: (2024)

TV-Dialogue: Crafting Theme-Aware Video Dialogues with Immersive Interaction
by: Wang, Sai, et al.
Published: (2025)

Dual Advancement of Representation Learning and Clustering for Sparse and Noisy Images
by: Li, Wenlin, et al.
Published: (2024)

Scaling Video Understanding via Compact Latent Multi-Agent Collaboration
by: Chen, Kerui, et al.
Published: (2026)

DeMoGen: Towards Decompositional Human Motion Generation with Energy-Based Diffusion Models
by: Zhang, Jianrong, et al.
Published: (2025)

EnergyMoGen: Compositional Human Motion Generation with Energy-Based Diffusion Model in Latent Space
by: Zhang, Jianrong, et al.
Published: (2024)

Toward Robust Incomplete Multimodal Sentiment Analysis via Hierarchical Representation Learning
by: Li, Mingcheng, et al.
Published: (2024)

Visual Imitation Learning with Calibrated Contrastive Representation
by: Wang, Yunke, et al.
Published: (2024)

Annotation-Free Visual Reasoning for High-Resolution Large Multimodal Models via Reinforcement Learning
by: Yang, Jiacheng, et al.
Published: (2026)

Closed-Loop Bidirectional Prompting for Adversarial Robustness of Vision Language Models
by: Liu, Xiao, et al.
Published: (2026)

Mimicking Human Visual Development for Learning Robust Image Representations
by: Raj, Ankita, et al.
Published: (2025)

Visual Superordinate Abstraction for Robust Concept Learning
by: Zheng, Qi, et al.
Published: (2022)

Toward Robust Early Detection of Alzheimer's Disease via an Integrated Multimodal Learning Approach
by: Chen, Yifei, et al.
Published: (2024)

Scaling Language-Free Visual Representation Learning
by: Fan, David, et al.
Published: (2025)

Grounding Bodily Awareness in Visual Representations for Efficient Policy Learning
by: Wang, Junlin, et al.
Published: (2025)

Principled Multimodal Representation Learning
by: Liu, Xiaohao, et al.
Published: (2025)

VividDreamer: Invariant Score Distillation For Hyper-Realistic Text-to-3D Generation
by: Zhuo, Wenjie, et al.
Published: (2024)

GraphTARIF: Linear Graph Transformer with Augmented Rank and Improved Focus
by: Hu, Zhaolin, et al.
Published: (2025)

Neural Clustering based Visual Representation Learning
by: Chen, Guikun, et al.
Published: (2024)