Library Catalog

Bibliographic Details
Main Authors:	Xu, Haiying, Wang, Zihan, Dai, Song, Zhang, Zhengxuan, Dou, Kairan, Hu, Xuming
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2603.12166

Similar Items

MGAug: Multimodal Geometric Augmentation in Latent Spaces of Image Deformations
by: Hossain, Tonmoy, et al.
Published: (2023)

GMapLatent: Geometric Mapping in Latent Space
by: Zeng, Wei, et al.
Published: (2025)

LaRe: Latent Refocusing for Multimodal Reasoning
by: Ma, Jizheng, et al.
Published: (2025)

Efficient Implicit Neural Compression of Point Clouds via Learnable Activation in Latent Space
by: Zhang, Yichi, et al.
Published: (2025)

The Learnability Gap in Medical Latent Diffusion
by: Dombrowski, Mischa, et al.
Published: (2026)

Reasoning Within the Mind: Dynamic Multimodal Interleaving in Latent Space
by: Liu, Chengzhi, et al.
Published: (2025)

Multimodal Chain of Continuous Thought for Latent-Space Reasoning in Vision-Language Models
by: Pham, Tan-Hanh, et al.
Published: (2025)

PLUME: Latent Reasoning Based Universal Multimodal Embedding
by: He, Chenwei, et al.
Published: (2026)

Multimodal Latent Reasoning via Hierarchical Visual Cues Injection
by: Zhang, Yiming, et al.
Published: (2026)

Self-Consistent Latent Reasoning: Long Latent Sequence Reasoning for Vision-Language Model
by: Wang, Chenfeng, et al.
Published: (2026)

LatentUMM: Dual Latent Alignment for Unified Multimodal Models
by: Luo, Yinyi, et al.
Published: (2026)

Semantic-Enriched Latent Visual Reasoning
by: Xu, Tianrun, et al.
Published: (2026)

Monet: Reasoning in Latent Visual Space Beyond Images and Language
by: Wang, Qixun, et al.
Published: (2025)

PERL: Parameter Efficient Reasoning in CLIP Latent Space
by: Carnemolla, Simone, et al.
Published: (2026)

Nodule-Aligned Latent Space Learning with LLM-Driven Multimodal Diffusion for Lung Nodule Progression Prediction
by: Song, James, et al.
Published: (2026)

ShaLa: Multimodal Shared Latent Space Modelling
by: Cui, Jiali, et al.
Published: (2025)

Reasoning in the Dark: Interleaved Vision-Text Reasoning in Latent Space
by: Chen, Chao, et al.
Published: (2025)

Rectifying Latent Space for Generative Single-Image Reflection Removal
by: Li, Mingjia, et al.
Published: (2025)

Generative Human Motion Stylization in Latent Space
by: Guo, Chuan, et al.
Published: (2024)

Latent Sketchpad: Sketching Visual Thoughts to Elicit Multimodal Reasoning in MLLMs
by: Zhang, Huanyu, et al.
Published: (2025)

Calibrating Biased Distribution in VFM-derived Latent Space via Cross-Domain Geometric Consistency
by: Ma, Yanbiao, et al.
Published: (2025)

Visual Enhanced Depth Scaling for Multimodal Latent Reasoning
by: Han, Yudong, et al.
Published: (2026)

Latent Watermark: Inject and Detect Watermarks in Latent Diffusion Space
by: Meng, Zheling, et al.
Published: (2024)

LatentUM: Unleashing the Potential of Interleaved Cross-Modal Reasoning via a Latent-Space Unified Model
by: Jin, Jiachun, et al.
Published: (2026)

Seeing Space and Motion: Enhancing Latent Actions with Geometric and Dynamic Awareness for Vision-Language-Action Models
by: Cai, Zhejia, et al.
Published: (2025)

BYOCL: Build Your Own Consistent Latent with Hierarchical Representative Latent Clustering
by: Dai, Jiayue, et al.
Published: (2024)

Latent Visual Reasoning
by: Li, Bangzheng, et al.
Published: (2025)

Improving Deep Representation Learning via Auxiliary Learnable Target Coding
by: Liu, Kangjun, et al.
Published: (2023)

Latent Transfer Attack: Adversarial Examples via Generative Latent Spaces
by: Shaar, Eitan, et al.
Published: (2026)

GeoSense: Internalizing Geometric Necessity Perception for Multimodal Reasoning
by: Liu, Ruiheng, et al.
Published: (2026)

Latent Implicit Visual Reasoning
by: Li, Kelvin, et al.
Published: (2025)

Constructing Fair Latent Space for Intersection of Fairness and Explainability
by: Joo, Hyungjun, et al.
Published: (2024)

MILR: Improving Multimodal Image Generation via Test-Time Latent Reasoning
by: Mi, Yapeng, et al.
Published: (2025)

Learning Transformation-Isomorphic Latent Space for Accurate Hand Pose Estimation
by: Ren, Kaiwen, et al.
Published: (2025)

Leveraging Latent Visual Reasoning in Silence
by: Zhu, Dongyao, et al.
Published: (2026)

Conditional Latent Coding with Learnable Synthesized Reference for Deep Image Compression
by: Wu, Siqi, et al.
Published: (2025)

Learning Multimodal Latent Space with EBM Prior and MCMC Inference
by: Yuan, Shiyu, et al.
Published: (2024)

Latent Diffusion Inversion Requires Understanding the Latent Space
by: Rao, Mingxing, et al.
Published: (2025)

The Thinking Pixel: Recursive Sparse Reasoning in Multimodal Diffusion Latents
by: Sun, Yuwei, et al.
Published: (2026)

LatentOmni: Rethinking Omni-Modal Understanding via Unified Audio-Visual Latent Reasoning
by: Dai, Yifan, et al.
Published: (2026)