| Main Authors: | Xu, Haiying, Wang, Zihan, Dai, Song, Zhang, Zhengxuan, Dou, Kairan, Hu, Xuming |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2603.12166 |
Similar Items
MGAug: Multimodal Geometric Augmentation in Latent Spaces of Image Deformations
by: Hossain, Tonmoy, et al.
Published: (2023)
GMapLatent: Geometric Mapping in Latent Space
by: Zeng, Wei, et al.
Published: (2025)
LaRe: Latent Refocusing for Multimodal Reasoning
by: Ma, Jizheng, et al.
Published: (2025)
Efficient Implicit Neural Compression of Point Clouds via Learnable Activation in Latent Space
by: Zhang, Yichi, et al.
Published: (2025)
The Learnability Gap in Medical Latent Diffusion
by: Dombrowski, Mischa, et al.
Published: (2026)
Reasoning Within the Mind: Dynamic Multimodal Interleaving in Latent Space
by: Liu, Chengzhi, et al.
Published: (2025)
Multimodal Chain of Continuous Thought for Latent-Space Reasoning in Vision-Language Models
by: Pham, Tan-Hanh, et al.
Published: (2025)
PLUME: Latent Reasoning Based Universal Multimodal Embedding
by: He, Chenwei, et al.
Published: (2026)
Multimodal Latent Reasoning via Hierarchical Visual Cues Injection
by: Zhang, Yiming, et al.
Published: (2026)
Self-Consistent Latent Reasoning: Long Latent Sequence Reasoning for Vision-Language Model
by: Wang, Chenfeng, et al.
Published: (2026)
LatentUMM: Dual Latent Alignment for Unified Multimodal Models
by: Luo, Yinyi, et al.
Published: (2026)
Semantic-Enriched Latent Visual Reasoning
by: Xu, Tianrun, et al.
Published: (2026)
Monet: Reasoning in Latent Visual Space Beyond Images and Language
by: Wang, Qixun, et al.
Published: (2025)
PERL: Parameter Efficient Reasoning in CLIP Latent Space
by: Carnemolla, Simone, et al.
Published: (2026)
Nodule-Aligned Latent Space Learning with LLM-Driven Multimodal Diffusion for Lung Nodule Progression Prediction
by: Song, James, et al.
Published: (2026)
ShaLa: Multimodal Shared Latent Space Modelling
by: Cui, Jiali, et al.
Published: (2025)
Reasoning in the Dark: Interleaved Vision-Text Reasoning in Latent Space
by: Chen, Chao, et al.
Published: (2025)
Rectifying Latent Space for Generative Single-Image Reflection Removal
by: Li, Mingjia, et al.
Published: (2025)
Generative Human Motion Stylization in Latent Space
by: Guo, Chuan, et al.
Published: (2024)
Latent Sketchpad: Sketching Visual Thoughts to Elicit Multimodal Reasoning in MLLMs
by: Zhang, Huanyu, et al.
Published: (2025)
Calibrating Biased Distribution in VFM-derived Latent Space via Cross-Domain Geometric Consistency
by: Ma, Yanbiao, et al.
Published: (2025)
Visual Enhanced Depth Scaling for Multimodal Latent Reasoning
by: Han, Yudong, et al.
Published: (2026)
Latent Watermark: Inject and Detect Watermarks in Latent Diffusion Space
by: Meng, Zheling, et al.
Published: (2024)
LatentUM: Unleashing the Potential of Interleaved Cross-Modal Reasoning via a Latent-Space Unified Model
by: Jin, Jiachun, et al.
Published: (2026)
Seeing Space and Motion: Enhancing Latent Actions with Geometric and Dynamic Awareness for Vision-Language-Action Models
by: Cai, Zhejia, et al.
Published: (2025)
BYOCL: Build Your Own Consistent Latent with Hierarchical Representative Latent Clustering
by: Dai, Jiayue, et al.
Published: (2024)
Latent Visual Reasoning
by: Li, Bangzheng, et al.
Published: (2025)
Improving Deep Representation Learning via Auxiliary Learnable Target Coding
by: Liu, Kangjun, et al.
Published: (2023)
Latent Transfer Attack: Adversarial Examples via Generative Latent Spaces
by: Shaar, Eitan, et al.
Published: (2026)
GeoSense: Internalizing Geometric Necessity Perception for Multimodal Reasoning
by: Liu, Ruiheng, et al.
Published: (2026)
Latent Implicit Visual Reasoning
by: Li, Kelvin, et al.
Published: (2025)
Constructing Fair Latent Space for Intersection of Fairness and Explainability
by: Joo, Hyungjun, et al.
Published: (2024)
MILR: Improving Multimodal Image Generation via Test-Time Latent Reasoning
by: Mi, Yapeng, et al.
Published: (2025)
Learning Transformation-Isomorphic Latent Space for Accurate Hand Pose Estimation
by: Ren, Kaiwen, et al.
Published: (2025)
Leveraging Latent Visual Reasoning in Silence
by: Zhu, Dongyao, et al.
Published: (2026)
Conditional Latent Coding with Learnable Synthesized Reference for Deep Image Compression
by: Wu, Siqi, et al.
Published: (2025)
Learning Multimodal Latent Space with EBM Prior and MCMC Inference
by: Yuan, Shiyu, et al.
Published: (2024)
Latent Diffusion Inversion Requires Understanding the Latent Space
by: Rao, Mingxing, et al.
Published: (2025)
The Thinking Pixel: Recursive Sparse Reasoning in Multimodal Diffusion Latents
by: Sun, Yuwei, et al.
Published: (2026)
LatentOmni: Rethinking Omni-Modal Understanding via Unified Audio-Visual Latent Reasoning
by: Dai, Yifan, et al.
Published: (2026)