| Main Authors: | Mo, Shentong, Li, Lanqing |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2602.06706 |
Similar Items
Efficient 3D Shape Generation via Diffusion Mamba with Bidirectional SSMs
by: Mo, Shentong
Published: (2024)
Scaling Diffusion Mamba with Bidirectional SSMs for Efficient Image and Video Generation
by: Mo, Shentong, et al.
Published: (2024)
Chain of Uncertain Rewards with Large Language Models for Reinforcement Learning
by: Mo, Shentong
Published: (2026)
On Statistical Rates and Provably Efficient Criteria of Latent Diffusion Transformers (DiTs)
by: Hu, Jerry Yao-Chieh, et al.
Published: (2024)
The Dynamic Duo of Collaborative Masking and Target for Advanced Masked Autoencoder Learning
by: Mo, Shentong
Published: (2024)
Improving Visual Representation Alignment Generation with GRPO
by: Mo, Shentong, et al.
Published: (2026)
pMoE: Prompting Diverse Experts Together Wins More in Visual Adaptation
by: Mo, Shentong, et al.
Published: (2026)
Distilled Protein Backbone Generation
by: Xie, Liyang, et al.
Published: (2025)
GMAIL: Generative Modality Alignment for generated Image Learning
by: Mo, Shentong, et al.
Published: (2026)
LVRPO: Language-Visual Alignment with GRPO for Multimodal Understanding and Generation
by: Mo, Shentong, et al.
Published: (2026)
DMT-JEPA: Discriminative Masked Targets for Joint-Embedding Predictive Architecture
by: Mo, Shentong, et al.
Published: (2024)
Protein Counterfactuals via Diffusion-Guided Latent Optimization
by: Kłos, Weronika, et al.
Published: (2026)
Foley-Flow: Coordinated Video-to-Audio Generation with Masked Audio-Visual Alignment and Dynamic Conditional Flows
by: Mo, Shentong, et al.
Published: (2026)
Aligning Audio-Visual Joint Representations with an Agentic Workflow
by: Mo, Shentong, et al.
Published: (2024)
Connecting Joint-Embedding Predictive Architecture with Contrastive Self-supervised Learning
by: Mo, Shentong, et al.
Published: (2024)
MultiMed: Massively Multimodal and Multitask Medical Understanding
by: Mo, Shentong, et al.
Published: (2024)
LSPT: Long-term Spatial Prompt Tuning for Visual Representation Learning
by: Mo, Shentong, et al.
Published: (2024)
A Large-scale Medical Visual Task Adaptation Benchmark
by: Mo, Shentong, et al.
Published: (2024)
ReQFlow: Rectified Quaternion Flow for Efficient and High-Quality Protein Backbone Generation
by: Yue, Angxiao, et al.
Published: (2025)
LazyDiT: Lazy Learning for the Acceleration of Diffusion Transformers
by: Shen, Xuan, et al.
Published: (2024)
Backbone-Equated Diffusion OOD via Sparse Internal Snapshots
by: Rouzoumka, Yadang Alexis, et al.
Published: (2026)
DiffGAP: A Lightweight Diffusion Module in Contrastive Space for Bridging Cross-Model Gap
by: Mo, Shentong, et al.
Published: (2025)
Constraint Decoupled Latent Diffusion for Protein Backmapping
by: Han, Xu, et al.
Published: (2024)
JTreeformer: Graph-Transformer via Latent-Diffusion Model for Molecular Generation
by: Shi, Ji, et al.
Published: (2025)
SE(3)-Stochastic Flow Matching for Protein Backbone Generation
by: Bose, Avishek Joey, et al.
Published: (2023)
Efficient and Adaptive Human Activity Recognition via LLM Backbones
by: Bredikhin, Aleksandr, et al.
Published: (2026)
Joint Design of Protein Surface and Structure Using a Diffusion Bridge Model
by: Li, Guanlue, et al.
Published: (2025)
Constrained Diffusion for Protein Design with Hard Structural Constraints
by: Christopher, Jacob K., et al.
Published: (2025)
DiSK: A Diffusion Model for Structured Knowledge
by: Kitouni, Ouail, et al.
Published: (2023)
From Tokenizer Bias to Backbone Capability: A Controlled Study of LLMs for Time Series Forecasting
by: Zhang, Xinyu, et al.
Published: (2025)
Exploring Diffusion Transformer Designs via Grafting
by: Chandrasegaran, Keshigeyan, et al.
Published: (2025)
LaDi-RL: Latent Diffusion Reasoning Prevents Entropy Collapse in Reinforcement Learning
by: Kang, Haoqiang, et al.
Published: (2026)
GMS-CAVP: Improving Audio-Video Correspondence with Multi-Scale Contrastive and Generative Pretraining
by: Mo, Shentong, et al.
Published: (2026)
DiLaDiff: Distilled Latent-Augmented Diffusion for Language Modeling
by: Lemercier, Jean-Marie, et al.
Published: (2026)
LaDiR: Latent Diffusion Enhances LLMs for Text Reasoning
by: Kang, Haoqiang, et al.
Published: (2025)
Structure-based RNA Design by Step-wise Optimization of Latent Diffusion Model
by: Si, Qi, et al.
Published: (2026)
PI-Mamba: Linear-Time Protein Backbone Generation via Spectrally Initialized Flow Matching
by: Wu, Tianyu, et al.
Published: (2026)
Text-to-Audio Generation Synchronized with Videos
by: Mo, Shentong, et al.
Published: (2024)
ProteinOPD: Towards Effective and Efficient Preference Alignment for Protein Design
by: Zhang, Yulin, et al.
Published: (2026)
LaDi-WM: A Latent Diffusion-based World Model for Predictive Manipulation
by: Huang, Yuhang, et al.
Published: (2025)