:: Library Catalog

Bibliographic Details
Main Authors:	Mo, Shentong, Li, Lanqing
Format:	Preprint
Published:	2026
Subjects:	Machine Learning; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2602.06706

Similar Items

Efficient 3D Shape Generation via Diffusion Mamba with Bidirectional SSMs
by: Mo, Shentong
Published: (2024)

Scaling Diffusion Mamba with Bidirectional SSMs for Efficient Image and Video Generation
by: Mo, Shentong, et al.
Published: (2024)

Chain of Uncertain Rewards with Large Language Models for Reinforcement Learning
by: Mo, Shentong
Published: (2026)

On Statistical Rates and Provably Efficient Criteria of Latent Diffusion Transformers (DiTs)
by: Hu, Jerry Yao-Chieh, et al.
Published: (2024)

The Dynamic Duo of Collaborative Masking and Target for Advanced Masked Autoencoder Learning
by: Mo, Shentong
Published: (2024)

Improving Visual Representation Alignment Generation with GRPO
by: Mo, Shentong, et al.
Published: (2026)

pMoE: Prompting Diverse Experts Together Wins More in Visual Adaptation
by: Mo, Shentong, et al.
Published: (2026)

Distilled Protein Backbone Generation
by: Xie, Liyang, et al.
Published: (2025)

GMAIL: Generative Modality Alignment for generated Image Learning
by: Mo, Shentong, et al.
Published: (2026)

LVRPO: Language-Visual Alignment with GRPO for Multimodal Understanding and Generation
by: Mo, Shentong, et al.
Published: (2026)

DMT-JEPA: Discriminative Masked Targets for Joint-Embedding Predictive Architecture
by: Mo, Shentong, et al.
Published: (2024)

Protein Counterfactuals via Diffusion-Guided Latent Optimization
by: Kłos, Weronika, et al.
Published: (2026)

Foley-Flow: Coordinated Video-to-Audio Generation with Masked Audio-Visual Alignment and Dynamic Conditional Flows
by: Mo, Shentong, et al.
Published: (2026)

Aligning Audio-Visual Joint Representations with an Agentic Workflow
by: Mo, Shentong, et al.
Published: (2024)

Connecting Joint-Embedding Predictive Architecture with Contrastive Self-supervised Learning
by: Mo, Shentong, et al.
Published: (2024)

MultiMed: Massively Multimodal and Multitask Medical Understanding
by: Mo, Shentong, et al.
Published: (2024)

LSPT: Long-term Spatial Prompt Tuning for Visual Representation Learning
by: Mo, Shentong, et al.
Published: (2024)

A Large-scale Medical Visual Task Adaptation Benchmark
by: Mo, Shentong, et al.
Published: (2024)

ReQFlow: Rectified Quaternion Flow for Efficient and High-Quality Protein Backbone Generation
by: Yue, Angxiao, et al.
Published: (2025)

LazyDiT: Lazy Learning for the Acceleration of Diffusion Transformers
by: Shen, Xuan, et al.
Published: (2024)

Backbone-Equated Diffusion OOD via Sparse Internal Snapshots
by: Rouzoumka, Yadang Alexis, et al.
Published: (2026)

DiffGAP: A Lightweight Diffusion Module in Contrastive Space for Bridging Cross-Model Gap
by: Mo, Shentong, et al.
Published: (2025)

Constraint Decoupled Latent Diffusion for Protein Backmapping
by: Han, Xu, et al.
Published: (2024)

JTreeformer: Graph-Transformer via Latent-Diffusion Model for Molecular Generation
by: Shi, Ji, et al.
Published: (2025)

SE(3)-Stochastic Flow Matching for Protein Backbone Generation
by: Bose, Avishek Joey, et al.
Published: (2023)

Efficient and Adaptive Human Activity Recognition via LLM Backbones
by: Bredikhin, Aleksandr, et al.
Published: (2026)

Joint Design of Protein Surface and Structure Using a Diffusion Bridge Model
by: Li, Guanlue, et al.
Published: (2025)

Constrained Diffusion for Protein Design with Hard Structural Constraints
by: Christopher, Jacob K., et al.
Published: (2025)

DiSK: A Diffusion Model for Structured Knowledge
by: Kitouni, Ouail, et al.
Published: (2023)

From Tokenizer Bias to Backbone Capability: A Controlled Study of LLMs for Time Series Forecasting
by: Zhang, Xinyu, et al.
Published: (2025)

Exploring Diffusion Transformer Designs via Grafting
by: Chandrasegaran, Keshigeyan, et al.
Published: (2025)

LaDi-RL: Latent Diffusion Reasoning Prevents Entropy Collapse in Reinforcement Learning
by: Kang, Haoqiang, et al.
Published: (2026)

GMS-CAVP: Improving Audio-Video Correspondence with Multi-Scale Contrastive and Generative Pretraining
by: Mo, Shentong, et al.
Published: (2026)

DiLaDiff: Distilled Latent-Augmented Diffusion for Language Modeling
by: Lemercier, Jean-Marie, et al.
Published: (2026)

LaDiR: Latent Diffusion Enhances LLMs for Text Reasoning
by: Kang, Haoqiang, et al.
Published: (2025)

Structure-based RNA Design by Step-wise Optimization of Latent Diffusion Model
by: Si, Qi, et al.
Published: (2026)

PI-Mamba: Linear-Time Protein Backbone Generation via Spectrally Initialized Flow Matching
by: Wu, Tianyu, et al.
Published: (2026)

Text-to-Audio Generation Synchronized with Videos
by: Mo, Shentong, et al.
Published: (2024)

ProteinOPD: Towards Effective and Efficient Preference Alignment for Protein Design
by: Zhang, Yulin, et al.
Published: (2026)

LaDi-WM: A Latent Diffusion-based World Model for Predictive Manipulation
by: Huang, Yuhang, et al.
Published: (2025)