| Main authors: | Subbaraman, Pranav; Li, Shufan; Zhao, Siyan; Grover, Aditya |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online access: | https://arxiv.org/abs/2512.01094 |
Similar items
PopAlign: Population-Level Alignment for Fair Text-to-Image Generation
by: Li, Shufan, et al.
Published: (2024)
InstructAny2Pix: Flexible Visual Editing via Multimodal Instruction Following
by: Li, Shufan, et al.
Published: (2023)
MedMax: Mixed-Modal Instruction Tuning for Training Biomedical Assistants
by: Bansal, Hritik, et al.
Published: (2024)
Mamba-ND: Selective State Space Modeling for Multi-Dimensional Data
by: Li, Shufan, et al.
Published: (2024)
Reflect-DiT: Inference-Time Scaling for Text-to-Image Diffusion Transformers via In-Context Reflection
by: Li, Shufan, et al.
Published: (2025)
Lavida-O: Elastic Large Masked Diffusion Models for Unified Multimodal Understanding and Generation
by: Li, Shufan, et al.
Published: (2025)
SNCE: Geometry-Aware Supervision for Scalable Discrete Image Generation
by: Li, Shufan, et al.
Published: (2026)
Guidance Contrastive Token Credit Assignment for Discrete Policy Optimization
by: Li, Shufan, et al.
Published: (2026)
Sparse-LaViDa: Sparse Multimodal Discrete Diffusion Language Models
by: Li, Shufan, et al.
Published: (2025)
Accelerating Masked Image Generation by Learning Latent Controlled Dynamics
by: Zhu, Kaiwen, et al.
Published: (2026)
Reinforcement Learning Meets Masked Generative Models: Mask-GRPO for Text-to-Image Generation
by: Luo, Yifu, et al.
Published: (2025)
OmniFlow: Any-to-Any Generation with Multi-Modal Rectified Flows
by: Li, Shufan, et al.
Published: (2024)
LaViDa-R1: Advancing Reasoning for Unified Multimodal Diffusion Language Models
by: Li, Shufan, et al.
Published: (2026)
Accelerating Conditional Prompt Learning via Masked Image Modeling for Vision-Language Models
by: Bui, Phuoc-Nguyen, et al.
Published: (2025)
EpiMask: Leveraging Epipolar Distance Based Masks in Cross-Attention for Satellite Image Matching
by: Deshmukh, Rahul, et al.
Published: (2026)
CoSimGen: Controllable Diffusion Model for Simultaneous Image and Mask Generation
by: Bose, Rupak, et al.
Published: (2025)
LaViDa: A Large Diffusion Language Model for Multimodal Understanding
by: Li, Shufan, et al.
Published: (2025)
DogWeave: High-Fidelity 3D Canine Reconstruction from a Single Image via Normal Fusion and Conditional Inpainting
by: Sun, Shufan, et al.
Published: (2026)
Generation then Reconstruction: Accelerating Masked Autoregressive Models via Two-Stage Sampling
by: Yan, Feihong, et al.
Published: (2025)
SimGen: A Diffusion-Based Framework for Simultaneous Surgical Image and Segmentation Mask Generation
by: Bhat, Aditya, et al.
Published: (2025)
LlamaSeg: Image Segmentation via Autoregressive Mask Generation
by: Deng, Jiru, et al.
Published: (2025)
Conditional Panoramic Image Generation via Masked Autoregressive Modeling
by: Wang, Chaoyang, et al.
Published: (2025)
MaskFocus: Focusing Policy Optimization on Critical Steps for Masked Image Generation
by: Zhang, Guohui, et al.
Published: (2025)
Accelerating Text-to-Image Editing via Cache-Enabled Sparse Diffusion Inference
by: Yu, Zihao, et al.
Published: (2023)
Few-shot Image Generation via Masked Discrimination
by: Zhu, Jingyuan, et al.
Published: (2022)
Mask2IV: Interaction-Centric Video Generation via Mask Trajectories
by: Li, Gen, et al.
Published: (2025)
Powerful and Flexible: Personalized Text-to-Image Generation via Reinforcement Learning
by: Wei, Fanyue, et al.
Published: (2024)
LazyMAR: Accelerating Masked Autoregressive Models via Feature Caching
by: Yan, Feihong, et al.
Published: (2025)
Scaling Vision-and-Language Navigation With Offline RL
by: Bundele, Valay, et al.
Published: (2024)
Mask Image Watermarking
by: Hu, Runyi, et al.
Published: (2025)
Beyond Night Visibility: Adaptive Multi-Scale Fusion of Infrared and Visible Images
by: Pei, Shufan, et al.
Published: (2024)
RedVTP: Training-Free Acceleration of Diffusion Vision-Language Models Inference via Masked Token-Guided Visual Token Pruning
by: Xu, Jingqi, et al.
Published: (2025)
Membership Inference Attack Against Masked Image Modeling
by: Li, Zheng, et al.
Published: (2024)
Mask-ControlNet: Higher-Quality Image Generation with An Additional Mask Prompt
by: Huang, Zhiqi, et al.
Published: (2024)
Q-Insight: Understanding Image Quality via Visual Reinforcement Learning
by: Li, Weiqi, et al.
Published: (2025)
Layout-Conditioned Autoregressive Text-to-Image Generation via Structured Masking
by: Zheng, Zirui, et al.
Published: (2025)
Adaptive Language-Aware Image Reflection Removal Network
by: Fang, Siyan, et al.
Published: (2026)
Masked Gamma-SSL: Learning Uncertainty Estimation via Masked Image Modeling
by: Williams, David S. W., et al.
Published: (2024)
Augmented Efficiency: Reducing Memory Footprint and Accelerating Inference for 3D Semantic Segmentation through Hybrid Vision
by: Krishnan, Aditya, et al.
Published: (2024)
Generative Model-Based Feature Attention Module for Video Action Analysis
by: Wang, Guiqin, et al.
Published: (2025)