Library Catalog

Bibliographic Details
Main Authors:	Li, Zhuoling; Rahmani, Hossein; Zhang, Jiarui; Xue, Yu; Mirmehdi, Majid; Kuen, Jason; Gu, Jiuxiang; Liu, Jun
Format:	Preprint
Published:	2026
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2603.20470

Similar Items

Automatic Method Illustration Generation for AI Scientific Papers via Drawing Middleware Creation, Evolution, and Orchestration
by: Li, Zhuoling, et al.
Published: (2026)

LongDiff: Training-Free Long Video Generation in One Go
by: Li, Zhuoling, et al.
Published: (2025)

DiffGraph: Heterogeneous Graph Diffusion Model
by: Li, Zongwei, et al.
Published: (2025)

Learning to Generate Cross-Task Unexploitable Examples
by: Qu, Haoxuan, et al.
Published: (2025)

When Visual Privacy Protection Meets Multimodal Large Language Models
by: Hui, Xiaofei, et al.
Published: (2026)

ToolFG: Towards Well-Grounded Fine-Grained Image Classification
by: Xue, Yu, et al.
Published: (2026)

SNCE: Geometry-Aware Supervision for Scalable Discrete Image Generation
by: Li, Shufan, et al.
Published: (2026)

Diff-Tracker: Text-to-Image Diffusion Models are Unsupervised Trackers
by: Zhang, Zhengbo, et al.
Published: (2024)

ImageFolder: Autoregressive Image Generation with Folded Tokens
by: Li, Xiang, et al.
Published: (2024)

XQ-GAN: An Open-source Image Tokenization Framework for Autoregressive Generation
by: Li, Xiang, et al.
Published: (2024)

DisC-GS: Discontinuity-aware Gaussian Splatting
by: Qu, Haoxuan, et al.
Published: (2024)

Automated Radiology Report Generation: A Review of Recent Advances
by: Sloan, Phillip, et al.
Published: (2024)

Lavida-O: Elastic Large Masked Diffusion Models for Unified Multimodal Understanding and Generation
by: Li, Shufan, et al.
Published: (2025)

Robust Latent Matters: Boosting Image Generation with Sampling Error Synthesis
by: Qiu, Kai, et al.
Published: (2025)

SceneLLM: Implicit Language Reasoning in LLM for Dynamic Scene Graph Generation
by: Zhang, Hang, et al.
Published: (2024)

Sparse-LaViDa: Sparse Multimodal Discrete Diffusion Language Models
by: Li, Shufan, et al.
Published: (2025)

Customization Assistant for Text-to-image Generation
by: Zhou, Yufan, et al.
Published: (2023)

Visual-textual Dermatoglyphic Animal Biometrics: A First Case Study on Panthera tigris
by: Li, Wenshuo, et al.
Published: (2025)

Co-STAR: Collaborative Curriculum Self-Training with Adaptive Regularization for Source-Free Video Domain Adaptation
by: Dadashzadeh, Amirhossein, et al.
Published: (2025)

Unsupervised View-Invariant Human Posture Representation
by: Sardari, Faegheh, et al.
Published: (2021)

Clinically-aligned Multi-modal Chest X-ray Classification
by: Sloan, Phillip, et al.
Published: (2025)

Image Tokenizer Needs Post-Training
by: Qiu, Kai, et al.
Published: (2025)

Prediction of Thrombectomy Functional Outcomes using Multimodal Data
by: Samak, Zeynel A., et al.
Published: (2020)

TranSOP: Transformer-based Multimodal Classification for Stroke Treatment Outcome Prediction
by: Samak, Zeynel A., et al.
Published: (2023)

Automatic Prediction of Stroke Treatment Outcomes: Latest Advances and Perspectives
by: Samak, Zeynel A., et al.
Published: (2024)

METAL: A Multi-Agent Framework for Chart Generation with Test-Time Scaling
by: Li, Bingxuan, et al.
Published: (2025)

KG-FairDiff: Knowledge Graph-Guided Prompt Refinement for Demographically Fair Text-to-Image Generation
by: Davoodi, Farbod, et al.
Published: (2026)

Is Monitoring Enough? Strategic Agent Selection For Stealthy Attack in Multi-Agent Discussions
by: Xiang, Qiuchi, et al.
Published: (2026)

ChimpVLM: Ethogram-Enhanced Chimpanzee Behaviour Recognition
by: Brookes, Otto, et al.
Published: (2024)

Trajectory-guided Motion Perception for Facial Expression Quality Assessment in Neurological Disorders
by: Duan, Shuchao, et al.
Published: (2025)

High‐Velocity Impact Response and Comfort Properties of Discrete‐Droplet‐Coated Cushioning Composite Fabrics
by: Zhuoling Yu, et al.
Published: (2026)

Multimodal LLMs as Customized Reward Models for Text-to-Image Generation
by: Zhou, Shijie, et al.
Published: (2025)

ARTIST: Improving the Generation of Text-rich Images with Disentangled Diffusion Models and Large Language Models
by: Zhang, Jianyi, et al.
Published: (2024)

LaViDa-R1: Advancing Reasoning for Unified Multimodal Diffusion Language Models
by: Li, Shufan, et al.
Published: (2026)

XGRAG: A Graph-Native Framework for Explaining KG-based Retrieval-Augmented Generation
by: Li, Zhuoling, et al.
Published: (2026)

AI-Generated Content (AIGC) for Various Data Modalities: A Survey
by: Foo, Lin Geng, et al.
Published: (2023)

An Image-like Diffusion Method for Human-Object Interaction Detection
by: Hui, Xiaofei, et al.
Published: (2025)

Refer to Any Segmentation Mask Group With Vision-Language Prompts
by: Cao, Shengcao, et al.
Published: (2025)

TSTMotion: Training-free Scene-aware Text-to-motion Generation
by: Guo, Ziyan, et al.
Published: (2025)

Video-SwinUNet: Spatio-temporal Deep Learning Framework for VFSS Instance Segmentation
by: Zeng, Chengxi, et al.
Published: (2023)