| Main Authors: | Li, Zhuoling, Rahmani, Hossein, Zhang, Jiarui, Xue, Yu, Mirmehdi, Majid, Kuen, Jason, Gu, Jiuxiang, Liu, Jun |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2603.20470 |
Similar Items
Automatic Method Illustration Generation for AI Scientific Papers via Drawing Middleware Creation, Evolution, and Orchestration
by: Li, Zhuoling, et al.
Published: (2026)
LongDiff: Training-Free Long Video Generation in One Go
by: Li, Zhuoling, et al.
Published: (2025)
DiffGraph: Heterogeneous Graph Diffusion Model
by: Li, Zongwei, et al.
Published: (2025)
Learning to Generate Cross-Task Unexploitable Examples
by: Qu, Haoxuan, et al.
Published: (2025)
When Visual Privacy Protection Meets Multimodal Large Language Models
by: Hui, Xiaofei, et al.
Published: (2026)
ToolFG: Towards Well-Grounded Fine-Grained Image Classification
by: Xue, Yu, et al.
Published: (2026)
SNCE: Geometry-Aware Supervision for Scalable Discrete Image Generation
by: Li, Shufan, et al.
Published: (2026)
Diff-Tracker: Text-to-Image Diffusion Models are Unsupervised Trackers
by: Zhang, Zhengbo, et al.
Published: (2024)
ImageFolder: Autoregressive Image Generation with Folded Tokens
by: Li, Xiang, et al.
Published: (2024)
XQ-GAN: An Open-source Image Tokenization Framework for Autoregressive Generation
by: Li, Xiang, et al.
Published: (2024)
DisC-GS: Discontinuity-aware Gaussian Splatting
by: Qu, Haoxuan, et al.
Published: (2024)
Automated Radiology Report Generation: A Review of Recent Advances
by: Sloan, Phillip, et al.
Published: (2024)
Lavida-O: Elastic Large Masked Diffusion Models for Unified Multimodal Understanding and Generation
by: Li, Shufan, et al.
Published: (2025)
Robust Latent Matters: Boosting Image Generation with Sampling Error Synthesis
by: Qiu, Kai, et al.
Published: (2025)
SceneLLM: Implicit Language Reasoning in LLM for Dynamic Scene Graph Generation
by: Zhang, Hang, et al.
Published: (2024)
Sparse-LaViDa: Sparse Multimodal Discrete Diffusion Language Models
by: Li, Shufan, et al.
Published: (2025)
Customization Assistant for Text-to-image Generation
by: Zhou, Yufan, et al.
Published: (2023)
Visual-textual Dermatoglyphic Animal Biometrics: A First Case Study on Panthera tigris
by: Li, Wenshuo, et al.
Published: (2025)
Co-STAR: Collaborative Curriculum Self-Training with Adaptive Regularization for Source-Free Video Domain Adaptation
by: Dadashzadeh, Amirhossein, et al.
Published: (2025)
Unsupervised View-Invariant Human Posture Representation
by: Sardari, Faegheh, et al.
Published: (2021)
Clinically-aligned Multi-modal Chest X-ray Classification
by: Sloan, Phillip, et al.
Published: (2025)
Image Tokenizer Needs Post-Training
by: Qiu, Kai, et al.
Published: (2025)
Prediction of Thrombectomy Functional Outcomes using Multimodal Data
by: Samak, Zeynel A., et al.
Published: (2020)
TranSOP: Transformer-based Multimodal Classification for Stroke Treatment Outcome Prediction
by: Samak, Zeynel A., et al.
Published: (2023)
Automatic Prediction of Stroke Treatment Outcomes: Latest Advances and Perspectives
by: Samak, Zeynel A., et al.
Published: (2024)
METAL: A Multi-Agent Framework for Chart Generation with Test-Time Scaling
by: Li, Bingxuan, et al.
Published: (2025)
KG-FairDiff: Knowledge Graph-Guided Prompt Refinement for Demographically Fair Text-to-Image Generation
by: Davoodi, Farbod, et al.
Published: (2026)
Is Monitoring Enough? Strategic Agent Selection For Stealthy Attack in Multi-Agent Discussions
by: Xiang, Qiuchi, et al.
Published: (2026)
ChimpVLM: Ethogram-Enhanced Chimpanzee Behaviour Recognition
by: Brookes, Otto, et al.
Published: (2024)
Trajectory-guided Motion Perception for Facial Expression Quality Assessment in Neurological Disorders
by: Duan, Shuchao, et al.
Published: (2025)
High‐Velocity Impact Response and Comfort Properties of Discrete‐Droplet‐Coated Cushioning Composite Fabrics
by: Zhuoling Yu, et al.
Published: (2026)
Multimodal LLMs as Customized Reward Models for Text-to-Image Generation
by: Zhou, Shijie, et al.
Published: (2025)
ARTIST: Improving the Generation of Text-rich Images with Disentangled Diffusion Models and Large Language Models
by: Zhang, Jianyi, et al.
Published: (2024)
LaViDa-R1: Advancing Reasoning for Unified Multimodal Diffusion Language Models
by: Li, Shufan, et al.
Published: (2026)
XGRAG: A Graph-Native Framework for Explaining KG-based Retrieval-Augmented Generation
by: Li, Zhuoling, et al.
Published: (2026)
AI-Generated Content (AIGC) for Various Data Modalities: A Survey
by: Foo, Lin Geng, et al.
Published: (2023)
An Image-like Diffusion Method for Human-Object Interaction Detection
by: Hui, Xiaofei, et al.
Published: (2025)
Refer to Any Segmentation Mask Group With Vision-Language Prompts
by: Cao, Shengcao, et al.
Published: (2025)
TSTMotion: Training-free Scene-aware Text-to-motion Generation
by: Guo, Ziyan, et al.
Published: (2025)
Video-SwinUNet: Spatio-temporal Deep Learning Framework for VFSS Instance Segmentation
by: Zeng, Chengxi, et al.
Published: (2023)