| Main Authors: | Zhou, Milton; Qin, Sizhong; Li, Yongzhi; Chen, Quan; Jiang, Peng |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2603.28366 |
Similar Items
MedGEN-Bench: Contextually entangled benchmark for open-ended multimodal medical generation
by: Yang, Junjie, et al.
Published: (2025)
Towards Weakly Supervised End-to-end Learning for Long-video Action Recognition
by: Zhou, Jiaming, et al.
Published: (2023)
MM2Latent: Text-to-facial image generation and editing in GANs with multimodal assistance
by: Meng, Debin, et al.
Published: (2024)
VDT-Auto: End-to-end Autonomous Driving with VLM-Guided Diffusion Transformers
by: Guo, Ziang, et al.
Published: (2025)
Narrative Weaver: Towards Controllable Long-Range Visual Consistency with Multi-Modal Conditioning
by: Yao, Zhengjian, et al.
Published: (2026)
Adaptive Video Distillation: Mitigating Oversaturation and Temporal Collapse in Few-Step Generation
by: You, Yuyang, et al.
Published: (2026)
Auto-Regressive Surface Cutting
by: Li, Yang, et al.
Published: (2025)
Tokenization Allows Multimodal Large Language Models to Understand, Generate and Edit Architectural Floor Plans
by: Qin, Sizhong, et al.
Published: (2026)
DreamScene: 3D Gaussian-based End-to-end Text-to-3D Scene Generation
by: Li, Haoran, et al.
Published: (2025)
UniUGP: Unifying Understanding, Generation, and Planing For End-to-end Autonomous Driving
by: Lu, Hao, et al.
Published: (2025)
DREAM: Document Reconstruction via End-to-end Autoregressive Model
by: Li, Xin, et al.
Published: (2025)
End2end-ALARA: Approaching the ALARA Law in CT Imaging with End-to-end Learning
by: Tao, Xi, et al.
Published: (2025)
RC-AutoCalib: An End-to-End Radar-Camera Automatic Calibration Network
by: Luu, Van-Tin, et al.
Published: (2025)
ParkingE2E: Camera-based End-to-end Parking Network, from Images to Planning
by: Li, Changze, et al.
Published: (2024)
What about gravity in video generation? Post-Training Newton's Laws with Verifiable Rewards
by: Le, Minh-Quan, et al.
Published: (2025)
CorrectAD: A Self-Correcting Agentic System to Improve End-to-end Planning in Autonomous Driving
by: Ma, Enhui, et al.
Published: (2025)
RID-TWIN: An end-to-end pipeline for automatic face de-identification in videos
by: Mukherjee, Anirban, et al.
Published: (2024)
ECHOPulse: ECG controlled echocardio-grams video generation
by: Li, Yiwei, et al.
Published: (2024)
VRWKV-Editor: Reducing quadratic complexity in transformer-based video editing
by: Aitrouga, Abdelilah, et al.
Published: (2025)
FLATTEN: optical FLow-guided ATTENtion for consistent text-to-video editing
by: Cong, Yuren, et al.
Published: (2023)
AutoVLA: A Vision-Language-Action Model for End-to-End Autonomous Driving with Adaptive Reasoning and Reinforcement Fine-Tuning
by: Zhou, Zewei, et al.
Published: (2025)
End4: End-to-end Denoising Diffusion for Diffusion-Based Inpainting Detection
by: Wang, Fei, et al.
Published: (2025)
STAR: Scale-wise Text-conditioned AutoRegressive image generation
by: Ma, Xiaoxiao, et al.
Published: (2024)
Adversarial AutoMixup
by: Qin, Huafeng, et al.
Published: (2023)
End-to-end autoencoding architecture for the simultaneous generation of medical images and corresponding segmentation masks
by: Kebaili, Aghiles, et al.
Published: (2023)
Can video generation replace cinematographers? Research on the cinematic language of generated video
by: Li, Xiaozhe, et al.
Published: (2024)
Scanning Only Once: An End-to-end Framework for Fast Temporal Grounding in Long Videos
by: Pan, Yulin, et al.
Published: (2023)
Generalized Trajectory Scoring for End-to-end Multimodal Planning
by: Li, Zhenxin, et al.
Published: (2025)
SAEdit: Token-level control for continuous image editing via Sparse AutoEncoder
by: Kamenetsky, Ronen, et al.
Published: (2025)
PPAD: Iterative Interactions of Prediction and Planning for End-to-end Autonomous Driving
by: Chen, Zhili, et al.
Published: (2023)
Leveraging multimodal explanatory annotations for video interpretation with Modality Specific Dataset
by: Ancarani, Elisa, et al.
Published: (2025)
Scaling medical imaging report generation with multimodal reinforcement learning
by: Liu, Qianchu, et al.
Published: (2026)
End-to-end Surface Optimization for Light Control
by: Sun, Yuou, et al.
Published: (2024)
MolParser: End-to-end Visual Recognition of Molecule Structures in the Wild
by: Fang, Xi, et al.
Published: (2024)
Hydra-MDP: End-to-end Multimodal Planning with Multi-target Hydra-Distillation
by: Li, Zhenxin, et al.
Published: (2024)
2D bidirectional gated recurrent unit convolutional Neural networks for end-to-end violence detection In videos
by: Traoré, Abdarahmane, et al.
Published: (2024)
AnomalyR1: A GRPO-based End-to-end MLLM for Industrial Anomaly Detection
by: Chao, Yuhao, et al.
Published: (2025)
HIMap: HybrId Representation Learning for End-to-end Vectorized HD Map Construction
by: Zhou, Yi, et al.
Published: (2024)
GraphAD: Interaction Scene Graph for End-to-end Autonomous Driving
by: Zhang, Yunpeng, et al.
Published: (2024)
Referring Expression Instance Retrieval and A Strong End-to-End Baseline
by: Hao, Xiangzhao, et al.
Published: (2025)