Library Catalog

Bibliographic Details
Main Authors:	Zhou, Milton; Qin, Sizhong; Li, Yongzhi; Chen, Quan; Jiang, Peng
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2603.28366

Similar Items

MedGEN-Bench: Contextually entangled benchmark for open-ended multimodal medical generation
by: Yang, Junjie, et al.
Published: (2025)

Towards Weakly Supervised End-to-end Learning for Long-video Action Recognition
by: Zhou, Jiaming, et al.
Published: (2023)

MM2Latent: Text-to-facial image generation and editing in GANs with multimodal assistance
by: Meng, Debin, et al.
Published: (2024)

VDT-Auto: End-to-end Autonomous Driving with VLM-Guided Diffusion Transformers
by: Guo, Ziang, et al.
Published: (2025)

Narrative Weaver: Towards Controllable Long-Range Visual Consistency with Multi-Modal Conditioning
by: Yao, Zhengjian, et al.
Published: (2026)

Adaptive Video Distillation: Mitigating Oversaturation and Temporal Collapse in Few-Step Generation
by: You, Yuyang, et al.
Published: (2026)

Auto-Regressive Surface Cutting
by: Li, Yang, et al.
Published: (2025)

Tokenization Allows Multimodal Large Language Models to Understand, Generate and Edit Architectural Floor Plans
by: Qin, Sizhong, et al.
Published: (2026)

DreamScene: 3D Gaussian-based End-to-end Text-to-3D Scene Generation
by: Li, Haoran, et al.
Published: (2025)

UniUGP: Unifying Understanding, Generation, and Planing For End-to-end Autonomous Driving
by: Lu, Hao, et al.
Published: (2025)

DREAM: Document Reconstruction via End-to-end Autoregressive Model
by: Li, Xin, et al.
Published: (2025)

End2end-ALARA: Approaching the ALARA Law in CT Imaging with End-to-end Learning
by: Tao, Xi, et al.
Published: (2025)

RC-AutoCalib: An End-to-End Radar-Camera Automatic Calibration Network
by: Luu, Van-Tin, et al.
Published: (2025)

ParkingE2E: Camera-based End-to-end Parking Network, from Images to Planning
by: Li, Changze, et al.
Published: (2024)

What about gravity in video generation? Post-Training Newton's Laws with Verifiable Rewards
by: Le, Minh-Quan, et al.
Published: (2025)

CorrectAD: A Self-Correcting Agentic System to Improve End-to-end Planning in Autonomous Driving
by: Ma, Enhui, et al.
Published: (2025)

RID-TWIN: An end-to-end pipeline for automatic face de-identification in videos
by: Mukherjee, Anirban, et al.
Published: (2024)

ECHOPulse: ECG controlled echocardio-grams video generation
by: Li, Yiwei, et al.
Published: (2024)

VRWKV-Editor: Reducing quadratic complexity in transformer-based video editing
by: Aitrouga, Abdelilah, et al.
Published: (2025)

FLATTEN: optical FLow-guided ATTENtion for consistent text-to-video editing
by: Cong, Yuren, et al.
Published: (2023)

AutoVLA: A Vision-Language-Action Model for End-to-End Autonomous Driving with Adaptive Reasoning and Reinforcement Fine-Tuning
by: Zhou, Zewei, et al.
Published: (2025)

End4: End-to-end Denoising Diffusion for Diffusion-Based Inpainting Detection
by: Wang, Fei, et al.
Published: (2025)

STAR: Scale-wise Text-conditioned AutoRegressive image generation
by: Ma, Xiaoxiao, et al.
Published: (2024)

Adversarial AutoMixup
by: Qin, Huafeng, et al.
Published: (2023)

End-to-end autoencoding architecture for the simultaneous generation of medical images and corresponding segmentation masks
by: Kebaili, Aghiles, et al.
Published: (2023)

Can video generation replace cinematographers? Research on the cinematic language of generated video
by: Li, Xiaozhe, et al.
Published: (2024)

Scanning Only Once: An End-to-end Framework for Fast Temporal Grounding in Long Videos
by: Pan, Yulin, et al.
Published: (2023)

Generalized Trajectory Scoring for End-to-end Multimodal Planning
by: Li, Zhenxin, et al.
Published: (2025)

SAEdit: Token-level control for continuous image editing via Sparse AutoEncoder
by: Kamenetsky, Ronen, et al.
Published: (2025)

PPAD: Iterative Interactions of Prediction and Planning for End-to-end Autonomous Driving
by: Chen, Zhili, et al.
Published: (2023)

Leveraging multimodal explanatory annotations for video interpretation with Modality Specific Dataset
by: Ancarani, Elisa, et al.
Published: (2025)

Scaling medical imaging report generation with multimodal reinforcement learning
by: Liu, Qianchu, et al.
Published: (2026)

End-to-end Surface Optimization for Light Control
by: Sun, Yuou, et al.
Published: (2024)

MolParser: End-to-end Visual Recognition of Molecule Structures in the Wild
by: Fang, Xi, et al.
Published: (2024)

Hydra-MDP: End-to-end Multimodal Planning with Multi-target Hydra-Distillation
by: Li, Zhenxin, et al.
Published: (2024)

2D bidirectional gated recurrent unit convolutional Neural networks for end-to-end violence detection In videos
by: Traoré, Abdarahmane, et al.
Published: (2024)

AnomalyR1: A GRPO-based End-to-end MLLM for Industrial Anomaly Detection
by: Chao, Yuhao, et al.
Published: (2025)

HIMap: HybrId Representation Learning for End-to-end Vectorized HD Map Construction
by: Zhou, Yi, et al.
Published: (2024)

GraphAD: Interaction Scene Graph for End-to-end Autonomous Driving
by: Zhang, Yunpeng, et al.
Published: (2024)

Referring Expression Instance Retrieval and A Strong End-to-End Baseline
by: Hao, Xiangzhao, et al.
Published: (2025)