:: Library Catalog

Bibliographic Details
Main Authors:	Zhu, Jiaying, Zhu, Yurui, Lu, Xin, Yan, Wenrui, Li, Dong, Liu, Kunlin, Fu, Xueyang, Zha, Zheng-Jun
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2510.16598

Similar Items

DemosaicFormer: Coarse-to-Fine Demosaicing Network for HybridEVS Camera
by: Xu, Senyan, et al.
Published: (2024)

Elucidating and Endowing the Diffusion Training Paradigm for General Image Restoration
by: Lu, Xin, et al.
Published: (2025)

Generative Recommender with End-to-End Learnable Item Tokenization
by: Liu, Enze, et al.
Published: (2024)

End-to-End Vision Tokenizer Tuning
by: Wang, Wenxuan, et al.
Published: (2025)

Correcting Diffusion-Based Perceptual Image Compression with Privileged End-to-End Decoder
by: Ma, Yiyang, et al.
Published: (2024)

FROST-Drive: Scalable and Efficient End-to-End Driving with a Frozen Vision Encoder
by: Dong, Zeyu, et al.
Published: (2026)

AdaTok: Adaptive Token Compression with Object-Aware Representations for Efficient Multimodal LLMs
by: Zhang, Xinliang, et al.
Published: (2025)

From Pixels to Nucleotides: End-to-End Token-Based Video Compression for DNA Storage
by: Ruan, Cihan, et al.
Published: (2026)

End-to-end Learnable Clustering for Intent Learning in Recommendation
by: Liu, Yue, et al.
Published: (2024)

REMM: Rotation-Equivariant Framework for End-to-End Multimodal Image Matching
by: Nie, Han, et al.
Published: (2024)

Efficient Multi-Camera Tokenization with Triplanes for End-to-End Driving
by: Ivanovic, Boris, et al.
Published: (2025)

Efficient End-to-End Visual Document Understanding with Rationale Distillation
by: Zhu, Wang, et al.
Published: (2023)

Efficient Test-time Adaptive Object Detection via Sensitivity-Guided Pruning
by: Wang, Kunyu, et al.
Published: (2025)

SimpleLLM4AD: An End-to-End Vision-Language Model with Graph Visual Question Answering for Autonomous Driving
by: Zheng, Peiru, et al.
Published: (2024)

End-to-End Spatial-Temporal Transformer for Real-time 4D HOI Reconstruction
by: Zhang, Haoyu, et al.
Published: (2026)

VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks
by: Wu, Jiannan, et al.
Published: (2024)

FourierMamba: Fourier Learning Integration with State Space Models for Image Deraining
by: Li, Dong, et al.
Published: (2024)

ForgeryGPT: A Multimodal LLM for Interpretable Image Forgery Detection and Localization
by: Zhang, Fanrui, et al.
Published: (2024)

MaskFuser: Masked Fusion of Joint Multi-Modal Tokenization for End-to-End Autonomous Driving
by: Duan, Yiqun, et al.
Published: (2024)

3UR-LLM: An End-to-End Multimodal Large Language Model for 3D Scene Understanding
by: Xiong, Haomiao, et al.
Published: (2025)

Generalizing End-To-End Autonomous Driving In Real-World Environments Using Zero-Shot LLMs
by: Dong, Zeyu, et al.
Published: (2024)

TokenSeg: Efficient 3D Medical Image Segmentation via Hierarchical Visual Token Compression
by: Zeng, Sen, et al.
Published: (2026)

SyncVSR: Data-Efficient Visual Speech Recognition with End-to-End Crossmodal Audio Token Synchronization
by: Ahn, Young Jin, et al.
Published: (2024)

Vision without Images: End-to-End Computer Vision from Single Compressive Measurements
by: Pan, Fengpu, et al.
Published: (2025)

Ultra-Efficient Decoding for End-to-End Neural Compression and Reconstruction
by: Rogers, Ethan G., et al.
Published: (2025)

Efficient Visual Transformer by Learnable Token Merging
by: Wang, Yancheng, et al.
Published: (2024)

LEAP: Learnable End-to-End Adaptive Pruning of Large Language Models
by: Mozaffari, Mohammad, et al.
Published: (2026)

EVE: Towards End-to-End Video Subtitle Extraction with Vision-Language Models
by: Yu, Haiyang, et al.
Published: (2025)

From Large to Super-Tiny: End-to-End Optimization for Cost-Efficient LLMs
by: Ni, Jiliang, et al.
Published: (2025)

Huff-LLM: End-to-End Lossless Compression for Efficient LLM Inference
by: Yubeaton, Patrick, et al.
Published: (2025)

Tracking by Detection and Query: An Efficient End-to-End Framework for Multi-Object Tracking
by: Jia, Shukun, et al.
Published: (2024)

ResearchGPT: Benchmarking and Training LLMs for End-to-End Computer Science Research Workflows
by: Wang, Penghao, et al.
Published: (2025)

End‐to‐End Compressed Meshlet Rendering
by: D. Mlakar, et al.
Published: (2024)

EVA: Efficient Reinforcement Learning for End-to-End Video Agent
by: Zhang, Yaolun, et al.
Published: (2026)

RelationVLM: Making Large Vision-Language Models Understand Visual Relations
by: Huang, Zhipeng, et al.
Published: (2024)

LMAD: Integrated End-to-End Vision-Language Model for Explainable Autonomous Driving
by: Song, Nan, et al.
Published: (2025)

Time and Tokens: Benchmarking End-to-End Speech Dysfluency Detection
by: Zhou, Xuanru, et al.
Published: (2024)

ClusterRCA: An End-to-End Approach for Network Fault Localization and Classification for HPC System
by: Sun, Yongqian, et al.
Published: (2025)

End-to-End Multi-Modal Diffusion Mamba
by: Lu, Chunhao, et al.
Published: (2025)

MSA: Memory Sparse Attention for Efficient End-to-End Memory Model Scaling to 100M Tokens
by: Chen, Yu, et al.
Published: (2026)