| Main Authors: | Zhu, Jiaying, Zhu, Yurui, Lu, Xin, Yan, Wenrui, Li, Dong, Liu, Kunlin, Fu, Xueyang, Zha, Zheng-Jun |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2510.16598 |
Similar Items
DemosaicFormer: Coarse-to-Fine Demosaicing Network for HybridEVS Camera
by: Xu, Senyan, et al.
Published: (2024)
Elucidating and Endowing the Diffusion Training Paradigm for General Image Restoration
by: Lu, Xin, et al.
Published: (2025)
Generative Recommender with End-to-End Learnable Item Tokenization
by: Liu, Enze, et al.
Published: (2024)
End-to-End Vision Tokenizer Tuning
by: Wang, Wenxuan, et al.
Published: (2025)
Correcting Diffusion-Based Perceptual Image Compression with Privileged End-to-End Decoder
by: Ma, Yiyang, et al.
Published: (2024)
FROST-Drive: Scalable and Efficient End-to-End Driving with a Frozen Vision Encoder
by: Dong, Zeyu, et al.
Published: (2026)
AdaTok: Adaptive Token Compression with Object-Aware Representations for Efficient Multimodal LLMs
by: Zhang, Xinliang, et al.
Published: (2025)
From Pixels to Nucleotides: End-to-End Token-Based Video Compression for DNA Storage
by: Ruan, Cihan, et al.
Published: (2026)
End-to-end Learnable Clustering for Intent Learning in Recommendation
by: Liu, Yue, et al.
Published: (2024)
REMM:Rotation-Equivariant Framework for End-to-End Multimodal Image Matching
by: Nie, Han, et al.
Published: (2024)
Efficient Multi-Camera Tokenization with Triplanes for End-to-End Driving
by: Ivanovic, Boris, et al.
Published: (2025)
Efficient End-to-End Visual Document Understanding with Rationale Distillation
by: Zhu, Wang, et al.
Published: (2023)
Efficient Test-time Adaptive Object Detection via Sensitivity-Guided Pruning
by: Wang, Kunyu, et al.
Published: (2025)
SimpleLLM4AD: An End-to-End Vision-Language Model with Graph Visual Question Answering for Autonomous Driving
by: Zheng, Peiru, et al.
Published: (2024)
End-to-End Spatial-Temporal Transformer for Real-time 4D HOI Reconstruction
by: Zhang, Haoyu, et al.
Published: (2026)
VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks
by: Wu, Jiannan, et al.
Published: (2024)
FourierMamba: Fourier Learning Integration with State Space Models for Image Deraining
by: Li, Dong, et al.
Published: (2024)
ForgeryGPT: A Multimodal LLM for Interpretable Image Forgery Detection and Localization
by: Zhang, Fanrui, et al.
Published: (2024)
MaskFuser: Masked Fusion of Joint Multi-Modal Tokenization for End-to-End Autonomous Driving
by: Duan, Yiqun, et al.
Published: (2024)
3UR-LLM: An End-to-End Multimodal Large Language Model for 3D Scene Understanding
by: Xiong, Haomiao, et al.
Published: (2025)
Generalizing End-To-End Autonomous Driving In Real-World Environments Using Zero-Shot LLMs
by: Dong, Zeyu, et al.
Published: (2024)
TokenSeg: Efficient 3D Medical Image Segmentation via Hierarchical Visual Token Compression
by: Zeng, Sen, et al.
Published: (2026)
SyncVSR: Data-Efficient Visual Speech Recognition with End-to-End Crossmodal Audio Token Synchronization
by: Ahn, Young Jin, et al.
Published: (2024)
Vision without Images: End-to-End Computer Vision from Single Compressive Measurements
by: Pan, Fengpu, et al.
Published: (2025)
Ultra-Efficient Decoding for End-to-End Neural Compression and Reconstruction
by: Rogers, Ethan G., et al.
Published: (2025)
Efficient Visual Transformer by Learnable Token Merging
by: Wang, Yancheng, et al.
Published: (2024)
LEAP: Learnable End-to-End Adaptive Pruning of Large Language Models
by: Mozaffari, Mohammad, et al.
Published: (2026)
EVE: Towards End-to-End Video Subtitle Extraction with Vision-Language Models
by: Yu, Haiyang, et al.
Published: (2025)
From Large to Super-Tiny: End-to-End Optimization for Cost-Efficient LLMs
by: Ni, Jiliang, et al.
Published: (2025)
Huff-LLM: End-to-End Lossless Compression for Efficient LLM Inference
by: Yubeaton, Patrick, et al.
Published: (2025)
Tracking by Detection and Query: An Efficient End-to-End Framework for Multi-Object Tracking
by: Jia, Shukun, et al.
Published: (2024)
ResearchGPT: Benchmarking and Training LLMs for End-to-End Computer Science Research Workflows
by: Wang, Penghao, et al.
Published: (2025)
End‐to‐End Compressed Meshlet Rendering
by: D. Mlakar, et al.
Published: (2024)
EVA: Efficient Reinforcement Learning for End-to-End Video Agent
by: Zhang, Yaolun, et al.
Published: (2026)
RelationVLM: Making Large Vision-Language Models Understand Visual Relations
by: Huang, Zhipeng, et al.
Published: (2024)
LMAD: Integrated End-to-End Vision-Language Model for Explainable Autonomous Driving
by: Song, Nan, et al.
Published: (2025)
Time and Tokens: Benchmarking End-to-End Speech Dysfluency Detection
by: Zhou, Xuanru, et al.
Published: (2024)
ClusterRCA: An End-to-End Approach for Network Fault Localization and Classification for HPC System
by: Sun, Yongqian, et al.
Published: (2025)
End-to-End Multi-Modal Diffusion Mamba
by: Lu, Chunhao, et al.
Published: (2025)
MSA: Memory Sparse Attention for Efficient End-to-End Memory Model Scaling to 100M Tokens
by: Chen, Yu, et al.
Published: (2026)