| Main authors: | Yu, Haiyang; Zhao, Mengyang; Lu, Jinghui; Niu, Ke; Wang, Yanjie; Yin, Weijie; Jia, Weitao; Fu, Teng; Liu, Yang; Liu, Jun; Chen, Hong |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online access: | https://arxiv.org/abs/2503.04058 |
Similar items
IADGPT: Unified LVLM for Few-Shot Industrial Anomaly Detection, Localization, and Reasoning via In-Context Learning
by: Zhao, Mengyang, et al.
Published: (2025)
OmniPT: Unleashing the Potential of Large Vision Language Models for Pedestrian Tracking and Understanding
by: Fu, Teng, et al.
Published: (2025)
ChatReID: Open-ended Interactive Person Retrieval via Hierarchical Progressive Tuning for Vision Language Models
by: Niu, Ke, et al.
Published: (2025)
CLEAR: Context-Aware Learning with End-to-End Mask-Free Inference for Adaptive Video Subtitle Removal
by: He, Qingdong, et al.
Published: (2026)
From Intent to Execution: Multimodal Chain-of-Thought Reinforcement Learning for Precise CAD Code Generation
by: Niu, Ke, et al.
Published: (2025)
Text-to-Edit: Controllable End-to-End Video Ad Creation via Multimodal LLMs
by: Cheng, Dabing, et al.
Published: (2025)
OCRVerse: Towards Holistic OCR in End-to-End Vision-Language Models
by: Zhong, Yufeng, et al.
Published: (2026)
VisionSelector: End-to-End Learnable Visual Token Compression for Efficient Multimodal LLMs
by: Zhu, Jiaying, et al.
Published: (2025)
CReFT-CAD: Boosting Orthographic Projection Reasoning for CAD via Reinforcement Fine-Tuning
by: Niu, Ke, et al.
Published: (2025)
End-to-End Vision Tokenizer Tuning
by: Wang, Wenxuan, et al.
Published: (2025)
ARPO: End-to-End Policy Optimization for GUI Agents with Experience Replay
by: Lu, Fanbin, et al.
Published: (2025)
Senna: Bridging Large Vision-Language Models and End-to-End Autonomous Driving
by: Jiang, Bo, et al.
Published: (2024)
Towards End-to-End Explainable Facial Action Unit Recognition via Vision-Language Joint Learning
by: Ge, Xuri, et al.
Published: (2024)
Interpretable Oracle Bone Script Decipherment through Radical and Pictographic Analysis with LVLMs
by: Peng, Kaixin, et al.
Published: (2025)
End-to-End Beam Retrieval for Multi-Hop Question Answering
by: Zhang, Jiahao, et al.
Published: (2023)
iPad: Iterative Proposal-centric End-to-End Autonomous Driving
by: Guo, Ke, et al.
Published: (2025)
End-To-End Underwater Video Enhancement: Dataset and Model
by: Du, Dazhao, et al.
Published: (2024)
VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks
by: Wu, Jiannan, et al.
Published: (2024)
Self-Retrieval: End-to-End Information Retrieval with One Large Language Model
by: Tang, Qiaoyu, et al.
Published: (2024)
Unleashing Generalization of End-to-End Autonomous Driving with Controllable Long Video Generation
by: Ma, Enhui, et al.
Published: (2024)
End-to-End On-Device Quantization-Aware Training for LLMs at Inference Cost
by: Tan, Qitao, et al.
Published: (2025)
MoGA: Mixture-of-Groups Attention for End-to-End Long Video Generation
by: Jia, Weinan, et al.
Published: (2025)
Dynamical Mass Loss at the End of TP-AGB stars
by: Cui, Yingzhen, et al.
Published: (2026)
AudioJailbreak: Jailbreak Attacks against End-to-End Large Audio-Language Models
by: Chen, Guangke, et al.
Published: (2025)
ImaginationPolicy: Towards Generalizable, Precise and Reliable End-to-End Policy for Robotic Manipulation
by: Lu, Dekun, et al.
Published: (2025)
VECTOR-Drive: Tightly Coupled Vision-Language and Trajectory Expert Routing for End-to-End Autonomous Driving
by: Zhao, Rui, et al.
Published: (2026)
REMM: Rotation-Equivariant Framework for End-to-End Multimodal Image Matching
by: Nie, Han, et al.
Published: (2024)
Navigating the Deep: End-to-End Extraction on Deep Neural Networks
by: Liu, Haolin, et al.
Published: (2025)
Structured Labeling Enables Faster Vision-Language Models for End-to-End Autonomous Driving
by: Jiang, Hao, et al.
Published: (2025)
United We Stand: Towards End-to-End Log-based Fault Diagnosis via Interactive Multi-Task Learning
by: He, Minghua, et al.
Published: (2025)
End-to-End Video Question Answering with Frame Scoring Mechanisms and Adaptive Sampling
by: Liang, Jianxin, et al.
Published: (2024)
Towards End-to-End Alignment of User Satisfaction via Questionnaire in Video Recommendation
by: Li, Na, et al.
Published: (2026)
Quality-Aware End-to-End Audio-Visual Neural Speaker Diarization
by: He, Mao-Kui, et al.
Published: (2024)
Bridging Perception and Planning: Towards End-to-End Planning for Signal Temporal Logic Tasks
by: Ye, Bowen, et al.
Published: (2025)
DSDrive: Distilling Large Language Model for Lightweight End-to-End Autonomous Driving with Unified Reasoning and Planning
by: Liu, Wenru, et al.
Published: (2025)
The End of Manual Decoding: Towards Truly End-to-End Language Models
by: Wang, Zhichao, et al.
Published: (2025)
SimpleLLM4AD: An End-to-End Vision-Language Model with Graph Visual Question Answering for Autonomous Driving
by: Zheng, Peiru, et al.
Published: (2024)
Learning to Communicate: Toward End-to-End Optimization of Multi-Agent Language Systems
by: Yu, Ye, et al.
Published: (2026)
ORION: A Holistic End-to-End Autonomous Driving Framework by Vision-Language Instructed Action Generation
by: Fu, Haoyu, et al.
Published: (2025)
Towards an End-to-End Framework for Invasive Brain Signal Decoding with Large Language Models
by: Feng, Sheng, et al.
Published: (2024)