| Main Authors: | Lu, Xudong, Chen, Yinghao, Wu, Renshou, Gao, Haohao, Chen, Xi, Yang, Xue, Zhao, Xiangyu, Zhou, Aojun, Li, Fangyuan, Wen, Yafei, Chen, Xiaoxin, Ren, Shuai, Li, Hongsheng |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2503.06019 |
Similar Items
SmartBench: Is Your LLM Truly a Good Chinese Smartphone Assistant?
by: Lu, Xudong, et al.
Published: (2025)
BlueLM-V-3B: Algorithm and System Co-Design for Multimodal Large Language Models on Mobile Devices
by: Lu, Xudong, et al.
Published: (2024)
UI-Genie: A Self-Improving Approach for Iteratively Boosting MLLM-based Mobile GUI Agents
by: Xiao, Han, et al.
Published: (2025)
EdgeInfinite: A Memory-Efficient Infinite-Context Transformer for Edge Devices
by: Chen, Jiyu, et al.
Published: (2025)
Adaptive Markup Language Generation for Contextually-Grounded Visual Document Understanding
by: Xiao, Han, et al.
Published: (2025)
MathGenie: Generating Synthetic Data with Question Back-translation for Enhancing Mathematical Reasoning of LLMs
by: Lu, Zimu, et al.
Published: (2024)
NODI: Out-Of-Distribution Detection with Noise from Diffusion
by: Zhou, Jingqiu, et al.
Published: (2024)
SPP: Sparsity-Preserved Parameter-Efficient Fine-Tuning for Large Language Models
by: Lu, Xudong, et al.
Published: (2024)
UI-Mem: Self-Evolving Experience Memory for Online Reinforcement Learning in Mobile GUI Agents
by: Xiao, Han, et al.
Published: (2026)
Imp: Highly Capable Large Multimodal Models for Mobile Devices
by: Shao, Zhenwei, et al.
Published: (2024)
MINT-CoT: Enabling Interleaved Visual Tokens in Mathematical Chain-of-Thought Reasoning
by: Chen, Xinyan, et al.
Published: (2025)
Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models
by: Lu, Xudong, et al.
Published: (2024)
PhoStream: Benchmarking Real-World Streaming for Omnimodal Assistants in Mobile Scenarios
by: Lu, Xudong, et al.
Published: (2026)
TerDiT: Ternary Diffusion Models with Transformers
by: Lu, Xudong, et al.
Published: (2024)
QuesGenie: Intelligent Multimodal Question Generation
by: Mubarak, Ahmed, et al.
Published: (2025)
ReflectionCoder: Learning from Reflection Sequence for Enhanced One-off Code Generation
by: Ren, Houxing, et al.
Published: (2024)
ReactGenie: A Development Framework for Complex Multimodal Interactions Using Large Language Models
by: Yang, Jackie Junrui, et al.
Published: (2023)
QueryGenie: Making LLM-Based Database Querying Transparent and Controllable
by: Chen, Longfei, et al.
Published: (2025)
MathCoder-VL: Bridging Vision and Code for Enhanced Multimodal Mathematical Reasoning
by: Wang, Ke, et al.
Published: (2025)
Reflect to Inform: Boosting Multimodal Reasoning via Information-Gain-Driven Verification
by: Lv, Shuai, et al.
Published: (2026)
Evaluating Linguistic Capabilities of Multimodal LLMs in the Lens of Few-Shot Learning
by: Dogan, Mustafa, et al.
Published: (2024)
MELLA: Bridging Linguistic Capability and Cultural Groundedness for Low-Resource Language MLLMs
by: Gao, Yufei, et al.
Published: (2025)
MathCoder2: Better Math Reasoning from Continued Pretraining on Model-translated Mathematical Code
by: Lu, Zimu, et al.
Published: (2024)
Advancing General Multimodal Capability of Vision-language Models with Pyramid-descent Visual Position Encoding
by: Chen, Zhanpeng, et al.
Published: (2025)
London Blue Light Collaboration Evaluation: A Comparative Analysis of Spatio temporal Patterns on Emergency Services by London Ambulance Service and London Fire Brigade
by: Li, Fangyuan, et al.
Published: (2025)
CodingGenie: A Proactive LLM-Powered Programming Assistant
by: Zhao, Sebastian, et al.
Published: (2025)
Mobile-O: Unified Multimodal Understanding and Generation on Mobile Device
by: Shaker, Abdelrahman, et al.
Published: (2026)
Step-Controlled DPO: Leveraging Stepwise Error for Enhanced Mathematical Reasoning
by: Lu, Zimu, et al.
Published: (2024)
Caption First, VQA Second: Knowledge Density, Not Task Format, Drives Multimodal Scaling
by: Zou, Hongjian, et al.
Published: (2026)
MathCanvas: Intrinsic Visual Chain-of-Thought for Multimodal Mathematical Reasoning
by: Shi, Weikang, et al.
Published: (2025)
EdgeInfinite-Instruct: Bridging SFT-Based Optimization and NPU-Level Efficiency for Edge Devices
by: Chen, Jiyu, et al.
Published: (2025)
Mask What Matters: Mitigating Object Hallucinations in Multimodal Large Language Models with Object-Aligned Visual Contrastive Decoding
by: Chen, Boqi, et al.
Published: (2026)
NOAH: Learning Pairwise Object Category Attentions for Image Classification
by: Li, Chao, et al.
Published: (2024)
Genie Envisioner: A Unified World Foundation Platform for Robotic Manipulation
by: Liao, Yue, et al.
Published: (2025)
Probability-Consistent Preference Optimization for Enhanced LLM Reasoning
by: Yang, Yunqiao, et al.
Published: (2025)
GPT4Motion: Scripting Physical Motions in Text-to-Video Generation via Blender-Oriented GPT Planning
by: Lv, Jiaxi, et al.
Published: (2023)
Memory-Efficient Split Federated Learning for LLM Fine-Tuning on Heterogeneous Mobile Devices
by: Chen, Xiaopei, et al.
Published: (2025)
GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing
by: Fang, Rongyao, et al.
Published: (2025)
Combining Discrepancy-Confusion Uncertainty and Calibration Diversity for Active Fine-Grained Image Classification
by: Jin, Yinghao, et al.
Published: (2025)
Data Quality Enhancement on the Basis of Diversity with Large Language Models for Text Classification: Uncovered, Difficult, and Noisy
by: Zeng, Min, et al.
Published: (2024)