:: Library Catalog

Bibliographic Details
Main Authors:	Lu, Xudong; Chen, Yinghao; Wu, Renshou; Gao, Haohao; Chen, Xi; Yang, Xue; Zhao, Xiangyu; Zhou, Aojun; Li, Fangyuan; Wen, Yafei; Chen, Xiaoxin; Ren, Shuai; Li, Hongsheng
Format:	Preprint
Published:	2025
Subjects:	Computation and Language; Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2503.06019

Similar Items

SmartBench: Is Your LLM Truly a Good Chinese Smartphone Assistant?
by: Lu, Xudong, et al.
Published: (2025)

BlueLM-V-3B: Algorithm and System Co-Design for Multimodal Large Language Models on Mobile Devices
by: Lu, Xudong, et al.
Published: (2024)

UI-Genie: A Self-Improving Approach for Iteratively Boosting MLLM-based Mobile GUI Agents
by: Xiao, Han, et al.
Published: (2025)

EdgeInfinite: A Memory-Efficient Infinite-Context Transformer for Edge Devices
by: Chen, Jiyu, et al.
Published: (2025)

Adaptive Markup Language Generation for Contextually-Grounded Visual Document Understanding
by: Xiao, Han, et al.
Published: (2025)

MathGenie: Generating Synthetic Data with Question Back-translation for Enhancing Mathematical Reasoning of LLMs
by: Lu, Zimu, et al.
Published: (2024)

NODI: Out-Of-Distribution Detection with Noise from Diffusion
by: Zhou, Jingqiu, et al.
Published: (2024)

SPP: Sparsity-Preserved Parameter-Efficient Fine-Tuning for Large Language Models
by: Lu, Xudong, et al.
Published: (2024)

UI-Mem: Self-Evolving Experience Memory for Online Reinforcement Learning in Mobile GUI Agents
by: Xiao, Han, et al.
Published: (2026)

Imp: Highly Capable Large Multimodal Models for Mobile Devices
by: Shao, Zhenwei, et al.
Published: (2024)

MINT-CoT: Enabling Interleaved Visual Tokens in Mathematical Chain-of-Thought Reasoning
by: Chen, Xinyan, et al.
Published: (2025)

Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models
by: Lu, Xudong, et al.
Published: (2024)

PhoStream: Benchmarking Real-World Streaming for Omnimodal Assistants in Mobile Scenarios
by: Lu, Xudong, et al.
Published: (2026)

TerDiT: Ternary Diffusion Models with Transformers
by: Lu, Xudong, et al.
Published: (2024)

QuesGenie: Intelligent Multimodal Question Generation
by: Mubarak, Ahmed, et al.
Published: (2025)

ReflectionCoder: Learning from Reflection Sequence for Enhanced One-off Code Generation
by: Ren, Houxing, et al.
Published: (2024)

ReactGenie: A Development Framework for Complex Multimodal Interactions Using Large Language Models
by: Yang, Jackie Junrui, et al.
Published: (2023)

QueryGenie: Making LLM-Based Database Querying Transparent and Controllable
by: Chen, Longfei, et al.
Published: (2025)

MathCoder-VL: Bridging Vision and Code for Enhanced Multimodal Mathematical Reasoning
by: Wang, Ke, et al.
Published: (2025)

Reflect to Inform: Boosting Multimodal Reasoning via Information-Gain-Driven Verification
by: Lv, Shuai, et al.
Published: (2026)

Evaluating Linguistic Capabilities of Multimodal LLMs in the Lens of Few-Shot Learning
by: Dogan, Mustafa, et al.
Published: (2024)

MELLA: Bridging Linguistic Capability and Cultural Groundedness for Low-Resource Language MLLMs
by: Gao, Yufei, et al.
Published: (2025)

MathCoder2: Better Math Reasoning from Continued Pretraining on Model-translated Mathematical Code
by: Lu, Zimu, et al.
Published: (2024)

Advancing General Multimodal Capability of Vision-language Models with Pyramid-descent Visual Position Encoding
by: Chen, Zhanpeng, et al.
Published: (2025)

London Blue Light Collaboration Evaluation: A Comparative Analysis of Spatio temporal Patterns on Emergency Services by London Ambulance Service and London Fire Brigade
by: Li, Fangyuan, et al.
Published: (2025)

CodingGenie: A Proactive LLM-Powered Programming Assistant
by: Zhao, Sebastian, et al.
Published: (2025)

Mobile-O: Unified Multimodal Understanding and Generation on Mobile Device
by: Shaker, Abdelrahman, et al.
Published: (2026)

Step-Controlled DPO: Leveraging Stepwise Error for Enhanced Mathematical Reasoning
by: Lu, Zimu, et al.
Published: (2024)

Caption First, VQA Second: Knowledge Density, Not Task Format, Drives Multimodal Scaling
by: Zou, Hongjian, et al.
Published: (2026)

MathCanvas: Intrinsic Visual Chain-of-Thought for Multimodal Mathematical Reasoning
by: Shi, Weikang, et al.
Published: (2025)

EdgeInfinite-Instruct: Bridging SFT-Based Optimization and NPU-Level Efficiency for Edge Devices
by: Chen, Jiyu, et al.
Published: (2025)

Mask What Matters: Mitigating Object Hallucinations in Multimodal Large Language Models with Object-Aligned Visual Contrastive Decoding
by: Chen, Boqi, et al.
Published: (2026)

NOAH: Learning Pairwise Object Category Attentions for Image Classification
by: Li, Chao, et al.
Published: (2024)

Genie Envisioner: A Unified World Foundation Platform for Robotic Manipulation
by: Liao, Yue, et al.
Published: (2025)

Probability-Consistent Preference Optimization for Enhanced LLM Reasoning
by: Yang, Yunqiao, et al.
Published: (2025)

GPT4Motion: Scripting Physical Motions in Text-to-Video Generation via Blender-Oriented GPT Planning
by: Lv, Jiaxi, et al.
Published: (2023)

Memory-Efficient Split Federated Learning for LLM Fine-Tuning on Heterogeneous Mobile Devices
by: Chen, Xiaopei, et al.
Published: (2025)

GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing
by: Fang, Rongyao, et al.
Published: (2025)

Combining Discrepancy-Confusion Uncertainty and Calibration Diversity for Active Fine-Grained Image Classification
by: Jin, Yinghao, et al.
Published: (2025)

Data Quality Enhancement on the Basis of Diversity with Large Language Models for Text Classification: Uncovered, Difficult, and Noisy
by: Zeng, Min, et al.
Published: (2024)