Bibliographic Details
Main Authors: Cohen, Danielle, Halpern, Yoni, Kahlon, Noam, Oren, Joel, Berkovitch, Omri, Caduri, Sapir, Dagan, Ido, Efros, Anatoly
Format: Preprint
Published: 2025
Online Access: https://arxiv.org/abs/2509.12423
Abstract:
  • Understanding user intents from UI interaction trajectories remains a challenging, yet crucial, frontier in intelligent agent development. While massive, datacenter-based, multi-modal large language models (MLLMs) have greater capacity to handle the complexities of such sequences, smaller models, which can run on-device to provide a privacy-preserving, low-cost, and low-latency user experience, struggle with accurate intent inference. We address these limitations by introducing a novel decomposed approach. First, we perform structured interaction summarization, capturing key information from each user action. Second, we perform intent extraction using a fine-tuned model operating on the aggregated summaries. This method improves intent understanding in resource-constrained models, even surpassing the base performance of large MLLMs.
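The two-stage decomposition described in the abstract (per-action summarization, then intent extraction over the aggregated summaries) can be sketched as a small pipeline. This is a minimal, hypothetical illustration under stated assumptions, not the authors' implementation. The `Step` and `StepSummary` fields, the rule-based `summarize_step`, the numbered-list `aggregate` format, and the keyword-based `toy_intent_model` are all invented stand-ins for the model-based summarizer and the fine-tuned intent extractor.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    """One UI interaction. Hypothetical fields: real inputs would be screenshots plus actions."""
    screen_text: str
    action: str


@dataclass
class StepSummary:
    """Structured summary of a single action (assumed schema, for illustration only)."""
    screen_context: str
    action_description: str


def summarize_step(step: Step) -> StepSummary:
    # Stage 1 placeholder: the paper uses a model here; this trivial rule just copies fields.
    return StepSummary(screen_context=step.screen_text, action_description=step.action)


def aggregate(summaries: List[StepSummary]) -> str:
    # Join per-step summaries into one numbered text that the intent extractor reads.
    return "\n".join(
        f"{i + 1}. On '{s.screen_context}': {s.action_description}"
        for i, s in enumerate(summaries)
    )


def infer_intent(trajectory: List[Step], intent_model: Callable[[str], str]) -> str:
    # Stage 2: summarize every step independently, then extract intent from the aggregate.
    summaries = [summarize_step(s) for s in trajectory]
    return intent_model(aggregate(summaries))


def toy_intent_model(summary_text: str) -> str:
    # Hypothetical stand-in for the fine-tuned intent-extraction model.
    text = summary_text.lower()
    if "search" in text and "flight" in text:
        return "Search for a flight"
    return "Unknown intent"


trajectory = [
    Step("Travel app home", "tapped 'Flights'"),
    Step("Flight search form", "typed 'Paris' into destination and pressed Search"),
]
print(aggregate([summarize_step(s) for s in trajectory]))
print(infer_intent(trajectory, toy_intent_model))
```

Because each step is summarized on its own, the intent model sees a short, structured text rather than the raw trajectory, which is the property the abstract credits for helping smaller, resource-constrained models.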