Bibliographic Details
Main Authors: Zeng, Runjia, Wang, Qifan, Guan, Qiang, Tang, Ruixiang, Huang, Lifu, Wang, Zhenting, Zhang, Xueling, Han, Cheng, Liu, Dongfang
Format: Preprint
Published: 2026
Subjects: Computation and Language; Artificial Intelligence
Online Access: https://arxiv.org/abs/2601.19739
author Zeng, Runjia
Wang, Qifan
Guan, Qiang
Tang, Ruixiang
Huang, Lifu
Wang, Zhenting
Zhang, Xueling
Han, Cheng
Liu, Dongfang
contents Fine-tuning is regarded as the de facto approach for adapting large language models (LLMs) to downstream tasks, but the high training memory consumption inherited from LLMs makes this process inefficient. Among existing memory-efficient approaches, activation-related optimization has proven particularly effective, as activations consistently dominate overall memory consumption. Although prior work offers various activation optimization strategies, their data-agnostic nature ultimately results in ineffective and unstable fine-tuning. In this paper, we propose TokenSeek, a universal plugin solution for various transformer-based models that performs instance-aware token seeking and ditching, achieving significant fine-tuning memory savings (e.g., requiring only 14.8% of the memory on Llama 3.2 1B) with on-par or even better performance. Furthermore, our interpretable token seeking process reveals the underlying reasons for its effectiveness, offering valuable insights for future research on token efficiency. Homepage: https://runjia.tech/iclr_tokenseek/
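The abstract gives no algorithmic detail, so the following is only a generic illustration of what per-instance ("instance-aware") token selection can look like. It assumes each token already carries an importance score from some hypothetical scoring function and keeps a fixed fraction (the hypothetical keep_ratio) of the highest-scoring tokens in each instance. It is not TokenSeek's actual scoring or ditching procedure. In a scheme of this kind, the tokens outside the kept set are the ones whose activations would presumably not be retained for backpropagation, which is where activation memory would be saved.

```python
from typing import List, Sequence


def select_tokens(scores: Sequence[float], keep_ratio: float) -> List[int]:
    """Hypothetical helper: indices of the top-scoring tokens of ONE instance.

    The selection depends on this instance's own scores (instance-aware),
    not on fixed positions shared by every example (data-agnostic).
    Indices are returned in their original sequence order.
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    # Keep at least one token; int() truncates toward zero.
    k = max(1, int(len(scores) * keep_ratio))
    # Rank token positions by score, highest first, and take the top k.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def keep_mask(scores: Sequence[float], keep_ratio: float) -> List[bool]:
    """Hypothetical helper: True for kept tokens, False for ditched ones."""
    kept = set(select_tokens(scores, keep_ratio))
    return [i in kept for i in range(len(scores))]


def retained_fraction(batch_scores: Sequence[Sequence[float]], keep_ratio: float) -> float:
    """Fraction of all tokens in a batch whose activations would be kept."""
    kept = sum(len(select_tokens(s, keep_ratio)) for s in batch_scores)
    total = sum(len(s) for s in batch_scores)
    return kept / total


if __name__ == "__main__":
    # Two instances with made-up scores; each gets its own selection.
    batch = [[0.1, 0.9, 0.3, 0.8, 0.2, 0.7], [0.5, 0.4, 0.3, 0.2]]
    print(select_tokens(batch[0], 0.5))   # [1, 3, 5]
    print(keep_mask(batch[1], 0.5))       # [True, True, False, False]
    print(retained_fraction(batch, 0.5))  # 0.5
```

With keep_ratio = 0.5, the two example instances keep different positions ([1, 3, 5] versus [0, 1]), which is the distinction between instance-aware and data-agnostic selection that the abstract draws. The scores themselves are made up for illustration.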
format Preprint
id arxiv_https___arxiv_org_abs_2601_19739
institution arXiv
publishDate 2026
record_format arxiv
title TokenSeek: Memory Efficient Fine Tuning via Instance-Aware Token Ditching
topic Computation and Language
Artificial Intelligence
url https://arxiv.org/abs/2601.19739