_version_ 1866913169632395264
author Jiang, Shali
Zheng, Hua
Liu, Boyang
Chen, Laming
Lov, Kenny
Xu, Chuanqi
Ding, Lisang
Zhou, Qinghai
Cui, Can
Liu, Xiaolong
Liu, Xiaoyi
Badr, Yasmine
Xu, Xin
Yang, Jiyan
Wen, Ellie Dingqiao
Akkerhuis, Gerard Jonathan Mugisha
Guan, Chenxiao
Jin, Rong
Qiu, Ruichao
Chen, Xian
Xu, Shifu
Zhou, Zhehui
Chen, Ping
Yang, Rui
Chen, Haicheng
Meng, Xiangge
Zhou, Song
Kharod, Dharak
Xu, Shuyu
Jin, Qiang
Yang, Qiao
Zhu, Wankun
Huang, Qin
Huang, Yuzhen
Liu, Darren
Aggarwal, Parish
Zhou, Hui
Wang, Erzhuo
Chang, Shuo
Gan, Xiaorui
Chen, Wenlin
Kolay, Santanu
Li, Huayu
author_facet Jiang, Shali
Zheng, Hua
Liu, Boyang
Chen, Laming
Lov, Kenny
Xu, Chuanqi
Ding, Lisang
Zhou, Qinghai
Cui, Can
Liu, Xiaolong
Liu, Xiaoyi
Badr, Yasmine
Xu, Xin
Yang, Jiyan
Wen, Ellie Dingqiao
Akkerhuis, Gerard Jonathan Mugisha
Guan, Chenxiao
Jin, Rong
Qiu, Ruichao
Chen, Xian
Xu, Shifu
Zhou, Zhehui
Chen, Ping
Yang, Rui
Chen, Haicheng
Meng, Xiangge
Zhou, Song
Kharod, Dharak
Xu, Shuyu
Jin, Qiang
Yang, Qiao
Zhu, Wankun
Huang, Qin
Huang, Yuzhen
Liu, Darren
Aggarwal, Parish
Zhou, Hui
Wang, Erzhuo
Chang, Shuo
Gan, Xiaorui
Chen, Wenlin
Kolay, Santanu
Li, Huayu
contents Knowledge distillation (KD) transfers a single scalar prediction from a large foundation model (FM) to compact vertical models (VMs). It suffers from a diminishing transfer ratio (the fraction of FM improvement captured by the VM), because a single scalar cannot convey the rich intermediate knowledge that larger FMs learn. To address this bottleneck, we propose LoopFM (Learning frOm HistOrical RePresentations of FM), a framework that opens a high-bandwidth transfer channel by structuring FM intermediate embeddings as input features (e.g., a user history sequence) for downstream VMs, without requiring real-time FM inference at serving or architectural coupling between FM and VM. We provide a theoretical framework for LoopFM with a gain decomposition and transfer-ratio analysis. On three public benchmarks, LoopFM demonstrates strong AUC improvements (e.g., 6%+ on TaobaoAd) and knowledge transfer that is complementary to KD. On industrial-scale systems (billions of examples, trillion-parameter FMs), LoopFM approximately doubles the knowledge transfer ratio on top of KD, delivering a +0.5% conversion improvement in Y1H1, and +1.03% and +1.22% conversion improvements from two individual launches in Y1H2.
format Preprint
id arxiv_https___arxiv_org_abs_2605_29280
institution arXiv
publishDate 2026
record_format arxiv
spellingShingle LoopFM: Learning frOm HistOrical RePresentations of Foundation Model for Recommendation
Jiang, Shali
Zheng, Hua
Liu, Boyang
Chen, Laming
Lov, Kenny
Xu, Chuanqi
Ding, Lisang
Zhou, Qinghai
Cui, Can
Liu, Xiaolong
Liu, Xiaoyi
Badr, Yasmine
Xu, Xin
Yang, Jiyan
Wen, Ellie Dingqiao
Akkerhuis, Gerard Jonathan Mugisha
Guan, Chenxiao
Jin, Rong
Qiu, Ruichao
Chen, Xian
Xu, Shifu
Zhou, Zhehui
Chen, Ping
Yang, Rui
Chen, Haicheng
Meng, Xiangge
Zhou, Song
Kharod, Dharak
Xu, Shuyu
Jin, Qiang
Yang, Qiao
Zhu, Wankun
Huang, Qin
Huang, Yuzhen
Liu, Darren
Aggarwal, Parish
Zhou, Hui
Wang, Erzhuo
Chang, Shuo
Gan, Xiaorui
Chen, Wenlin
Kolay, Santanu
Li, Huayu
Machine Learning
Artificial Intelligence
Information Retrieval
Knowledge distillation (KD) transfers a single scalar prediction from a large foundation model (FM) to compact vertical models (VMs). It suffers from a diminishing transfer ratio (the fraction of FM improvement captured by the VM), because a single scalar cannot convey the rich intermediate knowledge that larger FMs learn. To address this bottleneck, we propose LoopFM (Learning frOm HistOrical RePresentations of FM), a framework that opens a high-bandwidth transfer channel by structuring FM intermediate embeddings as input features (e.g., a user history sequence) for downstream VMs, without requiring real-time FM inference at serving or architectural coupling between FM and VM. We provide a theoretical framework for LoopFM with a gain decomposition and transfer-ratio analysis. On three public benchmarks, LoopFM demonstrates strong AUC improvements (e.g., 6%+ on TaobaoAd) and knowledge transfer that is complementary to KD. On industrial-scale systems (billions of examples, trillion-parameter FMs), LoopFM approximately doubles the knowledge transfer ratio on top of KD, delivering a +0.5% conversion improvement in Y1H1, and +1.03% and +1.22% conversion improvements from two individual launches in Y1H2.
title LoopFM: Learning frOm HistOrical RePresentations of Foundation Model for Recommendation
topic Machine Learning
Artificial Intelligence
Information Retrieval
url https://arxiv.org/abs/2605.29280