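| Main Authors: | Jiang, Shali Zheng, Hua Liu, Boyang Chen, Laming Lov, Kenny Xu, Chuanqi Ding, Lisang Zhou, Qinghai Cui, Can Liu, Xiaolong Liu, Xiaoyi Badr, Yasmine Xu, Xin Yang, Jiyan Wen, Ellie Dingqiao Akkerhuis, Gerard Jonathan Mugisha Guan, Chenxiao Jin, Rong Qiu, Ruichao Chen, Xian Xu, Shifu Zhou, Zhehui Chen, Ping Yang, Rui Chen, Haicheng Meng, Xiangge Zhou, Song Kharod, Dharak Xu, Shuyu Jin, Qiang Yang, Qiao Zhu, Wankun Huang, Qin Huang, Yuzhen Liu, Darren Aggarwal, Parish Zhou, Hui Wang, Erzhuo Chang, Shuo Gan, Xiaorui Chen, Wenlin Kolay, Santanu Li, Huayu |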
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | Machine Learning; Artificial Intelligence; Information Retrieval |
| Online Access: | https://arxiv.org/abs/2605.29280 |
| _version_ | 1866913169632395264 |
|---|---|
| author | Jiang, Shali Zheng, Hua Liu, Boyang Chen, Laming Lov, Kenny Xu, Chuanqi Ding, Lisang Zhou, Qinghai Cui, Can Liu, Xiaolong Liu, Xiaoyi Badr, Yasmine Xu, Xin Yang, Jiyan Wen, Ellie Dingqiao Akkerhuis, Gerard Jonathan Mugisha Guan, Chenxiao Jin, Rong Qiu, Ruichao Chen, Xian Xu, Shifu Zhou, Zhehui Chen, Ping Yang, Rui Chen, Haicheng Meng, Xiangge Zhou, Song Kharod, Dharak Xu, Shuyu Jin, Qiang Yang, Qiao Zhu, Wankun Huang, Qin Huang, Yuzhen Liu, Darren Aggarwal, Parish Zhou, Hui Wang, Erzhuo Chang, Shuo Gan, Xiaorui Chen, Wenlin Kolay, Santanu Li, Huayu |
| contents | Knowledge distillation (KD) transfers a single scalar prediction from a large foundation model (FM) to compact vertical models (VMs). It suffers from a diminishing transfer ratio (the fraction of FM improvement captured by the VM), because a single scalar cannot convey the rich intermediate knowledge that larger FMs learn. To address this bottleneck, we propose LoopFM (Learning frOm HistOrical RePresentations of FM), a framework that opens a high-bandwidth transfer channel by structuring FM intermediate embeddings as input features (e.g., a user history sequence) for downstream VMs, requiring neither real-time FM inference at serving nor architectural coupling between FM and VM. We provide a theoretical framework for LoopFM with a gain decomposition and transfer-ratio analysis. On three public benchmarks, LoopFM demonstrates strong AUC improvements (e.g., 6%+ on TaobaoAd) and knowledge transfer that is complementary to KD. On industrial-scale systems (billions of examples, trillion-parameter FMs), LoopFM approximately doubles the knowledge transfer ratio on top of KD, delivering a +0.5% conversion improvement in Y1H1, and +1.03% and +1.22% conversion improvements from two individual launches in Y1H2. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2605_29280 |
| institution | arXiv |
| publishDate | 2026 |
| record_format | arxiv |
| title | LoopFM: Learning frOm HistOrical RePresentations of Foundation Model for Recommendation |
| topic | Machine Learning; Artificial Intelligence; Information Retrieval |
| url | https://arxiv.org/abs/2605.29280 |