Saved in:
| Main Authors: | , , , , , , |
|---|---|
| Format: | Preprint |
| Published: |
2025
|
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2510.01673 |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
| _version_ | 1866916985388924928 |
|---|---|
| author | Zhu, Hanqing Zhou, Zhican Ning, Shupeng Wu, Xuhao Chen, Ray Wan, Yating Pan, David |
| author_facet | Zhu, Hanqing Zhou, Zhican Ning, Shupeng Wu, Xuhao Chen, Ray Wan, Yating Pan, David |
| contents | Photonic computing has emerged as a promising substrate for accelerating the dense linear-algebra operations at the heart of AI, yet adoption for large Transformer models remains in its infancy. We identify two bottlenecks: (1) costly electro--optic conversions and data-movement overheads that erode energy efficiency as model sizes scale; (2) a mismatch between limited on-chip photonic resources and Transformer scale, which forces frequent reuse of photonic tensor cores and dilutes throughput gains. To address these challenges, we introduce a hardware--software co-design framework. First, we propose \texttt{Lighten}, a PTC-aware compression flow that post-hoc decomposes each Transformer weight matrix into a low-rank component plus a structured-sparse component aligned to photonic tensor-core granularity, without lengthy retraining. Second, we present \texttt{ENLighten}, a reconfigurable photonic accelerator with dynamically adaptive tensor cores, driven by broadband light redistribution, enabling fine-grained sparsity support and full power gating of inactive parts. On ImageNet, \texttt{Lighten} prunes a Base-scale Vision Transformer by 50\% with $\approx$1\% accuracy drop after only 3 epochs (about 1 hour) of fine-tuning. Deployed on \texttt{ENLighten}, it achieves a $2.5\times$ improvement in energy--delay product over the state-of-the-art photonic Transformer accelerator. |
| format | Preprint |
| id |
arxiv_https___arxiv_org_abs_2510_01673 |
| institution | arXiv |
| publishDate | 2025 |
| record_format | arxiv |
| spellingShingle | ENLighten: Lighten the Transformer, Enable Efficient Optical Acceleration Zhu, Hanqing Zhou, Zhican Ning, Shupeng Wu, Xuhao Chen, Ray Wan, Yating Pan, David Emerging Technologies Photonic computing has emerged as a promising substrate for accelerating the dense linear-algebra operations at the heart of AI, yet adoption for large Transformer models remains in its infancy. We identify two bottlenecks: (1) costly electro--optic conversions and data-movement overheads that erode energy efficiency as model sizes scale; (2) a mismatch between limited on-chip photonic resources and Transformer scale, which forces frequent reuse of photonic tensor cores and dilutes throughput gains. To address these challenges, we introduce a hardware--software co-design framework. First, we propose \texttt{Lighten}, a PTC-aware compression flow that post-hoc decomposes each Transformer weight matrix into a low-rank component plus a structured-sparse component aligned to photonic tensor-core granularity, without lengthy retraining. Second, we present \texttt{ENLighten}, a reconfigurable photonic accelerator with dynamically adaptive tensor cores, driven by broadband light redistribution, enabling fine-grained sparsity support and full power gating of inactive parts. On ImageNet, \texttt{Lighten} prunes a Base-scale Vision Transformer by 50\% with $\approx$1\% accuracy drop after only 3 epochs (about 1 hour) of fine-tuning. Deployed on \texttt{ENLighten}, it achieves a $2.5\times$ improvement in energy--delay product over the state-of-the-art photonic Transformer accelerator. |
| title | ENLighten: Lighten the Transformer, Enable Efficient Optical Acceleration |
| topic | Emerging Technologies |
| url | https://arxiv.org/abs/2510.01673 |