| Main Authors: | Ralambomihanta, Tokiniaina Raharison, Mohammadzadeh, Shahrad, Islam, Mohammad Sami Nur, Jabbour, Wassim, Liang, Laurence |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2401.17574 |
Similar Items
QUADS: QUAntized Distillation Framework for Efficient Speech Language Understanding
by: Biswas, Subrata, et al.
Published: (2025)
Dynamic layer selection in decoder-only transformers
by: Glavas, Theodore, et al.
Published: (2024)
Short Data, Long Context: Distilling Positional Knowledge in Transformers
by: Huber, Patrick, et al.
Published: (2026)
Stacking Small Language Models for Generalizability
by: Liang, Laurence
Published: (2024)
RAVEN: Query-Guided Representation Alignment for Question Answering over Audio, Video, Embedded Sensors, and Natural Language
by: Biswas, Subrata, et al.
Published: (2025)
Learning to Skip the Middle Layers of Transformers
by: Lawson, Tim, et al.
Published: (2025)
Detecting Suicidal Ideation in Text with Interpretable Deep Learning: A CNN-BiGRU with Attention Mechanism
by: Bhuiyan, Mohaiminul Islam, et al.
Published: (2025)
CoGate-LSTM: Prototype-Guided Feature-Space Gating for Mitigating Gradient Dilution in Imbalanced Toxic Comment Classification
by: Mohammad, Noor Islam S.
Published: (2025)
Learning From the Past with Cascading Eligibility Traces
by: Ralambomihanta, Tokiniaina Raharison, et al.
Published: (2025)
On the Power of Convolution Augmented Transformer
by: Li, Mingchen, et al.
Published: (2024)
Long-context Reference-based MT Quality Estimation
by: Haq, Sami Ul, et al.
Published: (2025)
LoCoCo: Dropping In Convolutions for Long Context Compression
by: Cai, Ruisi, et al.
Published: (2024)
LongReD: Mitigating Short-Text Degradation of Long-Context Large Language Models via Restoration Distillation
by: Dong, Zican, et al.
Published: (2025)
Hallucination Detox: Sensitivity Dropout (SenD) for Large Language Model Training
by: Mohammadzadeh, Shahrad, et al.
Published: (2024)
Towards Infinite-Long Prefix in Transformer
by: Liang, Yingyu, et al.
Published: (2024)
Distilling Large Language Models for Text-Attributed Graph Learning
by: Pan, Bo, et al.
Published: (2024)
Self-Verified Distillation: Your Language Model Is Secretly Its Own Synthetic Data Pipeline
by: Lee, Tony, et al.
Published: (2026)
Distribution-Aligned Sequence Distillation for Superior Long-CoT Reasoning
by: Yan, Shaotian, et al.
Published: (2026)
Core Context Aware Transformers for Long Context Language Modeling
by: Chen, Yaofo, et al.
Published: (2024)
Curse of High Dimensionality Issue in Transformer for Long-context Modeling
by: Zhang, Shuhai, et al.
Published: (2025)
Unsupervised Domain Adaptation Approaches for Chessboard Recognition
by: Jabbour, Wassim, et al.
Published: (2024)
LongEmbed: Extending Embedding Models for Long Context Retrieval
by: Zhu, Dawei, et al.
Published: (2024)
BanglaEmbed: Efficient Sentence Embedding Models for a Low-Resource Language Using Cross-Lingual Distillation Techniques
by: Kabir, Muhammad Rafsan, et al.
Published: (2024)
Self-Distilled Reasoner: On-Policy Self-Distillation for Large Language Models
by: Zhao, Siyan, et al.
Published: (2026)
GRPO++: Enhancing Dermatological Reasoning under Low Resource Settings
by: Swapnil, Ismam Nur, et al.
Published: (2025)
TabDistill: Distilling Transformers into Neural Nets for Few-Shot Tabular Classification
by: Dissanayake, Pasan, et al.
Published: (2025)
Bangla Grammatical Error Detection Leveraging Transformer-based Token Classification
by: Islam, Shayekh Bin, et al.
Published: (2024)
Multi-modal Anchor Gated Transformer with Knowledge Distillation for Emotion Recognition in Conversation
by: Li, Jie, et al.
Published: (2025)
DiLM: Distilling Dataset into Language Model for Text-level Dataset Distillation
by: Maekawa, Aru, et al.
Published: (2024)
Feature Alignment and Representation Transfer in Knowledge Distillation for Large Language Models
by: Yang, Junjie, et al.
Published: (2025)
Transformer-Driven Triple Fusion Framework for Enhanced Multimodal Author Intent Classification in Low-Resource Bangla
by: Islam, Ariful, et al.
Published: (2025)
Freely Long-Thinking Transformer (FraiLT)
by: Tabak, Akbay
Published: (2024)
The NLP Task Effectiveness of Long-Range Transformers
by: Qin, Guanghui, et al.
Published: (2022)
CAT: Circular-Convolutional Attention for Sub-Quadratic Transformers
by: Yamada, Yoshihiro
Published: (2025)
Compact Language Models via Pruning and Knowledge Distillation
by: Muralidharan, Saurav, et al.
Published: (2024)
Knowledge Distillation and Dataset Distillation of Large Language Models: Emerging Trends, Challenges, and Future Directions
by: Fang, Luyang, et al.
Published: (2025)
SlimMoE: Structured Compression of Large MoE Models via Expert Slimming and Distillation
by: Li, Zichong, et al.
Published: (2025)
Jailbreak-Tuning: Models Efficiently Learn Jailbreak Susceptibility
by: Murphy, Brendan, et al.
Published: (2025)
Compress, Gather, and Recompute: REFORMing Long-Context Processing in Transformers
by: Song, Woomin, et al.
Published: (2025)
Summarize-Exemplify-Reflect: Data-driven Insight Distillation Empowers LLMs for Few-shot Tabular Classification
by: Yuan, Yifei, et al.
Published: (2025)