:: Library Catalog

Bibliographic Details
Main Authors:	Ralambomihanta, Tokiniaina Raharison; Mohammadzadeh, Shahrad; Islam, Mohammad Sami Nur; Jabbour, Wassim; Liang, Laurence
Format:	Preprint
Published:	2024
Subjects:	Computation and Language; Machine Learning
Online Access:	https://arxiv.org/abs/2401.17574

Similar Items

QUADS: QUAntized Distillation Framework for Efficient Speech Language Understanding
by: Biswas, Subrata, et al.
Published: (2025)

Dynamic layer selection in decoder-only transformers
by: Glavas, Theodore, et al.
Published: (2024)

Short Data, Long Context: Distilling Positional Knowledge in Transformers
by: Huber, Patrick, et al.
Published: (2026)

Stacking Small Language Models for Generalizability
by: Liang, Laurence
Published: (2024)

RAVEN: Query-Guided Representation Alignment for Question Answering over Audio, Video, Embedded Sensors, and Natural Language
by: Biswas, Subrata, et al.
Published: (2025)

Learning to Skip the Middle Layers of Transformers
by: Lawson, Tim, et al.
Published: (2025)

Detecting Suicidal Ideation in Text with Interpretable Deep Learning: A CNN-BiGRU with Attention Mechanism
by: Bhuiyan, Mohaiminul Islam, et al.
Published: (2025)

CoGate-LSTM: Prototype-Guided Feature-Space Gating for Mitigating Gradient Dilution in Imbalanced Toxic Comment Classification
by: Mohammad, Noor Islam S.
Published: (2025)

Learning From the Past with Cascading Eligibility Traces
by: Ralambomihanta, Tokiniaina Raharison, et al.
Published: (2025)

On the Power of Convolution Augmented Transformer
by: Li, Mingchen, et al.
Published: (2024)

Long-context Reference-based MT Quality Estimation
by: Haq, Sami Ul, et al.
Published: (2025)

LoCoCo: Dropping In Convolutions for Long Context Compression
by: Cai, Ruisi, et al.
Published: (2024)

LongReD: Mitigating Short-Text Degradation of Long-Context Large Language Models via Restoration Distillation
by: Dong, Zican, et al.
Published: (2025)

Hallucination Detox: Sensitivity Dropout (SenD) for Large Language Model Training
by: Mohammadzadeh, Shahrad, et al.
Published: (2024)

Towards Infinite-Long Prefix in Transformer
by: Liang, Yingyu, et al.
Published: (2024)

Distilling Large Language Models for Text-Attributed Graph Learning
by: Pan, Bo, et al.
Published: (2024)

Self-Verified Distillation: Your Language Model Is Secretly Its Own Synthetic Data Pipeline
by: Lee, Tony, et al.
Published: (2026)

Distribution-Aligned Sequence Distillation for Superior Long-CoT Reasoning
by: Yan, Shaotian, et al.
Published: (2026)

Core Context Aware Transformers for Long Context Language Modeling
by: Chen, Yaofo, et al.
Published: (2024)

Curse of High Dimensionality Issue in Transformer for Long-context Modeling
by: Zhang, Shuhai, et al.
Published: (2025)

Unsupervised Domain Adaptation Approaches for Chessboard Recognition
by: Jabbour, Wassim, et al.
Published: (2024)

LongEmbed: Extending Embedding Models for Long Context Retrieval
by: Zhu, Dawei, et al.
Published: (2024)

BanglaEmbed: Efficient Sentence Embedding Models for a Low-Resource Language Using Cross-Lingual Distillation Techniques
by: Kabir, Muhammad Rafsan, et al.
Published: (2024)

Self-Distilled Reasoner: On-Policy Self-Distillation for Large Language Models
by: Zhao, Siyan, et al.
Published: (2026)

GRPO++: Enhancing Dermatological Reasoning under Low Resource Settings
by: Swapnil, Ismam Nur, et al.
Published: (2025)

TabDistill: Distilling Transformers into Neural Nets for Few-Shot Tabular Classification
by: Dissanayake, Pasan, et al.
Published: (2025)

Bangla Grammatical Error Detection Leveraging Transformer-based Token Classification
by: Islam, Shayekh Bin, et al.
Published: (2024)

Multi-modal Anchor Gated Transformer with Knowledge Distillation for Emotion Recognition in Conversation
by: Li, Jie, et al.
Published: (2025)

DiLM: Distilling Dataset into Language Model for Text-level Dataset Distillation
by: Maekawa, Aru, et al.
Published: (2024)

Feature Alignment and Representation Transfer in Knowledge Distillation for Large Language Models
by: Yang, Junjie, et al.
Published: (2025)

Transformer-Driven Triple Fusion Framework for Enhanced Multimodal Author Intent Classification in Low-Resource Bangla
by: Islam, Ariful, et al.
Published: (2025)

Freely Long-Thinking Transformer (FraiLT)
by: Tabak, Akbay
Published: (2024)

The NLP Task Effectiveness of Long-Range Transformers
by: Qin, Guanghui, et al.
Published: (2022)

CAT: Circular-Convolutional Attention for Sub-Quadratic Transformers
by: Yamada, Yoshihiro
Published: (2025)

Compact Language Models via Pruning and Knowledge Distillation
by: Muralidharan, Saurav, et al.
Published: (2024)

Knowledge Distillation and Dataset Distillation of Large Language Models: Emerging Trends, Challenges, and Future Directions
by: Fang, Luyang, et al.
Published: (2025)

SlimMoE: Structured Compression of Large MoE Models via Expert Slimming and Distillation
by: Li, Zichong, et al.
Published: (2025)

Jailbreak-Tuning: Models Efficiently Learn Jailbreak Susceptibility
by: Murphy, Brendan, et al.
Published: (2025)

Compress, Gather, and Recompute: REFORMing Long-Context Processing in Transformers
by: Song, Woomin, et al.
Published: (2025)

Summarize-Exemplify-Reflect: Data-driven Insight Distillation Empowers LLMs for Few-shot Tabular Classification
by: Yuan, Yifei, et al.
Published: (2025)