| Main Authors: | Wang, Peiqi, Shen, Yikang, Guo, Zhen, Stallone, Matthew, Kim, Yoon, Golland, Polina, Panda, Rameswar |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2402.02318 |
Similar Items
Gated Linear Attention Transformers with Hardware-Efficient Training
by: Yang, Songlin, et al.
Published: (2023)
API Pack: A Massive Multi-Programming Language Dataset for API Call Generation
by: Guo, Zhen, et al.
Published: (2024)
Calibrating Expressions of Certainty
by: Wang, Peiqi, et al.
Published: (2024)
Power Scheduler: A Batch Size and Token Number Agnostic Learning Rate Scheduler
by: Shen, Yikang, et al.
Published: (2024)
FlashFormer: Whole-Model Kernels for Efficient Low-Batch Inference
by: Nrusimha, Aniruddha, et al.
Published: (2025)
PaTH Attention: Position Encoding via Accumulating Householder Transformations
by: Yang, Songlin, et al.
Published: (2025)
Scaling Stick-Breaking Attention: An Efficient Implementation and In-depth Study
by: Tan, Shawn, et al.
Published: (2024)
Mitigating the Impact of Outlier Channels for Language Model Quantization with Activation Regularization
by: Nrusimha, Aniruddha, et al.
Published: (2024)
Dense Training, Sparse Inference: Rethinking Training of Mixture-of-Experts Language Models
by: Pan, Bowen, et al.
Published: (2024)
Scattered Mixture-of-Experts Implementation
by: Tan, Shawn, et al.
Published: (2024)
Parallelizing Linear Transformers with the Delta Rule over Sequence Length
by: Yang, Songlin, et al.
Published: (2024)
SHED: Shapley-Based Automated Dataset Refinement for Instruction Fine-Tuning
by: He, Yexiao, et al.
Published: (2024)
Reducing Transformer Key-Value Cache Size with Cross-Layer Attention
by: Brandon, William, et al.
Published: (2024)
Data Diversity Matters for Robust Instruction Tuning
by: Bukharin, Alexander, et al.
Published: (2023)
Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts
by: Kang, Junmo, et al.
Published: (2024)
LCSB: Layer-Cyclic Selective Backpropagation for Memory-Efficient On-Device LLM Fine-Tuning
by: Park, Juneyoung, et al.
Published: (2026)
Instruction Mining: Instruction Data Selection for Tuning Large Language Models
by: Cao, Yihan, et al.
Published: (2023)
Selective Reflection-Tuning: Student-Selected Data Recycling for LLM Instruction-Tuning
by: Li, Ming, et al.
Published: (2024)
Learning to Generate Instruction Tuning Datasets for Zero-Shot Task Adaptation
by: Nayak, Nihal V., et al.
Published: (2024)
Synthetic Data RL: Task Definition Is All You Need
by: Guo, Yiduo, et al.
Published: (2025)
SelectIT: Selective Instruction Tuning for LLMs via Uncertainty-Aware Self-Reflection
by: Liu, Liangxin, et al.
Published: (2024)
Improving Influence-based Instruction Tuning Data Selection for Balanced Learning of Diverse Capabilities
by: Dai, Qirun, et al.
Published: (2025)
LESS: Selecting Influential Data for Targeted Instruction Tuning
by: Xia, Mengzhou, et al.
Published: (2024)
ESD: Expected Squared Difference as a Tuning-Free Trainable Calibration Measure
by: Yoon, Hee Suk, et al.
Published: (2023)
MURI: High-Quality Instruction Tuning Datasets for Low-Resource Languages via Reverse Instructions
by: Köksal, Abdullatif, et al.
Published: (2024)
TOUCAN: Synthesizing 1.5M Tool-Agentic Data from Real-World MCP Environments
by: Xu, Zhangchen, et al.
Published: (2025)
Federated Data-Efficient Instruction Tuning for Large Language Models
by: Qin, Zhen, et al.
Published: (2024)
Dynamic Subset Tuning: Expanding the Operational Range of Parameter-Efficient Training for Large Language Models
by: Stahlberg, Felix, et al.
Published: (2024)
Improving Multilingual Instruction Finetuning via Linguistically Natural and Diverse Datasets
by: Indurthi, Sathish Reddy, et al.
Published: (2024)
Contrastive Instruction Tuning
by: Yan, Tianyi Lorena, et al.
Published: (2024)
Reviving Any-Subset Autoregressive Models with Principled Parallel Sampling and Speculative Decoding
by: Guo, Gabe, et al.
Published: (2025)
Uncertainty-Aware Gradient Signal-to-Noise Data Selection for Instruction Tuning
by: Yuan, Zhihang, et al.
Published: (2026)
CoMMIT: Coordinated Multimodal Instruction Tuning
by: Li, Xintong, et al.
Published: (2024)
Generative Representational Instruction Tuning
by: Muennighoff, Niklas, et al.
Published: (2024)
TAGCOS: Task-agnostic Gradient Clustered Coreset Selection for Instruction Tuning Data
by: Zhang, Jipeng, et al.
Published: (2024)
DiZiNER: Disagreement-guided Instruction Refinement via Pilot Annotation Simulation for Zero-shot Named Entity Recognition
by: Kim, Siun, et al.
Published: (2026)
Finding the Minimal Parameter Budget for Implicit Reasoning: A Data Complexity Driven Scaling Law for Language Models
by: Wang, Xinyi, et al.
Published: (2025)
Multi-Scale Heterogeneous Text-Attributed Graph Datasets From Diverse Domains
by: Liu, Yunhui, et al.
Published: (2024)
Less is More: Rethinking Few-Shot Learning and Recurrent Neural Nets
by: Pereg, Deborah, et al.
Published: (2022)
HiDe-LLaVA: Hierarchical Decoupling for Continual Instruction Tuning of Multimodal Large Language Model
by: Guo, Haiyang, et al.
Published: (2025)