Bibliographic Details
Main Authors:	Wang, Peiqi; Shen, Yikang; Guo, Zhen; Stallone, Matthew; Kim, Yoon; Golland, Polina; Panda, Rameswar
Format:	Preprint
Published:	2024
Subjects:	Machine Learning; Computation and Language
Online Access:	https://arxiv.org/abs/2402.02318

Similar Items

Gated Linear Attention Transformers with Hardware-Efficient Training
by: Yang, Songlin, et al.
Published: (2023)

API Pack: A Massive Multi-Programming Language Dataset for API Call Generation
by: Guo, Zhen, et al.
Published: (2024)

Calibrating Expressions of Certainty
by: Wang, Peiqi, et al.
Published: (2024)

Power Scheduler: A Batch Size and Token Number Agnostic Learning Rate Scheduler
by: Shen, Yikang, et al.
Published: (2024)

FlashFormer: Whole-Model Kernels for Efficient Low-Batch Inference
by: Nrusimha, Aniruddha, et al.
Published: (2025)

PaTH Attention: Position Encoding via Accumulating Householder Transformations
by: Yang, Songlin, et al.
Published: (2025)

Scaling Stick-Breaking Attention: An Efficient Implementation and In-depth Study
by: Tan, Shawn, et al.
Published: (2024)

Mitigating the Impact of Outlier Channels for Language Model Quantization with Activation Regularization
by: Nrusimha, Aniruddha, et al.
Published: (2024)

Dense Training, Sparse Inference: Rethinking Training of Mixture-of-Experts Language Models
by: Pan, Bowen, et al.
Published: (2024)

Scattered Mixture-of-Experts Implementation
by: Tan, Shawn, et al.
Published: (2024)

Parallelizing Linear Transformers with the Delta Rule over Sequence Length
by: Yang, Songlin, et al.
Published: (2024)

SHED: Shapley-Based Automated Dataset Refinement for Instruction Fine-Tuning
by: He, Yexiao, et al.
Published: (2024)

Reducing Transformer Key-Value Cache Size with Cross-Layer Attention
by: Brandon, William, et al.
Published: (2024)

Data Diversity Matters for Robust Instruction Tuning
by: Bukharin, Alexander, et al.
Published: (2023)

Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts
by: Kang, Junmo, et al.
Published: (2024)

LCSB: Layer-Cyclic Selective Backpropagation for Memory-Efficient On-Device LLM Fine-Tuning
by: Park, Juneyoung, et al.
Published: (2026)

Instruction Mining: Instruction Data Selection for Tuning Large Language Models
by: Cao, Yihan, et al.
Published: (2023)

Selective Reflection-Tuning: Student-Selected Data Recycling for LLM Instruction-Tuning
by: Li, Ming, et al.
Published: (2024)

Learning to Generate Instruction Tuning Datasets for Zero-Shot Task Adaptation
by: Nayak, Nihal V., et al.
Published: (2024)

Synthetic Data RL: Task Definition Is All You Need
by: Guo, Yiduo, et al.
Published: (2025)

SelectIT: Selective Instruction Tuning for LLMs via Uncertainty-Aware Self-Reflection
by: Liu, Liangxin, et al.
Published: (2024)

Improving Influence-based Instruction Tuning Data Selection for Balanced Learning of Diverse Capabilities
by: Dai, Qirun, et al.
Published: (2025)

LESS: Selecting Influential Data for Targeted Instruction Tuning
by: Xia, Mengzhou, et al.
Published: (2024)

ESD: Expected Squared Difference as a Tuning-Free Trainable Calibration Measure
by: Yoon, Hee Suk, et al.
Published: (2023)

MURI: High-Quality Instruction Tuning Datasets for Low-Resource Languages via Reverse Instructions
by: Köksal, Abdullatif, et al.
Published: (2024)

TOUCAN: Synthesizing 1.5M Tool-Agentic Data from Real-World MCP Environments
by: Xu, Zhangchen, et al.
Published: (2025)

Federated Data-Efficient Instruction Tuning for Large Language Models
by: Qin, Zhen, et al.
Published: (2024)

Dynamic Subset Tuning: Expanding the Operational Range of Parameter-Efficient Training for Large Language Models
by: Stahlberg, Felix, et al.
Published: (2024)

Improving Multilingual Instruction Finetuning via Linguistically Natural and Diverse Datasets
by: Indurthi, Sathish Reddy, et al.
Published: (2024)

Contrastive Instruction Tuning
by: Yan, Tianyi Lorena, et al.
Published: (2024)

Reviving Any-Subset Autoregressive Models with Principled Parallel Sampling and Speculative Decoding
by: Guo, Gabe, et al.
Published: (2025)

Uncertainty-Aware Gradient Signal-to-Noise Data Selection for Instruction Tuning
by: Yuan, Zhihang, et al.
Published: (2026)

CoMMIT: Coordinated Multimodal Instruction Tuning
by: Li, Xintong, et al.
Published: (2024)

Generative Representational Instruction Tuning
by: Muennighoff, Niklas, et al.
Published: (2024)

TAGCOS: Task-agnostic Gradient Clustered Coreset Selection for Instruction Tuning Data
by: Zhang, Jipeng, et al.
Published: (2024)

DiZiNER: Disagreement-guided Instruction Refinement via Pilot Annotation Simulation for Zero-shot Named Entity Recognition
by: Kim, Siun, et al.
Published: (2026)

Finding the Minimal Parameter Budget for Implicit Reasoning: A Data Complexity Driven Scaling Law for Language Models
by: Wang, Xinyi, et al.
Published: (2025)

Multi-Scale Heterogeneous Text-Attributed Graph Datasets From Diverse Domains
by: Liu, Yunhui, et al.
Published: (2024)

Less is More: Rethinking Few-Shot Learning and Recurrent Neural Nets
by: Pereg, Deborah, et al.
Published: (2022)

HiDe-LLaVA: Hierarchical Decoupling for Continual Instruction Tuning of Multimodal Large Language Model
by: Guo, Haiyang, et al.
Published: (2025)