:: Library Catalog

Bibliographic Details
Main Authors:	Qin, Tian; Saphra, Naomi; Alvarez-Melis, David
Format:	Preprint
Published:	2024
Subjects:	Machine Learning; Computation and Language
Online Access:	https://arxiv.org/abs/2412.04619

Similar Items

Can Interpretation Predict Behavior on Unseen Data?
by: Li, Victoria R., et al.
Published: (2025)

Mechanistic?
by: Saphra, Naomi, et al.
Published: (2024)

Random Scaling of Emergent Capabilities
by: Zhao, Rosie, et al.
Published: (2025)

Fast Forwarding Low-Rank Training
by: Rahamim, Adir, et al.
Published: (2024)

TRAM: Bridging Trust Regions and Sharpness Aware Minimization
by: Sherborne, Tom, et al.
Published: (2023)

Continuous Language Model Interpolation for Dynamic and Controllable Text Generation
by: Kangaslahti, Sara, et al.
Published: (2024)

Do Activation Verbalization Methods Convey Privileged Information?
by: Li, Millicent, et al.
Published: (2025)

Attribute Diversity Determines the Systematicity Gap in VQA
by: Berlot-Attwell, Ian, et al.
Published: (2023)

PolyPythias: Stability and Outliers across Fifty Language Model Pre-Training Runs
by: van der Wal, Oskar, et al.
Published: (2025)

Using Shapley interactions to understand how models use structure
by: Singhvi, Divyansh, et al.
Published: (2024)

Why Does Self-Distillation (Sometimes) Degrade the Reasoning Capability of LLMs?
by: Kim, Jeonghye, et al.
Published: (2026)

Tag-LLM: Repurposing General-Purpose LLMs for Specialized Domains
by: Shen, Junhong, et al.
Published: (2024)

Decomposing Elements of Problem Solving: What "Math" Does RL Teach?
by: Qin, Tian, et al.
Published: (2025)

A Label is Worth a Thousand Images in Dataset Distillation
by: Qin, Tian, et al.
Published: (2024)

CharED: Character-wise Ensemble Decoding for Large Language Models
by: Gu, Kevin, et al.
Published: (2024)

Adapting Language Models via Token Translation
by: Feng, Zhili, et al.
Published: (2024)

Adaptation Odyssey in LLMs: Why Does Additional Pretraining Sometimes Fail to Improve?
by: Öncel, Fırat, et al.
Published: (2024)

Why Prompt Optimization Works, and Why It Sometimes Doesn't: A Causal-Inspired Edit-Level Analysis
by: Gong, Shuzhi, et al.
Published: (2026)

ESLM: Risk-Averse Selective Language Modeling for Efficient Pretraining
by: Bal, Melis Ilayda, et al.
Published: (2025)

TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding
by: Sun, Hanshi, et al.
Published: (2024)

Synthetic Data Generation for Intersectional Fairness by Leveraging Hierarchical Group Structure
by: Maheshwari, Gaurav, et al.
Published: (2024)

Hidden Breakthroughs in Language Model Training
by: Kangaslahti, Sara, et al.
Published: (2025)

Learning Syntax Without Planting Trees: Understanding Hierarchical Generalization in Transformers
by: Ahuja, Kabir, et al.
Published: (2024)

Hierarchical Latent Structures in Data Generation Process Unify Mechanistic Phenomena across Scale
by: Rohweder, Jonas, et al.
Published: (2026)

Let's (not) just put things in Context: Test-Time Training for Long-Context LLMs
by: Bansal, Rachit, et al.
Published: (2025)

Distributional Dataset Distillation with Subtask Decomposition
by: Qin, Tian, et al.
Published: (2024)

When Refusals Fail: Unstable Safety Mechanisms in Long-Context LLM Agents
by: Hadeliya, Tsimur, et al.
Published: (2025)

Twilight: Adaptive Attention Sparsity with Hierarchical Top-$p$ Pruning
by: Lin, Chaofan, et al.
Published: (2025)

HMT: Hierarchical Memory Transformer for Efficient Long Context Language Processing
by: He, Zifan, et al.
Published: (2024)

Why Larger Models Learn More: Effects of Capacity, Interference, and Rare-Task Retention
by: Huang, Jing, et al.
Published: (2026)

Hierarchical Embedding Fusion for Retrieval-Augmented Code Generation
by: Sorokin, Nikita, et al.
Published: (2026)

ChatGPT Doesn't Trust Chargers Fans: Guardrail Sensitivity in Context
by: Li, Victoria R., et al.
Published: (2024)

Towards Trustworthy Multimodal Moderation via Policy-Aligned Reasoning and Hierarchical Labeling
by: Li, Anqi, et al.
Published: (2025)

First Tragedy, then Parse: History Repeats Itself in the New Era of Large Language Models
by: Saphra, Naomi, et al.
Published: (2023)

EvalTree: Profiling Language Model Weaknesses via Hierarchical Capability Trees
by: Zeng, Zhiyuan, et al.
Published: (2025)

HiGen: Hierarchy-Aware Sequence Generation for Hierarchical Text Classification
by: Jain, Vidit, et al.
Published: (2024)

Unraveling the Mystery of Scaling Laws: Part I
by: Su, Hui, et al.
Published: (2024)

Data Augmentations for Improved (Large) Language Model Generalization
by: Feder, Amir, et al.
Published: (2023)

Dissecting Linear Recurrent Models: How Different Gating Strategies Drive Selectivity and Generalization
by: Bouhadjar, Younes, et al.
Published: (2026)

Instruction Diversity Drives Generalization To Unseen Tasks
by: Zhang, Dylan, et al.
Published: (2024)