| Main Authors: | Qin, Tian, Saphra, Naomi, Alvarez-Melis, David |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2412.04619 |
Similar Items
Can Interpretation Predict Behavior on Unseen Data?
by: Li, Victoria R., et al.
Published: (2025)
Mechanistic?
by: Saphra, Naomi, et al.
Published: (2024)
Random Scaling of Emergent Capabilities
by: Zhao, Rosie, et al.
Published: (2025)
Fast Forwarding Low-Rank Training
by: Rahamim, Adir, et al.
Published: (2024)
TRAM: Bridging Trust Regions and Sharpness Aware Minimization
by: Sherborne, Tom, et al.
Published: (2023)
Continuous Language Model Interpolation for Dynamic and Controllable Text Generation
by: Kangaslahti, Sara, et al.
Published: (2024)
Do Activation Verbalization Methods Convey Privileged Information?
by: Li, Millicent, et al.
Published: (2025)
Attribute Diversity Determines the Systematicity Gap in VQA
by: Berlot-Attwell, Ian, et al.
Published: (2023)
PolyPythias: Stability and Outliers across Fifty Language Model Pre-Training Runs
by: van der Wal, Oskar, et al.
Published: (2025)
Using Shapley interactions to understand how models use structure
by: Singhvi, Divyansh, et al.
Published: (2024)
Why Does Self-Distillation (Sometimes) Degrade the Reasoning Capability of LLMs?
by: Kim, Jeonghye, et al.
Published: (2026)
Tag-LLM: Repurposing General-Purpose LLMs for Specialized Domains
by: Shen, Junhong, et al.
Published: (2024)
Decomposing Elements of Problem Solving: What "Math" Does RL Teach?
by: Qin, Tian, et al.
Published: (2025)
A Label is Worth a Thousand Images in Dataset Distillation
by: Qin, Tian, et al.
Published: (2024)
CharED: Character-wise Ensemble Decoding for Large Language Models
by: Gu, Kevin, et al.
Published: (2024)
Adapting Language Models via Token Translation
by: Feng, Zhili, et al.
Published: (2024)
Adaptation Odyssey in LLMs: Why Does Additional Pretraining Sometimes Fail to Improve?
by: Öncel, Fırat, et al.
Published: (2024)
Why Prompt Optimization Works, and Why It Sometimes Doesn't: A Causal-Inspired Edit-Level Analysis
by: Gong, Shuzhi, et al.
Published: (2026)
ESLM: Risk-Averse Selective Language Modeling for Efficient Pretraining
by: Bal, Melis Ilayda, et al.
Published: (2025)
TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding
by: Sun, Hanshi, et al.
Published: (2024)
Synthetic Data Generation for Intersectional Fairness by Leveraging Hierarchical Group Structure
by: Maheshwari, Gaurav, et al.
Published: (2024)
Hidden Breakthroughs in Language Model Training
by: Kangaslahti, Sara, et al.
Published: (2025)
Learning Syntax Without Planting Trees: Understanding Hierarchical Generalization in Transformers
by: Ahuja, Kabir, et al.
Published: (2024)
Hierarchical Latent Structures in Data Generation Process Unify Mechanistic Phenomena across Scale
by: Rohweder, Jonas, et al.
Published: (2026)
Let's (not) just put things in Context: Test-Time Training for Long-Context LLMs
by: Bansal, Rachit, et al.
Published: (2025)
Distributional Dataset Distillation with Subtask Decomposition
by: Qin, Tian, et al.
Published: (2024)
When Refusals Fail: Unstable Safety Mechanisms in Long-Context LLM Agents
by: Hadeliya, Tsimur, et al.
Published: (2025)
Twilight: Adaptive Attention Sparsity with Hierarchical Top-$p$ Pruning
by: Lin, Chaofan, et al.
Published: (2025)
HMT: Hierarchical Memory Transformer for Efficient Long Context Language Processing
by: He, Zifan, et al.
Published: (2024)
Why Larger Models Learn More: Effects of Capacity, Interference, and Rare-Task Retention
by: Huang, Jing, et al.
Published: (2026)
Hierarchical Embedding Fusion for Retrieval-Augmented Code Generation
by: Sorokin, Nikita, et al.
Published: (2026)
ChatGPT Doesn't Trust Chargers Fans: Guardrail Sensitivity in Context
by: Li, Victoria R., et al.
Published: (2024)
Towards Trustworthy Multimodal Moderation via Policy-Aligned Reasoning and Hierarchical Labeling
by: Li, Anqi, et al.
Published: (2025)
First Tragedy, then Parse: History Repeats Itself in the New Era of Large Language Models
by: Saphra, Naomi, et al.
Published: (2023)
EvalTree: Profiling Language Model Weaknesses via Hierarchical Capability Trees
by: Zeng, Zhiyuan, et al.
Published: (2025)
HiGen: Hierarchy-Aware Sequence Generation for Hierarchical Text Classification
by: Jain, Vidit, et al.
Published: (2024)
Unraveling the Mystery of Scaling Laws: Part I
by: Su, Hui, et al.
Published: (2024)
Data Augmentations for Improved (Large) Language Model Generalization
by: Feder, Amir, et al.
Published: (2023)
Dissecting Linear Recurrent Models: How Different Gating Strategies Drive Selectivity and Generalization
by: Bouhadjar, Younes, et al.
Published: (2026)
Instruction Diversity Drives Generalization To Unseen Tasks
by: Zhang, Dylan, et al.
Published: (2024)