:: Library Catalog

Bibliographic Details
Main Authors:	Liu, Emmy; Bertsch, Amanda; Sutawika, Lintang; Tjuatja, Lindia; Fernandes, Patrick; Marinov, Lara; Chen, Michael; Singhal, Shreya; Lawrence, Carolin; Raghunathan, Aditi; Gashteovski, Kiril; Neubig, Graham
Format:	Preprint
Published:	2025
Subjects:	Computation and Language; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2503.03862

Similar Items

BehaviorBox: Automated Discovery of Fine-Grained Performance Differences Between Language Models
by: Tjuatja, Lindia, et al.
Published: (2025)

What Goes Into a LM Acceptability Judgment? Rethinking the Impact of Frequency and Length
by: Tjuatja, Lindia, et al.
Published: (2024)

Gained in Translation: Privileged Pairwise Judges Enhance Multilingual Reasoning
by: Sutawika, Lintang, et al.
Published: (2026)

What do Language Models Learn and When? The Implicit Curriculum Hypothesis
by: Liu, Emmy, et al.
Published: (2026)

Do LLMs exhibit human-like response biases? A case study in survey design
by: Tjuatja, Lindia, et al.
Published: (2023)

CMULAB: An Open-Source Framework for Training and Deployment of Natural Language Processing Models
by: Sheikh, Zaid, et al.
Published: (2024)

Better Instruction-Following Through Minimum Bayes Risk
by: Wu, Ian, et al.
Published: (2024)

GlossLM: A Massively Multilingual Corpus and Pretrained Model for Interlinear Glossed Text
by: Ginn, Michael, et al.
Published: (2024)

Compositional Steering of Large Language Models with Steering Tokens
by: Radevski, Gorjan, et al.
Published: (2026)

Evaluating Language Models as Synthetic Data Generators
by: Kim, Seungone, et al.
Published: (2024)

Massively Multilingual Joint Segmentation and Glossing
by: Ginn, Michael, et al.
Published: (2026)

Scaling Evaluation-time Compute with Reasoning Models as Evaluators
by: Kim, Seungone, et al.
Published: (2025)

MEDDxAgent: A Unified Modular Agent Framework for Explainable Automatic Differential Diagnosis
by: Rose, Daniel, et al.
Published: (2025)

AgentQuest: A Modular Benchmark Framework to Measure Progress and Improve LLM Agents
by: Gioacchini, Luca, et al.
Published: (2024)

An Incomplete Loop: Instruction Inference, Instruction Following, and In-context Learning in Language Models
by: Liu, Emmy, et al.
Published: (2024)

Midtraining Bridges Pretraining and Posttraining Distributions
by: Liu, Emmy, et al.
Published: (2025)

Wav2Gloss: Generating Interlinear Glossed Text from Speech
by: He, Taiqi, et al.
Published: (2024)

Pangea: A Fully Open Multilingual Multimodal LLM for 39 Languages
by: Yue, Xiang, et al.
Published: (2024)

Repetition Improves Language Model Embeddings
by: Springer, Jacob Mitchell, et al.
Published: (2024)

Oolong: Evaluating Long Context Reasoning and Aggregation Capabilities
by: Bertsch, Amanda, et al.
Published: (2025)

Efficient Many-Shot In-Context Learning with Dynamic Block-Sparse Attention
by: Xiao, Emily, et al.
Published: (2025)

Multitask Learning Can Improve Worst-Group Outcomes
by: Kulkarni, Atharva, et al.
Published: (2023)

Leveraging Open Information Extraction for More Robust Domain Transfer of Event Trigger Detection
by: Dukić, David, et al.
Published: (2023)

Synthetic Socratic Debates: Examining Persona Effects on Moral Decision and Persuasion Dynamics
by: Liu, Jiarui, et al.
Published: (2025)

CodeScout: An Effective Recipe for Reinforcement Learning of Code Search Agents
by: Sutawika, Lintang, et al.
Published: (2026)

Prompt-MII: Meta-Learning Instruction Induction for LLMs
by: Xiao, Emily, et al.
Published: (2025)

TextMineX: Data, Evaluation Framework and Ontology-guided LLM Pipeline for Humanitarian Mine Action
by: Zhou, Chenyue, et al.
Published: (2025)

LightPAL: Lightweight Passage Retrieval for Open Domain Multi-Document Summarization
by: Enomoto, Masafumi, et al.
Published: (2024)

Robust Text Classification: Analyzing Prototype-Based Networks
by: Sourati, Zhivar, et al.
Published: (2023)

In-Context Learning with Long-Context Models: An In-Depth Exploration
by: Bertsch, Amanda, et al.
Published: (2024)

Divergences between Language Models and Human Brains
by: Zhou, Yuchen, et al.
Published: (2023)

From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models
by: Welleck, Sean, et al.
Published: (2024)

Overtrained Language Models Are Harder to Fine-Tune
by: Springer, Jacob Mitchell, et al.
Published: (2025)

Watch the Weights: Unsupervised monitoring and control of fine-tuned LLMs
by: Zhong, Ziqian, et al.
Published: (2025)

Language Modeling with Editable External Knowledge
by: Li, Belinda Z., et al.
Published: (2024)

Penetrating School Strata through Career Education. Program Evaluation.
by: Lindia, Albert, et al.
Published: (1976)

Better Synthetic Data by Retrieving and Transforming Existing Datasets
by: Gandhi, Saumya, et al.
Published: (2024)

Self-Trained Verification for Training- and Test-Time Self-Improvement
by: Wu, Chen Henry, et al.
Published: (2026)

CO2 Electroreduction to CO Using Cu‐Supported NiO Catalyst: XPS Evidence of Redox Interaction Between Metal and Support
by: Akanksha Sharma, et al.
Published: (2025)

Scaling Laws for Data Filtering -- Data Curation cannot be Compute Agnostic
by: Goyal, Sachin, et al.
Published: (2024)