| Main Authors: | Liu, Emmy, Bertsch, Amanda, Sutawika, Lintang, Tjuatja, Lindia, Fernandes, Patrick, Marinov, Lara, Chen, Michael, Singhal, Shreya, Lawrence, Carolin, Raghunathan, Aditi, Gashteovski, Kiril, Neubig, Graham |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2503.03862 |
Similar Items
BehaviorBox: Automated Discovery of Fine-Grained Performance Differences Between Language Models
by: Tjuatja, Lindia, et al.
Published: (2025)
What Goes Into a LM Acceptability Judgment? Rethinking the Impact of Frequency and Length
by: Tjuatja, Lindia, et al.
Published: (2024)
Gained in Translation: Privileged Pairwise Judges Enhance Multilingual Reasoning
by: Sutawika, Lintang, et al.
Published: (2026)
What do Language Models Learn and When? The Implicit Curriculum Hypothesis
by: Liu, Emmy, et al.
Published: (2026)
Do LLMs exhibit human-like response biases? A case study in survey design
by: Tjuatja, Lindia, et al.
Published: (2023)
CMULAB: An Open-Source Framework for Training and Deployment of Natural Language Processing Models
by: Sheikh, Zaid, et al.
Published: (2024)
Better Instruction-Following Through Minimum Bayes Risk
by: Wu, Ian, et al.
Published: (2024)
GlossLM: A Massively Multilingual Corpus and Pretrained Model for Interlinear Glossed Text
by: Ginn, Michael, et al.
Published: (2024)
Compositional Steering of Large Language Models with Steering Tokens
by: Radevski, Gorjan, et al.
Published: (2026)
Evaluating Language Models as Synthetic Data Generators
by: Kim, Seungone, et al.
Published: (2024)
Massively Multilingual Joint Segmentation and Glossing
by: Ginn, Michael, et al.
Published: (2026)
Scaling Evaluation-time Compute with Reasoning Models as Evaluators
by: Kim, Seungone, et al.
Published: (2025)
MEDDxAgent: A Unified Modular Agent Framework for Explainable Automatic Differential Diagnosis
by: Rose, Daniel, et al.
Published: (2025)
AgentQuest: A Modular Benchmark Framework to Measure Progress and Improve LLM Agents
by: Gioacchini, Luca, et al.
Published: (2024)
An Incomplete Loop: Instruction Inference, Instruction Following, and In-context Learning in Language Models
by: Liu, Emmy, et al.
Published: (2024)
Midtraining Bridges Pretraining and Posttraining Distributions
by: Liu, Emmy, et al.
Published: (2025)
Wav2Gloss: Generating Interlinear Glossed Text from Speech
by: He, Taiqi, et al.
Published: (2024)
Pangea: A Fully Open Multilingual Multimodal LLM for 39 Languages
by: Yue, Xiang, et al.
Published: (2024)
Repetition Improves Language Model Embeddings
by: Springer, Jacob Mitchell, et al.
Published: (2024)
Oolong: Evaluating Long Context Reasoning and Aggregation Capabilities
by: Bertsch, Amanda, et al.
Published: (2025)
Efficient Many-Shot In-Context Learning with Dynamic Block-Sparse Attention
by: Xiao, Emily, et al.
Published: (2025)
Multitask Learning Can Improve Worst-Group Outcomes
by: Kulkarni, Atharva, et al.
Published: (2023)
Leveraging Open Information Extraction for More Robust Domain Transfer of Event Trigger Detection
by: Dukić, David, et al.
Published: (2023)
Synthetic Socratic Debates: Examining Persona Effects on Moral Decision and Persuasion Dynamics
by: Liu, Jiarui, et al.
Published: (2025)
CodeScout: An Effective Recipe for Reinforcement Learning of Code Search Agents
by: Sutawika, Lintang, et al.
Published: (2026)
Prompt-MII: Meta-Learning Instruction Induction for LLMs
by: Xiao, Emily, et al.
Published: (2025)
TextMineX: Data, Evaluation Framework and Ontology-guided LLM Pipeline for Humanitarian Mine Action
by: Zhou, Chenyue, et al.
Published: (2025)
LightPAL: Lightweight Passage Retrieval for Open Domain Multi-Document Summarization
by: Enomoto, Masafumi, et al.
Published: (2024)
Robust Text Classification: Analyzing Prototype-Based Networks
by: Sourati, Zhivar, et al.
Published: (2023)
In-Context Learning with Long-Context Models: An In-Depth Exploration
by: Bertsch, Amanda, et al.
Published: (2024)
Divergences between Language Models and Human Brains
by: Zhou, Yuchen, et al.
Published: (2023)
From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models
by: Welleck, Sean, et al.
Published: (2024)
Overtrained Language Models Are Harder to Fine-Tune
by: Springer, Jacob Mitchell, et al.
Published: (2025)
Watch the Weights: Unsupervised monitoring and control of fine-tuned LLMs
by: Zhong, Ziqian, et al.
Published: (2025)
Language Modeling with Editable External Knowledge
by: Li, Belinda Z., et al.
Published: (2024)
Penetrating School Strata through Career Education. Program Evaluation.
by: Lindia, Albert, et al.
Published: (1976)
Better Synthetic Data by Retrieving and Transforming Existing Datasets
by: Gandhi, Saumya, et al.
Published: (2024)
Self-Trained Verification for Training- and Test-Time Self-Improvement
by: Wu, Chen Henry, et al.
Published: (2026)
CO₂ Electroreduction to CO Using Cu‐Supported NiO Catalyst: XPS Evidence of Redox Interaction Between Metal and Support
by: Akanksha Sharma, et al.
Published: (2025)
Scaling Laws for Data Filtering -- Data Curation cannot be Compute Agnostic
by: Goyal, Sachin, et al.
Published: (2024)