:: Library Catalog

Bibliographic Details
Main Authors:	Li, Xiang Lisa; Kaiyom, Farzaan; Liu, Evan Zheran; Mai, Yifan; Liang, Percy; Hashimoto, Tatsunori
Format:	Preprint
Published:	2024
Subjects:	Computation and Language; Machine Learning
Online Access:	https://arxiv.org/abs/2407.08351

Similar Items

On the Learnability of Watermarks for Language Models
by: Gu, Chenchen, et al.
Published: (2023)

Auditing Prompt Caching in Language Model APIs
by: Gu, Chenchen, et al.
Published: (2025)

Robust Distortion-free Watermarks for Language Models
by: Kuditipudi, Rohith, et al.
Published: (2023)

Length-Controlled AlpacaEval: A Simple Way to Debias Automatic Evaluators
by: Dubois, Yann, et al.
Published: (2024)

Eliciting Language Model Behaviors with Investigator Agents
by: Li, Xiang Lisa, et al.
Published: (2025)

ArenaBencher: Automatic Benchmark Evolution via Multi-Model Competitive Evaluation
by: Liu, Qin, et al.
Published: (2025)

s1: Simple test-time scaling
by: Muennighoff, Niklas, et al.
Published: (2025)

Language Models with Conformal Factuality Guarantees
by: Mohri, Christopher, et al.
Published: (2024)

Understanding Finetuning for Factual Knowledge Extraction
by: Ghosal, Gaurav, et al.
Published: (2024)

Improving Pretraining Data Using Perplexity Correlations
by: Thrush, Tristan, et al.
Published: (2024)

AlpacaFarm: A Simulation Framework for Methods that Learn from Human Feedback
by: Dubois, Yann, et al.
Published: (2023)

Evaluating Self-Supervised Learning via Risk Decomposition
by: Dubois, Yann, et al.
Published: (2023)

Linguistic Calibration of Long-Form Generations
by: Band, Neil, et al.
Published: (2024)

Observational Scaling Laws and the Predictability of Language Model Performance
by: Ruan, Yangjun, et al.
Published: (2024)

Towards Execution-Grounded Automated AI Research
by: Si, Chenglei, et al.
Published: (2026)

Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers
by: Si, Chenglei, et al.
Published: (2024)

The Ideation-Execution Gap: Execution Outcomes of LLM-Generated versus Human Research Ideas
by: Si, Chenglei, et al.
Published: (2025)

Putting It All into Context: Simplifying Agents with LCLMs
by: Jiang, Mingjian, et al.
Published: (2025)

Synthetic continued pretraining
by: Yang, Zitong, et al.
Published: (2024)

Reasoning to Learn from Latent Thoughts
by: Ruan, Yangjun, et al.
Published: (2025)

Pre-training under infinite compute
by: Kim, Konwoo, et al.
Published: (2025)

Agentic Adversarial QA for Improving Domain-Specific LLMs
by: Grari, Vincent, et al.
Published: (2026)

Graph-based Uncertainty Metrics for Long-form Language Model Outputs
by: Jiang, Mingjian, et al.
Published: (2024)

Out-of-Domain Robustness via Targeted Augmentations
by: Gao, Irena, et al.
Published: (2023)

Replaying pre-training data improves fine-tuning
by: Kotha, Suhas, et al.
Published: (2026)

Self-Verified Distillation: Your Language Model Is Secretly Its Own Synthetic Data Pipeline
by: Lee, Tony, et al.
Published: (2026)

AutoRAGTuner: A Declarative Framework for Automatic Optimization of RAG Pipelines
by: Zeng, Xintan, et al.
Published: (2026)

Bencher: Simple and Reproducible Benchmarking for Black-Box Optimization
by: Papenmeier, Leonard, et al.
Published: (2025)

Independence Tests for Language Models
by: Zhu, Sally, et al.
Published: (2025)

CTC-DRO: Robust Optimization for Reducing Language Disparities in Speech Recognition
by: Bartelds, Martijn, et al.
Published: (2025)

Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training
by: Liu, Hong, et al.
Published: (2023)

On the Entropy Calibration of Language Models
by: Cao, Steven, et al.
Published: (2025)

CEQuest: Benchmarking Large Language Models for Construction Estimation
by: Wu, Yanzhao, et al.
Published: (2025)

Synthetic Data for any Differentiable Target
by: Thrush, Tristan, et al.
Published: (2026)

Understanding Warmup-Stable-Decay Learning Rates: A River Valley Loss Landscape Perspective
by: Wen, Kaiyue, et al.
Published: (2024)

Temporal Entailment Pretraining for Clinical Language Models over EHR Data
by: Tanaka, Tatsunori, et al.
Published: (2025)

Towards Auto-Regressive Next-Token Prediction: In-Context Learning Emerges from Generalization
by: Gong, Zixuan, et al.
Published: (2025)

Data-efficient pre-training by scaling synthetic megadocs
by: Kim, Konwoo, et al.
Published: (2026)

FSPO: Few-Shot Optimization of Synthetic Preferences Personalizes to Real Users
by: Singh, Anikait, et al.
Published: (2025)

Knowledge Graph Construction in Power Distribution Networks
by: Li, Xiang, et al.
Published: (2023)