:: Library Catalog

Bibliographic Details
Main Authors:	Elazar, Yanai; Paranjape, Bhargavi; Peng, Hao; Wiegreffe, Sarah; Raghavi, Khyathi; Srikumar, Vivek; Singh, Sameer; Smith, Noah A.
Format:	Preprint
Published:	2023
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2311.09605

Similar Items

On Linear Representations and Pretraining Data Frequency in Language Models
by: Merullo, Jack, et al.
Published: (2025)

Evaluating $n$-Gram Novelty of Language Models Using Rusty-DAWG
by: Merrill, William, et al.
Published: (2024)

Rewriting History: A Recipe for Interventional Analyses to Study Data Effects on Model Behavior
by: Nadkarni, Rahul, et al.
Published: (2025)

Estimating the Causal Effect of Early ArXiving on Paper Acceptance
by: Elazar, Yanai, et al.
Published: (2023)

LLM-Generated or Human-Written? Comparing Review and Non-Review Papers on ArXiv
by: Elazar, Yanai, et al.
Published: (2026)

Detection and Measurement of Syntactic Templates in Generated Text
by: Shaib, Chantal, et al.
Published: (2024)

Test-Time Scaling with Repeated Sampling Improves Multilingual Text Generation
by: Gupta, Ashim, et al.
Published: (2025)

Reinforcing Code Generation: Improving Text-to-SQL with Execution-Based Learning
by: Kulkarni, Atharv, et al.
Published: (2025)

Deal, or no deal (or who knows)? Forecasting Uncertainty in Conversations using Large Language Models
by: Sicilia, Anthony, et al.
Published: (2024)

Applying Intrinsic Debiasing on Downstream Tasks: Challenges and Considerations for Machine Translation
by: Iluz, Bar, et al.
Published: (2024)

What's In My Big Data?
by: Elazar, Yanai, et al.
Published: (2023)

Husky: A Unified, Open-Source Language Agent for Multi-Step Reasoning
by: Kim, Joongwon, et al.
Published: (2024)

Mechanistic?
by: Saphra, Naomi, et al.
Published: (2024)

InfiniteScienceGym: An Unbounded, Procedurally-Generated Benchmark for Scientific Analysis
by: Bentham, Oliver, et al.
Published: (2026)

Found in Translation: Measuring Multilingual LLM Consistency as Simple as Translate then Evaluate
by: Gupta, Ashim, et al.
Published: (2025)

Hybrid Preferences: Learning to Route Instances for Human vs. AI Feedback
by: Miranda, Lester James V., et al.
Published: (2024)

Selective "Selective Prediction": Reducing Unnecessary Abstention in Vision-Language Reasoning
by: Srinivasan, Tejas, et al.
Published: (2024)

Optimizing Pretraining Data Mixtures with LLM-Estimated Utility
by: Held, William, et al.
Published: (2025)

Better Aligned with Survey Respondents or Training Data? Unveiling Political Leanings of LLMs on U.S. Supreme Court Cases
by: Xu, Shanshan, et al.
Published: (2025)

CounterBench: Evaluating and Improving Counterfactual Reasoning in Large Language Models
by: Chen, Yuefei, et al.
Published: (2025)

Continual Dialogue State Tracking via Example-Guided Question Answering
by: Cho, Hyundong, et al.
Published: (2023)

Understanding the Logic of Direct Preference Alignment through Logic
by: Richardson, Kyle, et al.
Published: (2024)

Promptly Predicting Structures: The Return of Inference
by: Mehta, Maitrey, et al.
Published: (2024)

The Art of Saying No: Contextual Noncompliance in Language Models
by: Brahman, Faeze, et al.
Published: (2024)

What Drives Representation Steering? A Mechanistic Case Study on Steering Refusal
by: Cheng, Stephen, et al.
Published: (2026)

LLM-Symbolic Integration for Robust Temporal Tabular Reasoning
by: Kulkarni, Atharv, et al.
Published: (2025)

Can you map it to English? The Role of Cross-Lingual Alignment in Multilingual Performance of LLMs
by: Ravisankar, Kartik, et al.
Published: (2025)

Certainly Uncertain: A Benchmark and Metric for Multimodal Epistemic and Aleatoric Awareness
by: Chandu, Khyathi Raghavi, et al.
Published: (2024)

In-Context Example Ordering Guided by Label Distributions
by: Xu, Zhichao, et al.
Published: (2024)

Distillation versus Contrastive Learning: How to Train Your Rerankers
by: Xu, Zhichao, et al.
Published: (2025)

State Space Models are Strong Text Rerankers
by: Xu, Zhichao, et al.
Published: (2024)

The Unreasonable Effectiveness of Easy Training Data for Hard Tasks
by: Hase, Peter, et al.
Published: (2024)

Generalization v.s. Memorization: Tracing Language Models' Capabilities Back to Pretraining Data
by: Wang, Xinyi, et al.
Published: (2024)

An Empirical Investigation of Matrix Factorization Methods for Pre-trained Transformers
by: Gupta, Ashim, et al.
Published: (2024)

Are you going to finish that? A Practical Study of the Partial Token Problem
by: Xu, Hao, et al.
Published: (2026)

Everything is Plausible: Investigating the Impact of LLM Rationales on Human Notions of Plausibility
by: Palta, Shramay, et al.
Published: (2025)

Whispers of Doubt Amidst Echoes of Triumph in NLP Robustness
by: Gupta, Ashim, et al.
Published: (2023)

Beyond Perplexity: Multi-dimensional Safety Evaluation of LLM Compression
by: Xu, Zhichao, et al.
Published: (2024)

Defragmenting Language Models: An Interpretability-based Approach for Vocabulary Expansion
by: Mehta, Maitrey, et al.
Published: (2026)

Answer, Assemble, Ace: Understanding How LMs Answer Multiple Choice Questions
by: Wiegreffe, Sarah, et al.
Published: (2024)