Library Catalog

Bibliographic Details
Main Author:	Miller, Evan
Format:	Preprint
Published:	2024
Subjects:	Applications; Computation and Language
Online Access:	https://arxiv.org/abs/2411.00640

Similar Items

Statistical Multicriteria Evaluation of LLM-Generated Text
by: Arias, Esteban Garces, et al.
Published: (2025)

Bias in Language Models: Beyond Trick Tests and Toward RUTEd Evaluation
by: Lum, Kristian, et al.
Published: (2024)

Systematic Evaluation of Uncertainty Estimation Methods in Large Language Models
by: Hobelsberger, Christian, et al.
Published: (2025)

"All that Glitters": Approaches to Evaluations with Unreliable Model and Human Annotations
by: Hardy, Michael
Published: (2024)

Bayesian Evaluation of Large Language Model Behavior
by: Longjohn, Rachel, et al.
Published: (2025)

Does a Large Language Model Really Speak in Human-Like Language?
by: Park, Mose, et al.
Published: (2025)

A Novel Metric for Measuring the Robustness of Large Language Models in Non-adversarial Scenarios
by: Ackerman, Samuel, et al.
Published: (2024)

Auditing the Use of Language Models to Guide Hiring Decisions
by: Gaebler, Johann D., et al.
Published: (2024)

The Multi-Range Theory of Translation Quality Measurement: MQM scoring models and Statistical Quality Control
by: Lommel, Arle, et al.
Published: (2024)

Large Language Models for Full-Text Methods Assessment: A Case Study on Mediation Analysis
by: Zhang, Wenqing, et al.
Published: (2025)

Advanced Crash Causation Analysis for Freeway Safety: A Large Language Model Approach to Identifying Key Contributing Factors
by: Abdelrahman, Ahmed S., et al.
Published: (2025)

LAVA: Language Model Assisted Verbal Autopsy for Cause-of-Death Determination
by: Chen, Yiqun T., et al.
Published: (2025)

Enhancing Systematic Reviews with Large Language Models: Using GPT-4 and Kimi
by: Kaptur, Dandan Chen, et al.
Published: (2025)

Judging It, Washing It: Scoring and Greenwashing Corporate Climate Disclosures using Large Language Models
by: Chuang, Marianne, et al.
Published: (2025)

Personalized Prediction of Perceived Message Effectiveness Using Large Language Model Based Digital Twins
by: Han, Jasmin, et al.
Published: (2026)

Classification errors distort findings in automated speech processing: examples and solutions from child-development research
by: Gautheron, Lucas, et al.
Published: (2025)

Gender Inequality in English Textbooks Around the World: an NLP Approach
by: Liu, Tairan
Published: (2025)

A Latent Dirichlet Allocation (LDA) Semantic Text Analytics Approach to Explore Topical Features in Charity Crowdfunding Campaigns
by: Muzumdar, Prathamesh, et al.
Published: (2024)

How to Choose a Threshold for an Evaluation Metric for Large Language Models
by: Sarmah, Bhaskarjit, et al.
Published: (2024)

Statistical multi-metric evaluation and visualization of LLM system predictive performance
by: Ackerman, Samuel, et al.
Published: (2025)

Repeated Sequences Reveal Gaps between Large Language Models and Natural Language
by: Tanaka-Ishii, Kumiko
Published: (2026)

Statistics of punctuation in experimental literature -- the remarkable case of "Finnegans Wake" by James Joyce
by: Stanisz, Tomasz, et al.
Published: (2024)

Sampling the Swadesh List to Identify Similar Languages with Tree Spaces
by: Ordway, Garett, et al.
Published: (2024)

Documents Are People and Words Are Items: A Psychometric Approach to Textual Data with Contextual Embeddings
by: Chen, Jinsong
Published: (2025)

Language Markers of Emotion Flexibility Predict Depression and Anxiety Treatment Outcomes
by: Brindle, Benjamin, et al.
Published: (2026)

Language Hierarchization Provides the Optimal Solution to Human Working Memory Limits
by: Chen, Luyao, et al.
Published: (2026)

Metacognitive Myopia in Large Language Models
by: Scholten, Florian, et al.
Published: (2024)

A Bayesian Approach to Harnessing the Power of LLMs in Authorship Attribution
by: Hu, Zhengmian, et al.
Published: (2024)

TransitGPT: A Generative AI-based framework for interacting with GTFS data using Large Language Models
by: Devunuri, Saipraneeth, et al.
Published: (2024)

Dynamic Topic Language Model on Heterogeneous Children's Mental Health Clinical Notes
by: Ye, Hanwen, et al.
Published: (2023)

DeepScore: A Comprehensive Approach to Measuring Quality in AI-Generated Clinical Documentation
by: Oleson, Jon
Published: (2024)

Reliable and Efficient Amortized Model-based Evaluation
by: Truong, Sang, et al.
Published: (2025)

A Design-based Solution for Causal Inference with Text: Can a Language Model Be Too Large?
by: Tierney, Graham, et al.
Published: (2025)

How to Correctly Report LLM-as-a-Judge Evaluations
by: Lee, Chungpa, et al.
Published: (2025)

Limits of Large Language Models in Debating Humans
by: Flamino, James, et al.
Published: (2024)

Emotion Detection with Transformers: A Comparative Study
by: Rezapour, Mahdi
Published: (2024)

From Traditional Taggers to LLMs: A Comparative Study of POS Tagging for Medieval Romance Languages
by: Schöffel, Matthias, et al.
Published: (2026)

Still no evidence for an effect of the proportion of non-native speakers on language complexity -- A response to Kauhanen, Einhaus & Walkden (2023)
by: Koplenig, Alexander
Published: (2023)

Improving Probabilistic Models in Text Classification via Active Learning
by: Bosley, Mitchell, et al.
Published: (2022)

Domain-Shift-Aware Conformal Prediction for Large Language Models
by: Lin, Zhexiao, et al.
Published: (2025)