| Main Author: | Miller, Evan |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2411.00640 |
Similar Items
Statistical Multicriteria Evaluation of LLM-Generated Text
by: Arias, Esteban Garces, et al.
Published: (2025)
Bias in Language Models: Beyond Trick Tests and Toward RUTEd Evaluation
by: Lum, Kristian, et al.
Published: (2024)
Systematic Evaluation of Uncertainty Estimation Methods in Large Language Models
by: Hobelsberger, Christian, et al.
Published: (2025)
"All that Glitters": Approaches to Evaluations with Unreliable Model and Human Annotations
by: Hardy, Michael
Published: (2024)
Bayesian Evaluation of Large Language Model Behavior
by: Longjohn, Rachel, et al.
Published: (2025)
Does a Large Language Model Really Speak in Human-Like Language?
by: Park, Mose, et al.
Published: (2025)
A Novel Metric for Measuring the Robustness of Large Language Models in Non-adversarial Scenarios
by: Ackerman, Samuel, et al.
Published: (2024)
Auditing the Use of Language Models to Guide Hiring Decisions
by: Gaebler, Johann D., et al.
Published: (2024)
The Multi-Range Theory of Translation Quality Measurement: MQM scoring models and Statistical Quality Control
by: Lommel, Arle, et al.
Published: (2024)
Large Language Models for Full-Text Methods Assessment: A Case Study on Mediation Analysis
by: Zhang, Wenqing, et al.
Published: (2025)
Advanced Crash Causation Analysis for Freeway Safety: A Large Language Model Approach to Identifying Key Contributing Factors
by: Abdelrahman, Ahmed S., et al.
Published: (2025)
LAVA: Language Model Assisted Verbal Autopsy for Cause-of-Death Determination
by: Chen, Yiqun T., et al.
Published: (2025)
Enhancing Systematic Reviews with Large Language Models: Using GPT-4 and Kimi
by: Kaptur, Dandan Chen, et al.
Published: (2025)
Judging It, Washing It: Scoring and Greenwashing Corporate Climate Disclosures using Large Language Models
by: Chuang, Marianne, et al.
Published: (2025)
Personalized Prediction of Perceived Message Effectiveness Using Large Language Model Based Digital Twins
by: Han, Jasmin, et al.
Published: (2026)
Classification errors distort findings in automated speech processing: examples and solutions from child-development research
by: Gautheron, Lucas, et al.
Published: (2025)
Gender Inequality in English Textbooks Around the World: an NLP Approach
by: Liu, Tairan
Published: (2025)
A Latent Dirichlet Allocation (LDA) Semantic Text Analytics Approach to Explore Topical Features in Charity Crowdfunding Campaigns
by: Muzumdar, Prathamesh, et al.
Published: (2024)
How to Choose a Threshold for an Evaluation Metric for Large Language Models
by: Sarmah, Bhaskarjit, et al.
Published: (2024)
Statistical multi-metric evaluation and visualization of LLM system predictive performance
by: Ackerman, Samuel, et al.
Published: (2025)
Repeated Sequences Reveal Gaps between Large Language Models and Natural Language
by: Tanaka-Ishii, Kumiko
Published: (2026)
Statistics of punctuation in experimental literature -- the remarkable case of "Finnegans Wake" by James Joyce
by: Stanisz, Tomasz, et al.
Published: (2024)
Sampling the Swadesh List to Identify Similar Languages with Tree Spaces
by: Ordway, Garett, et al.
Published: (2024)
Documents Are People and Words Are Items: A Psychometric Approach to Textual Data with Contextual Embeddings
by: Chen, Jinsong
Published: (2025)
Language Markers of Emotion Flexibility Predict Depression and Anxiety Treatment Outcomes
by: Brindle, Benjamin, et al.
Published: (2026)
Language Hierarchization Provides the Optimal Solution to Human Working Memory Limits
by: Chen, Luyao, et al.
Published: (2026)
Metacognitive Myopia in Large Language Models
by: Scholten, Florian, et al.
Published: (2024)
A Bayesian Approach to Harnessing the Power of LLMs in Authorship Attribution
by: Hu, Zhengmian, et al.
Published: (2024)
TransitGPT: A Generative AI-based framework for interacting with GTFS data using Large Language Models
by: Devunuri, Saipraneeth, et al.
Published: (2024)
Dynamic Topic Language Model on Heterogeneous Children's Mental Health Clinical Notes
by: Ye, Hanwen, et al.
Published: (2023)
DeepScore: A Comprehensive Approach to Measuring Quality in AI-Generated Clinical Documentation
by: Oleson, Jon
Published: (2024)
Reliable and Efficient Amortized Model-based Evaluation
by: Truong, Sang, et al.
Published: (2025)
A Design-based Solution for Causal Inference with Text: Can a Language Model Be Too Large?
by: Tierney, Graham, et al.
Published: (2025)
How to Correctly Report LLM-as-a-Judge Evaluations
by: Lee, Chungpa, et al.
Published: (2025)
Limits of Large Language Models in Debating Humans
by: Flamino, James, et al.
Published: (2024)
Emotion Detection with Transformers: A Comparative Study
by: Rezapour, Mahdi
Published: (2024)
From Traditional Taggers to LLMs: A Comparative Study of POS Tagging for Medieval Romance Languages
by: Schöffel, Matthias, et al.
Published: (2026)
Still no evidence for an effect of the proportion of non-native speakers on language complexity -- A response to Kauhanen, Einhaus & Walkden (2023)
by: Koplenig, Alexander
Published: (2023)
Improving Probabilistic Models in Text Classification via Active Learning
by: Bosley, Mitchell, et al.
Published: (2022)
Domain-Shift-Aware Conformal Prediction for Large Language Models
by: Lin, Zhexiao, et al.
Published: (2025)