| Main Author: | Gessler, Luke |
|---|---|
| Format: | Preprint |
| Published: | 2023 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2305.12612 |
Similar Items
OntoURL: A Benchmark for Evaluating Large Language Models on Symbolic Ontological Understanding, Reasoning and Learning
by: Zhang, Xiao, et al.
Published: (2025)
OntoTune: Ontology-Driven Self-training for Aligning Large Language Models
by: Liu, Zhiqiang, et al.
Published: (2025)
OntoType: Ontology-Guided and Pre-Trained Language Model Assisted Fine-Grained Entity Typing
by: Komarlu, Tanay, et al.
Published: (2023)
From Babbling to Fluency: Evaluating the Evolution of Language Models in Terms of Human Language Acquisition
by: Yang, Qiyuan, et al.
Published: (2024)
iPrOp: Interactive Prompt Optimization for Large Language Models with a Human in the Loop
by: Li, Jiahui, et al.
Published: (2024)
Evaluating Copyright Takedown Methods for Language Models
by: Wei, Boyi, et al.
Published: (2024)
TAMS: Translation-Assisted Morphological Segmentation
by: Rice, Enora, et al.
Published: (2024)
Re-evaluating Open-ended Evaluation of Large Language Models
by: Liu, Siqi, et al.
Published: (2025)
eRST: A Signaled Graph Theory of Discourse Relations and Organization
by: Zeldes, Amir, et al.
Published: (2024)
From Priest to Doctor: Domain Adaptation for Low-Resource Neural Machine Translation
by: Marashian, Ali, et al.
Published: (2024)
MUSE: Machine Unlearning Six-Way Evaluation for Language Models
by: Shi, Weijia, et al.
Published: (2024)
PrE-Text: Training Language Models on Private Federated Data in the Age of LLMs
by: Hou, Charlie, et al.
Published: (2024)
Micro Language Models Enable Instant Responses
by: Cheng, Wen, et al.
Published: (2026)
Demystifying Prompts in Language Models via Perplexity Estimation
by: Gonen, Hila, et al.
Published: (2022)
Annotation Errors and NER: A Study with OntoNotes 5.0
by: Bernier-Colborne, Gabriel, et al.
Published: (2024)
Evaluating Large Language Models for Radiology Natural Language Processing
by: Liu, Zhengliang, et al.
Published: (2023)
Performance Evaluation of Tokenizers in Large Language Models for the Assamese Language
by: Tamang, Sagar, et al.
Published: (2024)
Disentangling Language and Culture for Evaluating Multilingual Large Language Models
by: Ying, Jiahao, et al.
Published: (2025)
Pragmatic Competence Evaluation of Large Language Models for the Korean Language
by: Park, Dojun, et al.
Published: (2024)
Language Shapes Mental Health Evaluations in Large Language Models
by: Xu, Jiayi, et al.
Published: (2026)
Evaluation of Geographical Distortions in Language Models
by: Decoupes, Rémy, et al.
Published: (2024)
Evaluating Human-Language Model Interaction
by: Lee, Mina, et al.
Published: (2022)
Evaluating Large Language Models with Psychometrics
by: Li, Yuan, et al.
Published: (2024)
Evaluating Language Model Character Traits
by: Ward, Francis Rhys, et al.
Published: (2024)
Evaluating Language Models' Evaluations of Games
by: Collins, Katherine M., et al.
Published: (2025)
MAGPIE: A benchmark for Multi-AGent contextual PrIvacy Evaluation
by: Juneja, Gurusha, et al.
Published: (2025)
MAGPIE: A dataset for Multi-AGent contextual PrIvacy Evaluation
by: Juneja, Gurusha, et al.
Published: (2025)
Evaluating Neural Language Models as Cognitive Models of Language Acquisition
by: Martínez, Héctor Javier Vázquez, et al.
Published: (2023)
ConspirED: A Dataset for Cognitive Traits of Conspiracy Theories and Large Language Model Safety
by: Bates, Luke, et al.
Published: (2025)
Evaluating Metalinguistic Knowledge in Large Language Models across the World's Languages
by: Arčon, Tjaša, et al.
Published: (2026)
RESPONSE: Benchmarking the Ability of Language Models to Undertake Commonsense Reasoning in Crisis Situation
by: Diallo, Aissatou, et al.
Published: (2025)
Evaluating Language-Model Agents on Realistic Autonomous Tasks
by: Kinniment, Megan, et al.
Published: (2023)
Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models
by: Kim, Seungone, et al.
Published: (2024)
TruthPrInt: Mitigating Large Vision-Language Models Object Hallucination Via Latent Truthful-Guided Pre-Intervention
by: Duan, Jinhao, et al.
Published: (2025)
Rule-Guided Feedback: Enhancing Reasoning by Enforcing Rule Adherence in Large Language Models
by: Diallo, Aissatou, et al.
Published: (2025)
Hold Onto That Thought: Assessing KV Cache Compression On Reasoning
by: Liu, Minghui, et al.
Published: (2025)
Comprehensive Evaluation of Large Language Models for Topic Modeling
by: Doi, Tomoki, et al.
Published: (2024)
AC-EVAL: Evaluating Ancient Chinese Language Understanding in Large Language Models
by: Wei, Yuting, et al.
Published: (2024)
A Systematic Evaluation of Large Language Models for Natural Language Generation Tasks
by: Ni, Xuanfan, et al.
Published: (2024)
Evaluating and Adapting Large Language Models to Represent Folktales in Low-Resource Languages
by: Meaney, JA, et al.
Published: (2024)