Library Catalog

Bibliographic Details
Main Author:	Kale, Sahil
Format:	Preprint
Published:	2025
Subjects:	Computation and Language; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2511.18931

Similar Items

TeXpert: A Multi-Level Benchmark for Evaluating LaTeX Code Generation by LLMs
by: Kale, Sahil, et al.
Published: (2025)

Lie to Me: Knowledge Graphs for Robust Hallucination Self-Detection in LLMs
by: Kale, Sahil, et al.
Published: (2025)

Line of Duty: Evaluating LLM Self-Knowledge via Consistency in Feasibility Boundaries
by: Kale, Sahil, et al.
Published: (2025)

KnowRL: Teaching Language Models to Know What They Know
by: Kale, Sahil, et al.
Published: (2025)

PlanGenLLMs: A Modern Survey of LLM Planning Capabilities
by: Wei, Hui, et al.
Published: (2025)

Mirage of Mastery: Memorization Tricks LLMs into Artificially Inflated Self-Knowledge
by: Kale, Sahil
Published: (2025)

CHAMP: A Competition-level Dataset for Fine-Grained Analyses of LLMs' Mathematical Reasoning Capabilities
by: Mao, Yujun, et al.
Published: (2024)

Does This Look Familiar to You? Knowledge Analysis via Model Internal Representations
by: Park, Sihyun
Published: (2025)

WebWalker: Benchmarking LLMs in Web Traversal
by: Wu, Jialong, et al.
Published: (2025)

Hallucination Detection with the Internal Layers of LLMs
by: Preiß, Martin
Published: (2025)

A Closer Look into LLMs for Table Understanding
by: Wang, Jia, et al.
Published: (2026)

R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning
by: Song, Huatong, et al.
Published: (2025)

Are Your LLMs Capable of Stable Reasoning?
by: Liu, Junnan, et al.
Published: (2024)

Probing the Lack of Stable Internal Beliefs in LLMs
by: Luo, Yifan, et al.
Published: (2026)

Investigating Implicit Bias in Large Language Models: A Large-Scale Study of Over 50 LLMs
by: Kumar, Divyanshu, et al.
Published: (2024)

Assessing the Capability of LLMs in Solving POSCOMP Questions
by: Viegas, Cayo, et al.
Published: (2025)

Explore the Reasoning Capability of LLMs in the Chess Testbed
by: Wang, Shu, et al.
Published: (2024)

DeepInnovator: Triggering the Innovative Capabilities of LLMs
by: Fan, Tianyu, et al.
Published: (2026)

Poser: Unmasking Alignment Faking LLMs by Manipulating Their Internals
by: Clymer, Joshua, et al.
Published: (2024)

Language Models Are Capable of Metacognitive Monitoring and Control of Their Internal Activations
by: Ji-An, Li, et al.
Published: (2025)

Look Within, Why LLMs Hallucinate: A Causal Perspective
by: Li, He, et al.
Published: (2024)

RAG-R1: Incentivizing the Search and Reasoning Capabilities of LLMs through Multi-query Parallelism
by: Tan, Zhiwen, et al.
Published: (2025)

Are LLMs Effective Negotiators? Systematic Evaluation of the Multifaceted Capabilities of LLMs in Negotiation Dialogues
by: Kwon, Deuksin, et al.
Published: (2024)

Analysing Moral Bias in Finetuned LLMs through Mechanistic Interpretability
by: Raimondi, Bianca, et al.
Published: (2025)

Benchmarking the Detection of LLMs-Generated Modern Chinese Poetry
by: Wang, Shanshan, et al.
Published: (2025)

The Diminishing Returns of Early-Exit Decoding in Modern LLMs
by: Wei, Rui, et al.
Published: (2026)

An Extensive Evaluation of PDDL Capabilities in off-the-shelf LLMs
by: Vyas, Kaustubh, et al.
Published: (2025)

Assessing the Performance of Human-Capable LLMs -- Are LLMs Coming for Your Job?
by: Mavi, John, et al.
Published: (2024)

AXCEL: Automated eXplainable Consistency Evaluation using LLMs
by: Sreekar, P Aditya, et al.
Published: (2024)

The World According to LLMs: How Geographic Origin Influences LLMs' Entity Deduction Capabilities
by: Lalai, Harsh Nishant, et al.
Published: (2025)

Knowing When to Abstain: Medical LLMs Under Clinical Uncertainty
by: Machcha, Sravanthi, et al.
Published: (2026)

When LLMs Team Up: The Emergence of Collaborative Affective Computing
by: Lai, Wenna, et al.
Published: (2025)

HiBench: Benchmarking LLMs Capability on Hierarchical Structure Reasoning
by: Jiang, Zhuohang, et al.
Published: (2025)

Beyond Words: A Latent Memory Approach to Internal Reasoning in LLMs
by: Orlicki, José I.
Published: (2025)

Reinforced Internal-External Knowledge Synergistic Reasoning for Efficient Adaptive Search Agent
by: Huang, Ziyang, et al.
Published: (2025)

Evaluating Cultural Awareness of LLMs for Yoruba, Malayalam, and English
by: Dawson, Fiifi, et al.
Published: (2024)

Evaluating the Capabilities of LLMs for Supporting Anticipatory Impact Assessment
by: Allaham, Mowafak, et al.
Published: (2024)

VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?
by: Liu, Junpeng, et al.
Published: (2024)

MM-Eval: A Hierarchical Benchmark for Modern Mongolian Evaluation in LLMs
by: Zhang, Mengyuan, et al.
Published: (2024)

FaithBench: A Diverse Hallucination Benchmark for Summarization by Modern LLMs
by: Bao, Forrest Sheng, et al.
Published: (2024)