| Main Author: | Kale, Sahil |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2511.18931 |
Similar Items
TeXpert: A Multi-Level Benchmark for Evaluating LaTeX Code Generation by LLMs
by: Kale, Sahil, et al.
Published: (2025)
Lie to Me: Knowledge Graphs for Robust Hallucination Self-Detection in LLMs
by: Kale, Sahil, et al.
Published: (2025)
Line of Duty: Evaluating LLM Self-Knowledge via Consistency in Feasibility Boundaries
by: Kale, Sahil, et al.
Published: (2025)
KnowRL: Teaching Language Models to Know What They Know
by: Kale, Sahil, et al.
Published: (2025)
PlanGenLLMs: A Modern Survey of LLM Planning Capabilities
by: Wei, Hui, et al.
Published: (2025)
Mirage of Mastery: Memorization Tricks LLMs into Artificially Inflated Self-Knowledge
by: Kale, Sahil
Published: (2025)
CHAMP: A Competition-level Dataset for Fine-Grained Analyses of LLMs' Mathematical Reasoning Capabilities
by: Mao, Yujun, et al.
Published: (2024)
Does This Look Familiar to You? Knowledge Analysis via Model Internal Representations
by: Park, Sihyun
Published: (2025)
WebWalker: Benchmarking LLMs in Web Traversal
by: Wu, Jialong, et al.
Published: (2025)
Hallucination Detection with the Internal Layers of LLMs
by: Preiß, Martin
Published: (2025)
A Closer Look into LLMs for Table Understanding
by: Wang, Jia, et al.
Published: (2026)
R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning
by: Song, Huatong, et al.
Published: (2025)
Are Your LLMs Capable of Stable Reasoning?
by: Liu, Junnan, et al.
Published: (2024)
Probing the Lack of Stable Internal Beliefs in LLMs
by: Luo, Yifan, et al.
Published: (2026)
Investigating Implicit Bias in Large Language Models: A Large-Scale Study of Over 50 LLMs
by: Kumar, Divyanshu, et al.
Published: (2024)
Assessing the Capability of LLMs in Solving POSCOMP Questions
by: Viegas, Cayo, et al.
Published: (2025)
Explore the Reasoning Capability of LLMs in the Chess Testbed
by: Wang, Shu, et al.
Published: (2024)
DeepInnovator: Triggering the Innovative Capabilities of LLMs
by: Fan, Tianyu, et al.
Published: (2026)
Poser: Unmasking Alignment Faking LLMs by Manipulating Their Internals
by: Clymer, Joshua, et al.
Published: (2024)
Language Models Are Capable of Metacognitive Monitoring and Control of Their Internal Activations
by: Ji-An, Li, et al.
Published: (2025)
Look Within, Why LLMs Hallucinate: A Causal Perspective
by: Li, He, et al.
Published: (2024)
RAG-R1: Incentivizing the Search and Reasoning Capabilities of LLMs through Multi-query Parallelism
by: Tan, Zhiwen, et al.
Published: (2025)
Are LLMs Effective Negotiators? Systematic Evaluation of the Multifaceted Capabilities of LLMs in Negotiation Dialogues
by: Kwon, Deuksin, et al.
Published: (2024)
Analysing Moral Bias in Finetuned LLMs through Mechanistic Interpretability
by: Raimondi, Bianca, et al.
Published: (2025)
Benchmarking the Detection of LLMs-Generated Modern Chinese Poetry
by: Wang, Shanshan, et al.
Published: (2025)
The Diminishing Returns of Early-Exit Decoding in Modern LLMs
by: Wei, Rui, et al.
Published: (2026)
An Extensive Evaluation of PDDL Capabilities in off-the-shelf LLMs
by: Vyas, Kaustubh, et al.
Published: (2025)
Assessing the Performance of Human-Capable LLMs -- Are LLMs Coming for Your Job?
by: Mavi, John, et al.
Published: (2024)
AXCEL: Automated eXplainable Consistency Evaluation using LLMs
by: Sreekar, P Aditya, et al.
Published: (2024)
The World According to LLMs: How Geographic Origin Influences LLMs' Entity Deduction Capabilities
by: Lalai, Harsh Nishant, et al.
Published: (2025)
Knowing When to Abstain: Medical LLMs Under Clinical Uncertainty
by: Machcha, Sravanthi, et al.
Published: (2026)
When LLMs Team Up: The Emergence of Collaborative Affective Computing
by: Lai, Wenna, et al.
Published: (2025)
HiBench: Benchmarking LLMs Capability on Hierarchical Structure Reasoning
by: Jiang, Zhuohang, et al.
Published: (2025)
Beyond Words: A Latent Memory Approach to Internal Reasoning in LLMs
by: Orlicki, José I.
Published: (2025)
Reinforced Internal-External Knowledge Synergistic Reasoning for Efficient Adaptive Search Agent
by: Huang, Ziyang, et al.
Published: (2025)
Evaluating Cultural Awareness of LLMs for Yoruba, Malayalam, and English
by: Dawson, Fiifi, et al.
Published: (2024)
Evaluating the Capabilities of LLMs for Supporting Anticipatory Impact Assessment
by: Allaham, Mowafak, et al.
Published: (2024)
VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?
by: Liu, Junpeng, et al.
Published: (2024)
MM-Eval: A Hierarchical Benchmark for Modern Mongolian Evaluation in LLMs
by: Zhang, Mengyuan, et al.
Published: (2024)
FaithBench: A Diverse Hallucination Benchmark for Summarization by Modern LLMs
by: Bao, Forrest Sheng, et al.
Published: (2024)