| Main Authors: | Choudhury, Manan Roy, Chandramouli, Adithya, Anand, Mannan, Gupta, Vivek |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2511.00340 |
Similar Items
FinRule-Bench: A Benchmark for Joint Reasoning over Financial Tables and Principles
by: Malarkkan, Arun Vignesh, et al.
Published: (2026)
CLAUSE: Agentic Neuro-Symbolic Knowledge Graph Reasoning via Dynamic Learnable Context Engineering
by: Zhao, Yang, et al.
Published: (2025)
DRAGON: A Benchmark for Evidence-Grounded Visual Reasoning over Diagrams
by: Iyengar, Anirudh Iyengar Kaniyar Narayana, et al.
Published: (2026)
Exploiting LLMs' Reasoning Capability to Infer Implicit Concepts in Legal Information Retrieval
by: Nguyen, Hai-Long, et al.
Published: (2024)
CIRCUIT: A Benchmark for Circuit Interpretation and Reasoning Capabilities of LLMs
by: Skelic, Lejla, et al.
Published: (2025)
Oolong: Evaluating Long Context Reasoning and Aggregation Capabilities
by: Bertsch, Amanda, et al.
Published: (2025)
Benchmarking Spatiotemporal Reasoning in LLMs and Reasoning Models: Capabilities and Challenges
by: Quan, Pengrui, et al.
Published: (2025)
Do Moral Judgment and Reasoning Capability of LLMs Change with Language? A Study using the Multilingual Defining Issues Test
by: Khandelwal, Aditi, et al.
Published: (2024)
HiBench: Benchmarking LLMs Capability on Hierarchical Structure Reasoning
by: Jiang, Zhuohang, et al.
Published: (2025)
Small Language Models for Agentic Systems: A Survey of Architectures, Capabilities, and Deployment Trade offs
by: Sharma, Raghav, et al.
Published: (2025)
Indian Legal NLP Benchmarks : A Survey
by: Kalamkar, Prathamesh, et al.
Published: (2021)
Auditing Sabotage Bench: A Benchmark for Detecting and Fixing Research Sabotage in ML Codebases
by: Gan, Eric, et al.
Published: (2026)
Benchmarking the Legal Reasoning of LLMs in Arabic Islamic Inheritance Cases
by: AlDahoul, Nouar, et al.
Published: (2025)
Auditing the Ethical Logic of Generative AI Models
by: Neuman, W. Russell, et al.
Published: (2025)
Better Accuracies, Worse Reasoning: A Step-Level Audit of Medical Chain-of-Thought Distillation
by: Jiang, Zhaoyang, et al.
Published: (2026)
Are LLMs Capable of Data-based Statistical and Causal Reasoning? Benchmarking Advanced Quantitative Reasoning with Data
by: Liu, Xiao, et al.
Published: (2024)
CyberSOCEval: Benchmarking LLMs Capabilities for Malware Analysis and Threat Intelligence Reasoning
by: Deason, Lauren, et al.
Published: (2025)
Diagnosing LLM Judge Reliability: Conformal Prediction Sets and Transitivity Violations
by: Gupta, Manan, et al.
Published: (2026)
Latent Phase-Shift Rollback: Inference-Time Error Correction via Residual Stream Monitoring and KV-Cache Steering
by: Gupta, Manan, et al.
Published: (2026)
CHBench: A Cognitive Hierarchy Benchmark for Evaluating Strategic Reasoning Capability of LLMs
by: Liu, Hongtao, et al.
Published: (2025)
Exploring the psychology of LLMs' Moral and Legal Reasoning
by: Almeida, Guilherme F. C. F., et al.
Published: (2023)
Ko-MuSR: A Multistep Soft Reasoning Benchmark for LLMs Capable of Understanding Korean
by: Park, Chanwoo, et al.
Published: (2025)
Structured Definitions and Segmentations for Legal Reasoning in LLMs: A Study on Indian Legal Data
by: Khatri, Mann, et al.
Published: (2025)
DIAGRAMS: A Review Framework for Reasoning-Level Attribution in Diagram QA
by: Iyengar, Anirudh Iyengar Kaniyar Narayana, et al.
Published: (2026)
NTSEBENCH: Cognitive Reasoning Benchmark for Vision Language Models
by: Pandya, Pranshu, et al.
Published: (2024)
Interpretable Emergent Language Using Inter-Agent Transformers
by: Bhardwaj, Mannan
Published: (2025)
CoRe: Benchmarking LLMs Code Reasoning Capabilities through Static Analysis Tasks
by: Xie, Danning, et al.
Published: (2025)
Are Your LLMs Capable of Stable Reasoning?
by: Liu, Junnan, et al.
Published: (2024)
REDDIX-NET: A Novel Dataset and Benchmark for Moderating Online Explicit Services
by: Sathvik, MSVPJ, et al.
Published: (2025)
RESOLVE: Relational Reasoning with Symbolic and Object-Level Features Using Vector Symbolic Processing
by: Mejri, Mohamed, et al.
Published: (2024)
Retrieval-Infused Reasoning Sandbox: A Benchmark for Decoupling Retrieval and Reasoning Capabilities
by: Ying, Shuangshuang, et al.
Published: (2026)
NESTFUL: A Benchmark for Evaluating LLMs on Nested Sequences of API Calls
by: Basu, Kinjal, et al.
Published: (2024)
Unequal Voices: How LLMs Construct Constrained Queer Narratives
by: Ghosal, Atreya, et al.
Published: (2025)
Auditing of AI: Legal, Ethical and Technical Approaches
by: Mokander, Jakob
Published: (2024)
Dynamic Reasoning Chains through Depth-Specialized Mixture-of-Experts in Transformer Architectures
by: Roy, Sampurna, et al.
Published: (2025)
From Search to Reasoning: A Five-Level RAG Capability Framework for Enterprise Data
by: Gill, Gurbinder, et al.
Published: (2025)
Explore the Reasoning Capability of LLMs in the Chess Testbed
by: Wang, Shu, et al.
Published: (2024)
ALARB: An Arabic Legal Argument Reasoning Benchmark
by: Shairah, Harethah Abu, et al.
Published: (2025)
EarthSpatialBench: Benchmarking Spatial Reasoning Capabilities of Multimodal LLMs on Earth Imagery
by: Xu, Zelin, et al.
Published: (2026)
Understanding the Geospatial Reasoning Capabilities of LLMs: A Trajectory Recovery Perspective
by: Truong, Thinh Hung, et al.
Published: (2025)