:: Library Catalog

Bibliographic Details
Main Authors:	Srivastava, Saurabh; B, Annarose M; P V, Anto; Menon, Shashank; Sukumar, Ajay; T, Adwaith Samod; Philipose, Alan; Prince, Stevin; Thomas, Sooraj
Format:	Preprint
Published:	2024
Subjects:	Artificial Intelligence; Computation and Language
Online Access:	https://arxiv.org/abs/2402.19450

Similar Items

Neurosymbolic Language Reasoning as Satisfiability Modulo Theory
by: Oh, Hyunseok, et al.
Published: (2026)

DISCERN: Decoding Systematic Errors in Natural Language for Text Classifiers
by: Menon, Rakesh R., et al.
Published: (2024)

The Point of No Return: Counterfactual Localization of Deceptive Commitment in Language-Model Reasoning
by: Merrill, Scott, et al.
Published: (2026)

An Objective Performance Evaluation of the LSTM Networks in Time Series Classification
by: Sunil, Sooraj, et al.
Published: (2026)

Revisiting Prompt Optimization with Large Reasoning Models-A Case Study on Event Extraction
by: Srivastava, Saurabh, et al.
Published: (2025)

A Causal Lens for Evaluating Faithfulness Metrics
by: Zaman, Kerem, et al.
Published: (2025)

INTERACT: Enabling Interactive, Question-Driven Learning in Large Language Models
by: Kendapadi, Aum, et al.
Published: (2024)

Benchmarking Spatiotemporal Reasoning in LLMs and Reasoning Models: Capabilities and Challenges
by: Quan, Pengrui, et al.
Published: (2025)

Evaluating Chain-of-Thought Reasoning through Reusability and Verifiability
by: Aggarwal, Shashank, et al.
Published: (2026)

Continuous Optimization for Decoding Errors
by: Srivastava, Shashank
Published: (2024)

Improved List Size for Folded Reed-Solomon Codes
by: Srivastava, Shashank
Published: (2024)

Voice Evaluation of Reasoning Ability: Diagnosing the Modality-Induced Performance Gap
by: Lin, Yueqian, et al.
Published: (2025)

Real-Time Performance Benchmarking of TinyML Models in Embedded Systems (PICO: Performance of Inference, CPU, and Operations)
by: Dey, Abhishek, et al.
Published: (2025)

Bridging the Arithmetic Gap: The Cognitive Complexity Benchmark and Financial-PoT for Robust Financial Reasoning
by: Zhao, Boxiang, et al.
Published: (2026)

JudgeBoard: Benchmarking and Enhancing Small Language Models for Reasoning Evaluation
by: Bi, Zhenyu, et al.
Published: (2025)

Bullous Lung Disease in Turner Syndrome: An Underrecognized Comorbidity?
by: Stevin Lu, et al.
Published: (2024)

Ai-Powered Sales Demand Forecasting and Decision Support system
by: M, Nithin, et al.
Published: (2026)

MetaCluster: Enabling Deep Compression of Kolmogorov-Arnold Network
by: Raffel, Matthew, et al.
Published: (2025)

Enhancing the Diagnostic Evaluation of Thyroid Functionality Using Diffuse Reflectance Spectroscopy and Regression Models
by: W. Anto Win Shalini, et al.
Published: (2025)

Integrating Performance Tools in Model Reasoning for GPU Kernel Optimization
by: Nichols, Daniel, et al.
Published: (2025)

Pharmacognostical and Preliminary phytochemical evaluation of Seed kernel of Chinchasthi (Tamarindus indica Linn)
by: MS Megha, et al.
Published: (2026)

The intersection of philosophy of language and artificial intelligence: Challenges in replicating human language understanding
by: Sooraj Kumar Maurya
Published: (2024)

Cost Trade-offs of Reasoning and Non-Reasoning Large Language Models in Text-to-SQL
by: Deochake, Saurabh, et al.
Published: (2025)

An Enigma of Artificial Reason: Investigating the Production-Evaluation Gap in Large Reasoning Models
by: Sun, Mingzhong, et al.
Published: (2026)

Robustness and Reasoning Fidelity of Large Language Models in Long-Context Code Question Answering
by: Maharaj, Kishan, et al.
Published: (2026)

Enhancing Domain-Specific Retrieval-Augmented Generation: Synthetic Data Generation and Evaluation using Reasoning Models
by: Jadon, Aryan, et al.
Published: (2025)

Therapeutic Potential Of M@B 40 (M = Mg and Ca) Fullerene as a Drug Delivery System for Gemcitabine Anti‐Lung Cancer Drug: A DFT Approach
by: Abisha Nancy Sukumar, et al.
Published: (2025)

Mind the Gap: Benchmarking Spatial Reasoning in Vision-Language Models
by: Stogiannidis, Ilias, et al.
Published: (2025)

SocialGaze: Improving the Integration of Human Social Norms in Large Language Models
by: Vijjini, Anvesh Rao, et al.
Published: (2024)

LLMs Cannot Reliably Identify and Reason About Security Vulnerabilities (Yet?): A Comprehensive Evaluation, Framework, and Benchmarks
by: Ullah, Saad, et al.
Published: (2023)

A Robust Placeability Metric for Model-Free Unified Pick-and-Place Reasoning
by: Wingender, Benno, et al.
Published: (2025)

ECG-Reasoning-Benchmark: A Benchmark for Evaluating Clinical Reasoning Capabilities in ECG Interpretation
by: Oh, Jungwoo, et al.
Published: (2026)

RUPBench: Benchmarking Reasoning Under Perturbations for Robustness Evaluation in Large Language Models
by: Wang, Yuqing, et al.
Published: (2024)

Autonomous Evaluation of LLMs for Truth Maintenance and Reasoning Tasks
by: Karia, Rushang, et al.
Published: (2024)

Benchmarking Reasoning Robustness in Large Language Models
by: Yu, Tong, et al.
Published: (2025)

Mind the Gap: Evaluating the Representativeness of Quantitative Medical Language Reasoning LLM Benchmarks for African Disease Burdens
by: Mutisya, Fred, et al.
Published: (2025)

Is Chain-of-Thought Really Not Explainability? Chain-of-Thought Can Be Faithful without Hint Verbalization
by: Zaman, Kerem, et al.
Published: (2025)

List Decoding Expander-Based Codes up to Capacity in Near-Linear Time
by: Srivastava, Shashank, et al.
Published: (2025)

Point of Order: Action-Aware LLM Persona Modeling for Realistic Civic Simulation
by: Merrill, Scott, et al.
Published: (2025)

MOAT: MobileNet‐Optimized Attention Transfer for Robust and Scalable Dermatology Image Classification
by: Pradeep Radhakrishnan, et al.
Published: (2025)