:: Library Catalog

Bibliographic Details
Main Authors:	Mohanty, Dikshya, Hasan, Mohammad Saqib, Monsur, Syed Mostofa, Zheng, Size, Hsiao, Benjamin, Balasubramanian, Niranjan
Format:	Preprint
Published:	2026
Subjects:	Computation and Language, Artificial Intelligence
Online Access:	https://arxiv.org/abs/2601.16312

Similar Items

Syntax Is Easy, Semantics Is Hard: Evaluating LLMs for LTL Translation
by: Danso, Priscilla Kyei, et al.
Published: (2026)

Addressing the Ecological Fallacy in Larger LMs with Human Context
by: Soni, Nikita, et al.
Published: (2026)

ProST: Progressive Sub-task Training for Pareto-Optimal Multi-agent Systems Using Small Language Models
by: Bijoy, Biddut Sarker, et al.
Published: (2025)

Evaluation of LLMs-based Hidden States as Author Representations for Psychological Human-Centered NLP Tasks
by: Soni, Nikita, et al.
Published: (2025)

Teaching an Old LLM Secure Coding: Localized Preference Optimization on Distilled Preferences
by: Hasan, Mohammad Saqib, et al.
Published: (2025)

MuSciClaims: Multimodal Scientific Claim Verification
by: Lal, Yash Kumar, et al.
Published: (2025)

Continual Learning with Global Alignment
by: Bai, Xueying, et al.
Published: (2022)

Autonomous Evaluation of LLMs for Truth Maintenance and Reasoning Tasks
by: Karia, Rushang, et al.
Published: (2024)

Can LLMs Reason About Program Semantics? A Comprehensive Evaluation of LLMs on Formal Specification Inference
by: Le-Cong, Thanh, et al.
Published: (2025)

DIAMONDs: A Dataset for Dynamic Information And Mental modeling Of Numeric Discussions
by: Ghosh, Sayontan, et al.
Published: (2025)

Comparing Pre-trained Human Language Models: Is it Better with Human Context as Groups, Individual Traits, or Both?
by: Soni, Nikita, et al.
Published: (2024)

Large Human Language Models: A Need and the Challenges
by: Soni, Nikita, et al.
Published: (2023)

MIND: Math Informed syNthetic Dialogues for Pretraining LLMs
by: Akter, Syeda Nahida, et al.
Published: (2024)

Teaching Transformers Causal Reasoning through Axiomatic Training
by: Vashishtha, Aniket, et al.
Published: (2024)

Counting Without Running: Evaluating LLMs' Reasoning About Code Complexity
by: Bolet, Gregory, et al.
Published: (2025)

MTCMB: A Multi-Task Benchmark Framework for Evaluating LLMs on Knowledge, Reasoning, and Safety in Traditional Chinese Medicine
by: Kong, Shufeng, et al.
Published: (2025)

Thinking About Thinking: Evaluating Reasoning in Post-Trained Language Models
by: Singla, Pratham, et al.
Published: (2025)

BanglaLorica: Design and Evaluation of a Robust Watermarking Algorithm for Large Language Models in Bangla Text Generation
by: Tariqul, Amit Bin, et al.
Published: (2026)

LLMs Can Teach Themselves to Better Predict the Future
by: Turtel, Benjamin, et al.
Published: (2025)

Reasoning Models Will Sometimes Lie About Their Reasoning
by: Walden, William, et al.
Published: (2026)

XCR-Bench: A Multi-Task Benchmark for Evaluating Cultural Reasoning in LLMs
by: Kabir, Mohsinul, et al.
Published: (2026)

LLMs for Relational Reasoning: How Far are We?
by: Li, Zhiming, et al.
Published: (2024)

Can RL Teach Long-Horizon Reasoning to LLMs? Expressiveness Is Key
by: Wang, Tianle, et al.
Published: (2026)

Can LLMs Learn by Teaching for Better Reasoning? A Preliminary Study
by: Ning, Xuefei, et al.
Published: (2024)

Teaching LLMs According to Their Aptitude: Adaptive Reasoning for Mathematical Problem Solving
by: Xu, Xin, et al.
Published: (2025)

Evaluating the Evaluator: Measuring LLMs' Adherence to Task Evaluation Instructions
by: Murugadoss, Bhuvanashree, et al.
Published: (2024)

LLMs as On-demand Customizable Service
by: Sarkar, Souvika, et al.
Published: (2024)

Zero-Shot Commonsense Validation and Reasoning with Large Language Models: An Evaluation on SemEval-2020 Task 4 Dataset
by: Alfugaha, Rawand, et al.
Published: (2025)

Causal Graph based Event Reasoning using Semantic Relation Experts
by: Koupaee, Mahnaz, et al.
Published: (2025)

Reason2Decide: Rationale-Driven Multi-Task Learning
by: Hasan, H M Quamran, et al.
Published: (2025)

A Longitudinal, Multinational, and Multilingual Corpus of News Coverage of the Russo-Ukrainian War
by: Mohanty, Dikshya, et al.
Published: (2026)

Reasoning or Not? A Comprehensive Evaluation of Reasoning LLMs for Dialogue Summarization
by: Jin, Keyan, et al.
Published: (2025)

Certificates without Electrons? Theory and Evidence on Impacts from AI-Driven Power Demand
by: Golden, Dana, et al.
Published: (2026)

Towards Autonomous Agents: Adaptive-planning, Reasoning, and Acting in Language Models
by: Dutta, Abhishek, et al.
Published: (2024)

Are LLMs Ready to Replace Bangla Annotators?
by: Hasan, Md. Najib, et al.
Published: (2026)

Frontier LLMs Still Struggle with Simple Reasoning Tasks
by: Malek, Alan, et al.
Published: (2025)

PhysicsEval: Inference-Time Techniques to Improve the Reasoning Proficiency of Large Language Models on Physics Problems
by: Siddique, Oshayer, et al.
Published: (2025)

Retro-Search: Exploring Untaken Paths for Deeper and Efficient Reasoning
by: Lu, Ximing, et al.
Published: (2025)

Quantifying Misattribution Unfairness in Authorship Attribution
by: Alipoormolabashi, Pegah, et al.
Published: (2025)

Munsit at NADI 2025 Shared Task 2: Pushing the Boundaries of Multidialectal Arabic ASR with Weakly Supervised Pretraining and Continual Supervised Fine-tuning
by: Salhab, Mahmoud, et al.
Published: (2025)