| Main Authors: | Mohanty, Dikshya, Hasan, Mohammad Saqib, Monsur, Syed Mostofa, Zheng, Size, Hsiao, Benjamin, Balasubramanian, Niranjan |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2601.16312 |
Similar Items
Syntax Is Easy, Semantics Is Hard: Evaluating LLMs for LTL Translation
by: Danso, Priscilla Kyei, et al.
Published: (2026)
Addressing the Ecological Fallacy in Larger LMs with Human Context
by: Soni, Nikita, et al.
Published: (2026)
ProST: Progressive Sub-task Training for Pareto-Optimal Multi-agent Systems Using Small Language Models
by: Bijoy, Biddut Sarker, et al.
Published: (2025)
Evaluation of LLMs-based Hidden States as Author Representations for Psychological Human-Centered NLP Tasks
by: Soni, Nikita, et al.
Published: (2025)
Teaching an Old LLM Secure Coding: Localized Preference Optimization on Distilled Preferences
by: Hasan, Mohammad Saqib, et al.
Published: (2025)
MuSciClaims: Multimodal Scientific Claim Verification
by: Lal, Yash Kumar, et al.
Published: (2025)
Continual Learning with Global Alignment
by: Bai, Xueying, et al.
Published: (2022)
Autonomous Evaluation of LLMs for Truth Maintenance and Reasoning Tasks
by: Karia, Rushang, et al.
Published: (2024)
Can LLMs Reason About Program Semantics? A Comprehensive Evaluation of LLMs on Formal Specification Inference
by: Le-Cong, Thanh, et al.
Published: (2025)
$\texttt{DIAMONDs}$: A Dataset for $\mathbb{D}$ynamic $\mathbb{I}$nformation $\mathbb{A}$nd $\mathbb{M}$ental modeling $\mathbb{O}$f $\mathbb{N}$umeric $\mathbb{D}$iscussions
by: Ghosh, Sayontan, et al.
Published: (2025)
Comparing Pre-trained Human Language Models: Is it Better with Human Context as Groups, Individual Traits, or Both?
by: Soni, Nikita, et al.
Published: (2024)
Large Human Language Models: A Need and the Challenges
by: Soni, Nikita, et al.
Published: (2023)
MIND: Math Informed syNthetic Dialogues for Pretraining LLMs
by: Akter, Syeda Nahida, et al.
Published: (2024)
Teaching Transformers Causal Reasoning through Axiomatic Training
by: Vashishtha, Aniket, et al.
Published: (2024)
Counting Without Running: Evaluating LLMs' Reasoning About Code Complexity
by: Bolet, Gregory, et al.
Published: (2025)
MTCMB: A Multi-Task Benchmark Framework for Evaluating LLMs on Knowledge, Reasoning, and Safety in Traditional Chinese Medicine
by: Kong, Shufeng, et al.
Published: (2025)
Thinking About Thinking: Evaluating Reasoning in Post-Trained Language Models
by: Singla, Pratham, et al.
Published: (2025)
BanglaLorica: Design and Evaluation of a Robust Watermarking Algorithm for Large Language Models in Bangla Text Generation
by: Tariqul, Amit Bin, et al.
Published: (2026)
LLMs Can Teach Themselves to Better Predict the Future
by: Turtel, Benjamin, et al.
Published: (2025)
Reasoning Models Will Sometimes Lie About Their Reasoning
by: Walden, William, et al.
Published: (2026)
XCR-Bench: A Multi-Task Benchmark for Evaluating Cultural Reasoning in LLMs
by: Kabir, Mohsinul, et al.
Published: (2026)
LLMs for Relational Reasoning: How Far are We?
by: Li, Zhiming, et al.
Published: (2024)
Can RL Teach Long-Horizon Reasoning to LLMs? Expressiveness Is Key
by: Wang, Tianle, et al.
Published: (2026)
Can LLMs Learn by Teaching for Better Reasoning? A Preliminary Study
by: Ning, Xuefei, et al.
Published: (2024)
Teaching LLMs According to Their Aptitude: Adaptive Reasoning for Mathematical Problem Solving
by: Xu, Xin, et al.
Published: (2025)
Evaluating the Evaluator: Measuring LLMs' Adherence to Task Evaluation Instructions
by: Murugadoss, Bhuvanashree, et al.
Published: (2024)
LLMs as On-demand Customizable Service
by: Sarkar, Souvika, et al.
Published: (2024)
Zero-Shot Commonsense Validation and Reasoning with Large Language Models: An Evaluation on SemEval-2020 Task 4 Dataset
by: Alfugaha, Rawand, et al.
Published: (2025)
Causal Graph based Event Reasoning using Semantic Relation Experts
by: Koupaee, Mahnaz, et al.
Published: (2025)
Reason2Decide: Rationale-Driven Multi-Task Learning
by: Hasan, H M Quamran, et al.
Published: (2025)
A Longitudinal, Multinational, and Multilingual Corpus of News Coverage of the Russo-Ukrainian War
by: Mohanty, Dikshya, et al.
Published: (2026)
Reasoning or Not? A Comprehensive Evaluation of Reasoning LLMs for Dialogue Summarization
by: Jin, Keyan, et al.
Published: (2025)
Certificates without Electrons? Theory and Evidence on Impacts from AI-Driven Power Demand
by: Golden, Dana, et al.
Published: (2026)
Towards Autonomous Agents: Adaptive-planning, Reasoning, and Acting in Language Models
by: Dutta, Abhishek, et al.
Published: (2024)
Are LLMs Ready to Replace Bangla Annotators?
by: Hasan, Md. Najib, et al.
Published: (2026)
Frontier LLMs Still Struggle with Simple Reasoning Tasks
by: Malek, Alan, et al.
Published: (2025)
PhysicsEval: Inference-Time Techniques to Improve the Reasoning Proficiency of Large Language Models on Physics Problems
by: Siddique, Oshayer, et al.
Published: (2025)
Retro-Search: Exploring Untaken Paths for Deeper and Efficient Reasoning
by: Lu, Ximing, et al.
Published: (2025)
Quantifying Misattribution Unfairness in Authorship Attribution
by: Alipoormolabashi, Pegah, et al.
Published: (2025)
Munsit at NADI 2025 Shared Task 2: Pushing the Boundaries of Multidialectal Arabic ASR with Weakly Supervised Pretraining and Continual Supervised Fine-tuning
by: Salhab, Mahmoud, et al.
Published: (2025)