| Main Author: | Abbasloo, Soheil |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2502.02573 |
Similar Items
Evaluating LLM Reasoning Beyond Correctness and CoT
by: Abbasloo, Soheil
Published: (2025)
On Speeding Up Language Model Evaluation
by: Zhou, Jin Peng, et al.
Published: (2024)
On the Superimposed Noise Accumulation Problem in Sequential Knowledge Editing of Large Language Models
by: Cao, Ding, et al.
Published: (2025)
Firm or Fickle? Evaluating Large Language Models Consistency in Sequential Interactions
by: Li, Yubo, et al.
Published: (2025)
Beyond Correctness: Learning Robust Reasoning via Transfer
by: Lee, Hyunseok, et al.
Published: (2026)
Sequential Large Language Model-Based Hyper-parameter Optimization
by: Mahammadli, Kanan, et al.
Published: (2024)
Navigating the Dual Facets: A Comprehensive Evaluation of Sequential Memory Editing in Large Language Models
by: Lin, Zihao, et al.
Published: (2024)
Chain-of-Defensive-Thought: Structured Reasoning Elicits Robustness in Large Language Models against Reference Corruption
by: Wang, Wenxiao, et al.
Published: (2025)
Language Models as Semantic Augmenters for Sequential Recommenders
by: Valizadeh, Mahsa, et al.
Published: (2025)
Cerberus: Efficient Inference with Adaptive Parallel Decoding and Sequential Knowledge Enhancement
by: Liu, Yuxuan, et al.
Published: (2024)
Natural Language Satisfiability: Exploring the Problem Distribution and Evaluating Transformer-based Language Models
by: Madusanka, Tharindu, et al.
Published: (2025)
Mathify: Evaluating Large Language Models on Mathematical Problem Solving Tasks
by: Anand, Avinash, et al.
Published: (2024)
Reasoning Up the Instruction Ladder for Controllable Language Models
by: Zheng, Zishuo, et al.
Published: (2025)
Schoenfeld's Anatomy of Mathematical Reasoning by Language Models
by: Li, Ming, et al.
Published: (2025)
Line Goes Up? Inherent Limitations of Benchmarks for Evaluating Large Language Models
by: Fodor, James
Published: (2025)
Pedagogically-Inspired Data Synthesis for Language Model Knowledge Distillation
by: He, Bowei, et al.
Published: (2026)
ARE: Scaling Up Agent Environments and Evaluations
by: Froger, Romain, et al.
Published: (2025)
Large Language Model Psychometrics: A Systematic Review of Evaluation, Validation, and Enhancement
by: Ye, Haoran, et al.
Published: (2025)
Evaluating and Optimizing Educational Content with Large Language Model Judgments
by: He-Yueya, Joy, et al.
Published: (2024)
Debate, Reflect, and Distill: Multi-Agent Feedback with Tree-Structured Preference Optimization for Efficient Language Model Enhancement
by: Zhou, Xiaofeng, et al.
Published: (2025)
Theory of Mind in Large Language Models: Assessment and Enhancement
by: Chen, Ruirui, et al.
Published: (2025)
Progressively Label Enhancement for Large Language Model Alignment
by: Liu, Biao, et al.
Published: (2024)
QUILL: Quotation Generation Enhancement of Large Language Models
by: Xiao, Jin, et al.
Published: (2024)
From Phonemes to Meaning: Evaluating Large Language Models on Tamil
by: Varsha, Jeyarajalingam, et al.
Published: (2025)
Follow-Up Questions Improve Documents Generated by Large Language Models
by: Tix, Bernadette J
Published: (2024)
Learning Uncertainty from Sequential Internal Dispersion in Large Language Models
by: Srey, Ponhvoan, et al.
Published: (2026)
Evaluating Implicit Bias in Large Language Models by Attacking From a Psychometric Perspective
by: Wen, Yuchen, et al.
Published: (2024)
Quantum-Inspired Self-Attention in a Large Language Model
by: Kuznetsov, Nikita, et al.
Published: (2026)
Evaluating Large Language Models on the Frame and Symbol Grounding Problems: A Zero-shot Benchmark
by: Oka, Shoko
Published: (2025)
Sample-Efficient Language Modeling with Linear Attention and Lightweight Enhancements
by: Haller, Patrick, et al.
Published: (2025)
HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models
by: Gutiérrez, Bernal Jiménez, et al.
Published: (2024)
Ensembling Language Models with Sequential Monte Carlo
by: Chan, Robin Shing Moon, et al.
Published: (2026)
Safety Evaluation and Enhancement of DeepSeek Models in Chinese Contexts
by: Zhang, Wenjing, et al.
Published: (2025)
The Labyrinth and the Thread: Rethinking Regularizations in Sequential Knowledge Editing for Large Language Models
by: Wang, Zheng, et al.
Published: (2026)
Maestro: Joint Graph & Config Optimization for Reliable AI Agents
by: Wang, Wenxiao, et al.
Published: (2025)
Fast Adversarial Attacks on Language Models In One GPU Minute
by: Sadasivan, Vinu Sankar, et al.
Published: (2024)
GIVE: Structured Reasoning of Large Language Models with Knowledge Graph Inspired Veracity Extrapolation
by: He, Jiashu, et al.
Published: (2024)
Can Language Models Solve Graph Problems in Natural Language?
by: Wang, Heng, et al.
Published: (2023)
EvaLearn: Quantifying the Learning Capability and Efficiency of LLMs via Sequential Problem Solving
by: Dou, Shihan, et al.
Published: (2025)
Evaluating Language Models' Evaluations of Games
by: Collins, Katherine M., et al.
Published: (2025)