Saved in:
| Main authors: | M, Eshwar Reddy, Karmakar, Sourav |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online access: | https://arxiv.org/abs/2603.16197 |
Similar documents
Cause and Effect: Can Large Language Models Truly Understand Causality?
by: Ashwani, Swagata, et al.
Published: (2024)
Benchmarks Saturate When The Model Gets Smarter Than The Judge
by: Ballon, Marthe, et al.
Published: (2026)
Reasoning with Sampling: Your Base Model is Smarter Than You Think
by: Karan, Aayush, et al.
Published: (2025)
How Well Do Large Language Models Truly Ground?
by: Lee, Hyunji, et al.
Published: (2023)
Is Pre-training Truly Better Than Meta-Learning?
by: Miranda, Brando, et al.
Published: (2023)
Language Models Largely Exhibit Human-like Constituent Ordering Preferences
by: Tur, Ada Defne, et al.
Published: (2025)
The End of Manual Decoding: Towards Truly End-to-End Language Models
by: Wang, Zhichao, et al.
Published: (2025)
Why Diffusion Language Models Struggle with Truly Parallel (Non-Autoregressive) Decoding?
by: Li, Pengxiang, et al.
Published: (2026)
What Single-Prompt Accuracy Misses: A Multi-Variant Reliability Audit of Language Models
by: Karmakar, Ranit, et al.
Published: (2026)
Can LLMs Truly Embody Human Personality? Analyzing AI and Human Behavior Alignment in Dispute Resolution
by: Kwon, Deuksin, et al.
Published: (2026)
Do Large Language Models Truly Grasp Mathematics? An Empirical Exploration From Cognitive Psychology
by: Xie, Wei, et al.
Published: (2024)
Scope Ambiguities in Large Language Models
by: Kamath, Gaurav, et al.
Published: (2024)
Tougher Text, Smarter Models: Raising the Bar for Adversarial Defence Benchmarks
by: Wang, Yang, et al.
Published: (2025)
SUGAR: Leveraging Contextual Confidence for Smarter Retrieval
by: Zubkova, Hanna, et al.
Published: (2025)
Khayyam Challenge (PersianMMLU): Is Your LLM Truly Wise to The Persian Language?
by: Ghahroodi, Omid, et al.
Published: (2024)
Using Perspectival Words Is Harder Than Vocabulary Words for Humans and Even More So for Multimodal Language Models
by: Dong, Dota Tianai, et al.
Published: (2025)
Continual-learning for Modelling Low-Resource Languages from Large Language Models
by: K, Santosh Srinath, et al.
Published: (2026)
Saying More Than They Know: A Framework for Quantifying Epistemic-Rhetorical Miscalibration in Large Language Models
by: Bakhshi, Asim D.
Published: (2026)
Metamorphic Testing for Fairness Evaluation in Large Language Models: Identifying Intersectional Bias in LLaMA and GPT
by: Reddy, Harishwar, et al.
Published: (2025)
Are self-explanations from Large Language Models faithful?
by: Madsen, Andreas, et al.
Published: (2024)
Error Typing for Smarter Rewards: Improving Process Reward Models with Error-Aware Hierarchical Supervision
by: Pala, Tej Deep, et al.
Published: (2025)
AgentFloor: How Far Up the tool use Ladder Can Small Open-Weight Models Go?
by: Karmakar, Ranit, et al.
Published: (2026)
Is Multilingual LLM Watermarking Truly Multilingual? Scaling Robustness to 100+ Languages via Back-Translation
by: Mohamed, Asim, et al.
Published: (2025)
Informed Routing in LLMs: Smarter Token-Level Computation for Faster Inference
by: Han, Chao, et al.
Published: (2025)
Leviathan: Decoupling Input and Output Representations in Language Models
by: Batley, Reza T., et al.
Published: (2026)
Divergent Creativity in Humans and Large Language Models
by: Bellemare-Pepin, Antoine, et al.
Published: (2024)
Two Heads Are Better Than One: Integrating Knowledge from Knowledge Graphs and Large Language Models for Entity Alignment
by: Yang, Linyao, et al.
Published: (2024)
BELL: Benchmarking the Explainability of Large Language Models
by: Ahmed, Syed Quiser, et al.
Published: (2025)
Loop as a Bridge: Can Looped Transformers Truly Link Representation Space and Natural Language Outputs?
by: Chen, Guanxu, et al.
Published: (2026)
Enhancing Human-Like Responses in Large Language Models
by: Çalık, Ethem Yağız, et al.
Published: (2025)
Humanity in AI: Detecting the Personality of Large Language Models
by: Zhan, Baohua, et al.
Published: (2024)
LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders
by: BehnamGhader, Parishad, et al.
Published: (2024)
TeacherLM: Teaching to Fish Rather Than Giving the Fish, Language Modeling Likewise
by: He, Nan, et al.
Published: (2023)
Are Smarter LLMs Safer? Exploring Safety-Reasoning Trade-offs in Prompting and Fine-Tuning
by: Li, Ang, et al.
Published: (2025)
Algorithmic Cultivation: How Social Media Feeds Shape User Language
by: Pal, Olivia, et al.
Published: (2026)
Opportunities and Challenges of Large Language Models for Low-Resource Languages in Humanities Research
by: Zhong, Tianyang, et al.
Published: (2024)
Comparing Human and Large Language Model Interpretation of Implicit Information
by: De Santis, Antonio, et al.
Published: (2026)
Aligning Large Language Model Behavior with Human Citation Preferences
by: Ando, Kenichiro, et al.
Published: (2026)
Can Large Language Models Express Uncertainty Like Human?
by: Tao, Linwei, et al.
Published: (2025)
High-Dimension Human Value Representation in Large Language Models
by: Cahyawijaya, Samuel, et al.
Published: (2024)