| Main authors: | Panda, Sailesh; Kadasi, Pritam; Upperwal, Abhishek; Singh, Mayank |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online access: | https://arxiv.org/abs/2605.00817 |
Similar items
Task--Specificity Score: Measuring How Much Instructions Really Matter for Supervision
by: Kadasi, Pritam, et al.
Published: (2026)
ADAPT: Learning Task Mixtures for Budget-Constrained Instruction Tuning
by: Kadasi, Pritam, et al.
Published: (2025)
Eka-Eval: An Evaluation Framework for Low-Resource Multilingual Large Language Models
by: Sinha, Samridhi Raj, et al.
Published: (2025)
Char-mander Use mBackdoor! A Study of Cross-lingual Backdoor Attacks in Multilingual LLMs
by: Beniwal, Himanshu, et al.
Published: (2025)
Model Hubs and Beyond: Analyzing Model Popularity, Performance, and Documentation
by: Kadasi, Pritam, et al.
Published: (2025)
Unmasking Hallucinations: A Causal Graph-Attention Perspective on Factual Reliability in Large Language Models
by: Kurra, Sailesh Kiran, et al.
Published: (2026)
Evaluating LLMs' Reasoning Over Ordered Procedural Steps
by: Anika, Adrita, et al.
Published: (2025)
ProcBench: Benchmark for Multi-Step Reasoning and Following Procedure
by: Fujisawa, Ippei, et al.
Published: (2024)
Know When To Stop: A Study of Semantic Drift in Text Generation
by: Spataru, Ava, et al.
Published: (2024)
Stop When Enough: Adaptive Early-Stopping for Chain-of-Thought Reasoning
by: Sun, Renliang, et al.
Published: (2025)
Where Does Toxicity Live? Mechanistic Localization and Targeted Suppression in Language Models
by: Beniwal, Himanshu, et al.
Published: (2026)
L0-Reasoning Bench: Evaluating Procedural Correctness in Language Models via Simple Program Execution
by: Sun, Simeng, et al.
Published: (2025)
PythonSaga: Redefining the Benchmark to Evaluate Code Generating LLMs
by: Yadav, Ankit, et al.
Published: (2024)
SmoGVLM: A Small, Graph-enhanced Vision-Language Model
by: Mondal, Debjyoti, et al.
Published: (2026)
Mitigating the Impact of Outlier Channels for Language Model Quantization with Activation Regularization
by: Nrusimha, Aniruddha, et al.
Published: (2024)
Cross-lingual Editing in Multilingual Language Models
by: Beniwal, Himanshu, et al.
Published: (2024)
Stop Rewarding Hallucinated Steps: Faithfulness-Aware Step-Level Reinforcement Learning for Small Reasoning Models
by: Nie, Shuo, et al.
Published: (2026)
When Thinking Fails: The Pitfalls of Reasoning for Instruction-Following in LLMs
by: Li, Xiaomin, et al.
Published: (2025)
Step-by-Step Mastery: Enhancing Soft Constraint Following Ability of Large Language Models
by: Ren, Qingyu, et al.
Published: (2025)
Stop When Reasoning Converges: Semantic-Preserving Early Exit for Reasoning Models
by: Min, Dehai, et al.
Published: (2026)
Joint Action Language Modelling for Transparent Policy Execution
by: Wulff, Theodor, et al.
Published: (2025)
A Call for Clarity in Beam Search: How It Works and When It Stops
by: Kasai, Jungo, et al.
Published: (2022)
Thinking Out of Order: When Output Order Stops Reflecting Reasoning Order in Diffusion Language Models
by: Yu, Longxuan, et al.
Published: (2026)
The Model's Language Matters: A Comparative Privacy Analysis of LLMs
by: Mishra, Abhishek K., et al.
Published: (2025)
How Robust are the Tabular QA Models for Scientific Tables? A Study using Customized Dataset
by: Ghosh, Akash, et al.
Published: (2024)
TRACES: Tagging Reasoning Steps for Adaptive Cost-Efficient Early-Stopping
by: Belkhiter, Yannis, et al.
Published: (2026)
Beyond Monolingual Assumptions: A Survey of Code-Switched NLP in the Era of Large Language Models across Modalities
by: Sheth, Rajvee, et al.
Published: (2025)
When to Memorize and When to Stop: Gated Recurrent Memory for Long-Context Reasoning
by: Sheng, Leheng, et al.
Published: (2026)
SOPBench: Evaluating Language Agents at Following Standard Operating Procedures and Constraints
by: Li, Zekun, et al.
Published: (2025)
When Context Leads but Parametric Memory Follows in Large Language Models
by: Tao, Yufei, et al.
Published: (2024)
One Instruction Does Not Fit All: How Well Do Embeddings Align Personas and Instructions in Low-Resource Indian Languages?
by: Shah, Arya, et al.
Published: (2026)
Defining and Evaluating Decision and Composite Risk in Language Models Applied to Natural Language Inference
by: Shen, Ke, et al.
Published: (2024)
Learning When to Act or Refuse: Guarding Agentic Reasoning Models for Safe Multi-Step Tool Use
by: Agarwal, Aradhye, et al.
Published: (2026)
Early Stopping Chain-of-thoughts in Large Language Models
by: Mao, Minjia, et al.
Published: (2025)
HALO: An Ontology for Representing and Categorizing Hallucinations in Large Language Models
by: Nananukul, Navapat, et al.
Published: (2023)
Humanlike Cognitive Patterns as Emergent Phenomena in Large Language Models
by: Tang, Zhisheng, et al.
Published: (2024)
Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models
by: Sui, Yang, et al.
Published: (2025)
Knowing When to Stop: Efficient Context Processing via Latent Sufficiency Signals
by: Xie, Roy, et al.
Published: (2025)
How Linguistics Learned to Stop Worrying and Love the Language Models
by: Futrell, Richard, et al.
Published: (2025)
Strong Memory, Weak Control: An Empirical Study of Executive Functioning in LLMs
by: de Langis, Karin, et al.
Published: (2025)