| Main Authors: | Kadasi, Pritam, Upperwal, Abhishek, Singh, Mayank |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.03103 |
Similar Items
ADAPT: Learning Task Mixtures for Budget-Constrained Instruction Tuning
by: Kadasi, Pritam, et al.
Published: (2025)
When LLMs Stop Following Steps: A Diagnostic Study of Procedural Execution in Language Models
by: Panda, Sailesh, et al.
Published: (2026)
One Instruction Does Not Fit All: How Well Do Embeddings Align Personas and Instructions in Low-Resource Indian Languages?
by: Shah, Arya, et al.
Published: (2026)
Eka-Eval: An Evaluation Framework for Low-Resource Multilingual Large Language Models
by: Sinha, Samridhi Raj, et al.
Published: (2025)
How Many Parameters Does Your Task Really Need? Task Specific Pruning with LLM-Sieve
by: Reda, Waleed, et al.
Published: (2025)
Short-Context Dominance: How Much Local Context Natural Language Actually Needs?
by: Vakilian, Vala, et al.
Published: (2025)
How Much Can RAG Help the Reasoning of LLM?
by: Liu, Jingyu, et al.
Published: (2024)
How Much Heavy Lifting Can an Agent Harness Do?: Measuring the LLM's Residual Role in a Planning Agent
by: Jung, Sungwoo, et al.
Published: (2026)
Evaluating the Evaluator: Measuring LLMs' Adherence to Task Evaluation Instructions
by: Murugadoss, Bhuvanashree, et al.
Published: (2024)
SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning
by: Zhao, Chenyang, et al.
Published: (2024)
How Much is Too Much? Exploring LoRA Rank Trade-offs for Retaining Knowledge and Domain Robustness
by: Rathore, Darshita, et al.
Published: (2025)
DETAIL Matters: Measuring the Impact of Prompt Specificity on Reasoning in Large Language Models
by: Kim, Olivia
Published: (2025)
Do Large Language Models Know How Much They Know?
by: Prato, Gabriele, et al.
Published: (2025)
How Much Can We Forget about Data Contamination?
by: Bordt, Sebastian, et al.
Published: (2024)
COMI-LINGUA: Expert Annotated Large-Scale Dataset for Multitask NLP in Hindi-English Code-Mixing
by: Sheth, Rajvee, et al.
Published: (2025)
PythonSaga: Redefining the Benchmark to Evaluate Code Generating LLMs
by: Yadav, Ankit, et al.
Published: (2024)
Where Does Toxicity Live? Mechanistic Localization and Targeted Suppression in Language Models
by: Beniwal, Himanshu, et al.
Published: (2026)
Quality-Conditioned Agreement in Automated Short Answer Scoring: Mid-Range Degradation and the Impact of Task-Specific Adaptation
by: Schleifer, Abigail Victoria Gurin, et al.
Published: (2026)
How Much Do LLMs Hallucinate across Languages? On Realistic Multilingual Estimation of LLM Hallucination
by: Islam, Saad Obaid ul, et al.
Published: (2025)
Cross-lingual Editing in Multilingual Language Models
by: Beniwal, Himanshu, et al.
Published: (2024)
How Good Are LLMs for Literary Translation, Really? Literary Translation Evaluation with Humans and LLMs
by: Zhang, Ran, et al.
Published: (2024)
Where It Really Matters: Few-Shot Environmental Conservation Media Monitoring for Low-Resource Languages
by: Jain, Sameer, et al.
Published: (2024)
Measuring What Matters -- or What's Convenient?: Robustness of LLM-Based Scoring Systems to Construct-Irrelevant Factors
by: Walsh, Cole, et al.
Published: (2026)
Every Answer Matters: Evaluating Commonsense with Probabilistic Measures
by: Cheng, Qi, et al.
Published: (2024)
Instruction Embedding: Latent Representations of Instructions Towards Task Identification
by: Li, Yiwei, et al.
Published: (2024)
How Much Would a Clinician Edit This Draft? Evaluating LLM Alignment for Patient Message Response Drafting
by: Seegmiller, Parker, et al.
Published: (2026)
From No to Know: Taxonomy, Challenges, and Opportunities for Negation Understanding in Multimodal Foundation Models
by: Vatsa, Mayank, et al.
Published: (2025)
Char-mander Use mBackdoor! A Study of Cross-lingual Backdoor Attacks in Multilingual LLMs
by: Beniwal, Himanshu, et al.
Published: (2025)
No Universal Prompt: Unifying Reasoning through Adaptive Prompting for Temporal Table Reasoning
by: Rajgaria, Abhishek, et al.
Published: (2025)
Fragile Thoughts: How Large Language Models Handle Chain-of-Thought Perturbations
by: Aravindan, Ashwath Vaithinathan, et al.
Published: (2026)
ROSE: A Reward-Oriented Data Selection Framework for LLM Task-Specific Instruction Tuning
by: Wu, Yang, et al.
Published: (2024)
How effective are VLMs in assisting humans in inferring the quality of mental models from Multimodal short answers?
by: Sil, Pritam, et al.
Published: (2026)
What Really is Commonsense Knowledge?
by: Do, Quyet V., et al.
Published: (2024)
Intent-conditioned and Non-toxic Counterspeech Generation using Multi-Task Instruction Tuning with RLAIF
by: Hengle, Amey, et al.
Published: (2024)
BertaQA: How Much Do Language Models Know About Local Culture?
by: Etxaniz, Julen, et al.
Published: (2024)
R-Horizon: How Far Can Your Large Reasoning Model Really Go in Breadth and Depth?
by: Lu, Yi, et al.
Published: (2025)
Error Taxonomy-Guided Prompt Optimization
by: Singh, Mayank, et al.
Published: (2026)
Theory of Mind in Action: The Instruction Inference Task in Dynamic Human-Agent Collaboration
by: Saad, Fardin, et al.
Published: (2025)
Outlier Dimensions Encode Task-Specific Knowledge
by: Rudman, William, et al.
Published: (2023)
Do Audio LLMs Really LISTEN, or Just Transcribe? Measuring Lexical vs. Acoustic Emotion Cues Reliance
by: Chen, Jingyi, et al.
Published: (2025)