| Main Authors: | Bonthu, Sridevi, Sree, S. Rama, Prasad, M. H. M. Krishna |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2508.15837 |
Similar Items
ASAG2024: A Combined Benchmark for Short Answer Grading
by: Meyer, Gérôme, et al.
Published: (2024)
Surfacing Semantic Orthogonality Across Model Safety Benchmarks: A Multi-Dimensional Analysis
by: Bennion, Jonathan, et al.
Published: (2025)
Exploring the Performance of ML/DL Architectures on the MNIST-1D Dataset
by: Beebe, Michael, et al.
Published: (2026)
Atlas-Alignment: Making Interpretability Transferable Across Language Models
by: Puri, Bruno, et al.
Published: (2025)
DefAn: Definitive Answer Dataset for LLMs Hallucination Evaluation
by: Rahman, A B M Ashikur, et al.
Published: (2024)
Explaining Datasets in Words: Statistical Models with Natural Language Parameters
by: Zhong, Ruiqi, et al.
Published: (2024)
Bias Similarity Measurement: A Black-Box Audit of Fairness Across LLMs
by: Jeong, Hyejun, et al.
Published: (2024)
Mitigate Negative Transfer with Similarity Heuristic Lifelong Prompt Tuning
by: Wu, Chenyuan, et al.
Published: (2024)
Beyond Answers: Transferring Reasoning Capabilities to Smaller LLMs Using Multi-Teacher Knowledge Distillation
by: Tian, Yijun, et al.
Published: (2024)
Semantic Anchors in In-Context Learning: Why Small LLMs Cannot Flip Their Labels
by: Kumar, Anantha Padmanaban Krishna
Published: (2025)
Comparative Analysis of 47 Context-Based Question Answer Models Across 8 Diverse Datasets
by: Muneeb, Muhammad, et al.
Published: (2025)
Evaluating the Factuality of Zero-shot Summarizers Across Varied Domains
by: Ramprasad, Sanjana, et al.
Published: (2024)
Kernel Language Entropy: Fine-grained Uncertainty Quantification for LLMs from Semantic Similarities
by: Nikitin, Alexander, et al.
Published: (2024)
Efficient data selection employing Semantic Similarity-based Graph Structures for model training
by: Petcu, Roxana, et al.
Published: (2024)
Answer Matching Outperforms Multiple Choice for Language Model Evaluation
by: Chandak, Nikhil, et al.
Published: (2025)
When Answers Stray from Questions: Hallucination Detection via Question-Answer Orthogonal Decomposition
by: Yao, Siyang, et al.
Published: (2026)
Evaluating Gender Bias Transfer between Pre-trained and Prompt-Adapted Language Models
by: Mackraz, Natalie, et al.
Published: (2024)
Rewarding Intellectual Humility Learning When Not To Answer In Large Language Models
by: Jha, Abha, et al.
Published: (2026)
Model Internals-based Answer Attribution for Trustworthy Retrieval-Augmented Generation
by: Qi, Jirui, et al.
Published: (2024)
Linguistic Patterns in Pandemic-Related Content: A Comparative Analysis of COVID-19, Constraint, and Monkeypox Datasets
by: Sikosana, Mkululi, et al.
Published: (2025)
MultiQ&A: An Analysis in Measuring Robustness via Automated Crowdsourcing of Question Perturbations and Answers
by: Cho, Nicole, et al.
Published: (2025)
Inference-Time Intervention: Eliciting Truthful Answers from a Language Model
by: Li, Kenneth, et al.
Published: (2023)
Explicit Diversity Conditions for Effective Question Answer Generation with Large Language Models
by: Yadav, Vikas, et al.
Published: (2024)
Extraction of Research Objectives, Machine Learning Model Names, and Dataset Names from Academic Papers and Analysis of Their Interrelationships Using LLM and Network Analysis
by: Nishio, S., et al.
Published: (2024)
Proving that Cryptic Crossword Clue Answers are Correct
by: Andrews, Martin, et al.
Published: (2024)
Promote, Suppress, Iterate: How Language Models Answer One-to-Many Factual Queries
by: Yan, Tianyi Lorena, et al.
Published: (2025)
How Bad is Training on Synthetic Data? A Statistical Analysis of Language Model Collapse
by: Seddik, Mohamed El Amine, et al.
Published: (2024)
Can LLMs Grade Short-Answer Reading Comprehension Questions : An Empirical Study with a Novel Dataset
by: Henkel, Owen, et al.
Published: (2023)
SecureCode: A Production-Grade Multi-Turn Dataset for Training Security-Aware Code Generation Models
by: Thornton, Scott
Published: (2025)
Evaluating Open-Source Vision Language Models for Facial Emotion Recognition against Traditional Deep Learning Models
by: Mulukutla, Vamsi Krishna, et al.
Published: (2025)
Measuring and Reducing LLM Hallucination without Gold-Standard Answers
by: Wei, Jiaheng, et al.
Published: (2024)
Clarify, Abstain or Answer? Strategising in Conversation with Belief-Augmented Generation
by: Baan, Joris, et al.
Published: (2026)
SelfReflect: Can LLMs Communicate Their Internal Answer Distribution?
by: Kirchhof, Michael, et al.
Published: (2025)
When Does a Language Model Commit? A Finite-Answer Theory of Pre-Verbalization Commitment
by: Zhang, Long, et al.
Published: (2026)
Learning from Reference Answers: Versatile Language Model Alignment without Binary Human Preference Data
by: Zhao, Shuai, et al.
Published: (2025)
From Associations to Activations: Comparing Behavioral and Hidden-State Semantic Geometry in LLMs
by: Schiekiera, Louis, et al.
Published: (2026)
Beyond Confidence: Rethinking Self-Assessments for Performance Prediction in LLMs
by: Bhattacharyya, Sree, et al.
Published: (2026)
Mechanistic Interpretability as Statistical Estimation: A Variance Analysis
by: Méloux, Maxime, et al.
Published: (2025)
The Strongest Teacher Is Not Always the Best Teacher: Student-Centric Answer Selection
by: Hu, Zhengyu, et al.
Published: (2026)
Train Once, Answer All: Many Pretraining Experiments for the Cost of One
by: Bordt, Sebastian, et al.
Published: (2025)