| Main Authors: | Bandooni, Ashutosh; Subburaj, Brindha |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2508.03737 |
Similar Items
UA-Legal-Bench: A Benchmark for Evaluating Large Language Models on Ukrainian Legal Reasoning
by: Ovcharov, Volodymyr
Published: (2026)
EQ-Bench: An Emotional Intelligence Benchmark for Large Language Models
by: Paech, Samuel J.
Published: (2023)
Hierarchical Prompting Taxonomy: A Universal Evaluation Framework for Large Language Models Aligned with Human Cognitive Principles
by: Budagam, Devichand, et al.
Published: (2024)
A dataset of questions on decision-theoretic reasoning in Newcomb-like problems
by: Oesterheld, Caspar, et al.
Published: (2024)
Text-Based Approaches to Item Difficulty Modeling in Large-Scale Assessments: A Systematic Review
by: Peters, Sydney, et al.
Published: (2025)
USTCCTSU at SemEval-2024 Task 1: Reducing Anisotropy for Cross-lingual Semantic Textual Relatedness Task
by: Li, Jianjian, et al.
Published: (2024)
Multilingual LLMs Inherently Reward In-Language Time-Sensitive Semantic Alignment for Low-Resource Languages
by: Bajpai, Ashutosh, et al.
Published: (2024)
Partially Recentralization Softmax Loss for Vision-Language Models Robustness
by: Wang, Hao, et al.
Published: (2024)
Swiss-Bench SBP-002: A Frontier Model Comparison on Swiss Legal and Regulatory Tasks
by: Uenal, Fatih
Published: (2026)
A Closer Look at Bias and Chain-of-Thought Faithfulness of Large (Vision) Language Models
by: Balasubramanian, Sriram, et al.
Published: (2025)
UniHetero: Could Generation Enhance Understanding for Vision-Language-Model at Large Data Scale?
by: Chen, Fengjiao, et al.
Published: (2025)
RomanLens: The Role Of Latent Romanization In Multilinguality In LLMs
by: Saji, Alan, et al.
Published: (2025)
ExpressivityBench: Can LLMs Communicate Implicitly?
by: Tint, Joshua, et al.
Published: (2024)
PrivacyBench: A Conversational Benchmark for Evaluating Privacy in Personalized AI
by: Mukhopadhyay, Srija, et al.
Published: (2025)
UNO-Bench: A Unified Benchmark for Exploring the Compositional Law Between Uni-modal and Omni-modal in Omni Models
by: Chen, Chen, et al.
Published: (2025)
Large Language Model (LLM) Bias Index -- LLMBI
by: Oketunji, Abiodun Finbarrs, et al.
Published: (2023)
SomaliBench Eval: Measuring English-to-Somali Refusal Gaps in Open-Weight Language Models
by: Dahir, Khalid Yusuf
Published: (2026)
Robustness of Large Language Models to Perturbations in Text
by: Singh, Ayush, et al.
Published: (2024)
Performance Evaluation of Sentiment Analysis on Text and Emoji Data Using End-to-End, Transfer Learning, Distributed and Explainable AI Models
by: Velampalli, Sirisha, et al.
Published: (2025)
Mechanistic evaluation of Transformers and state space models
by: Arora, Aryaman, et al.
Published: (2025)
Evaluating the Efficacy of Hybrid Deep Learning Models in Distinguishing AI-Generated Text
by: Oketunji, Abiodun Finbarrs
Published: (2023)
Skill Availability and Presentation Granularity in Large-Language-Model Agents: A Controlled SkillsBench Study
by: Xu, Xiaonan, et al.
Published: (2026)
A comprehensive taxonomy of hallucinations in Large Language Models
by: Cossio, Manuel
Published: (2025)
Language Models are Crossword Solvers
by: Saha, Soumadeep, et al.
Published: (2024)
Super Tiny Language Models
by: Hillier, Dylan, et al.
Published: (2024)
Egalitarian Language Representation in Language Models: It All Begins with Tokenizers
by: Velayuthan, Menan, et al.
Published: (2024)
PatentGPT: A Large Language Model for Intellectual Property
by: Bai, Zilong, et al.
Published: (2024)
BabyReasoningBench: Generating Developmentally-Inspired Reasoning Tasks for Evaluating Baby Language Models
by: Dhole, Kaustubh D.
Published: (2026)
Adaptive Focus Memory for Language Models
by: Cruz, Christopher
Published: (2025)
A Practical Review of Mechanistic Interpretability for Transformer-Based Language Models
by: Rai, Daking, et al.
Published: (2024)
Predictive Simultaneous Interpretation: Harnessing Large Language Models for Democratizing Real-Time Multilingual Communication
by: Iida, Kurando, et al.
Published: (2024)
UrduBench: An Urdu Reasoning Benchmark using Contextually Ensembled Translations with Human-in-the-Loop
by: Shafique, Muhammad Ali, et al.
Published: (2026)
Plain language adaptations of biomedical text using LLMs: Comparision of evaluation metrics
by: Kocbek, Primoz, et al.
Published: (2025)
FarsEval-PKBETS: A new diverse benchmark for evaluating Persian large language models
by: Shamsfard, Mehrnoush, et al.
Published: (2025)
Language Model Circuits Are Sparse in the Neuron Basis
by: Arora, Aryaman, et al.
Published: (2026)
Inference to the Best Explanation in Large Language Models
by: Dalal, Dhairya, et al.
Published: (2024)
Bielik 11B v3: Multilingual Large Language Model for European Languages
by: Ociepa, Krzysztof, et al.
Published: (2025)
Integrating Emotional and Linguistic Models for Ethical Compliance in Large Language Models
by: Chang, Edward Y.
Published: (2024)
Machine Translation Hallucination Detection for Low and High Resource Languages using Large Language Models
by: Benkirane, Kenza, et al.
Published: (2024)
TALE: A Tool-Augmented Framework for Reference-Free Evaluation of Large Language Models
by: Badshah, Sher, et al.
Published: (2025)