| Main Authors: | Au, Steven, Noronha, Sujit |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2604.07749 |
Similar Items
Epistemic Integrity in Large Language Models
by: Ghafouri, Bijean, et al.
Published: (2024)
Benchmarking Gaslighting Attacks Against Speech Large Language Models
by: Wu, Jinyang, et al.
Published: (2025)
Benchmarking Gaslighting Negation Attacks Against Multimodal Large Language Models
by: Zhu, Bin, et al.
Published: (2025)
SocialMaze: A Benchmark for Evaluating Social Reasoning in Large Language Models
by: Xu, Zixiang, et al.
Published: (2025)
Beyond Prompts: Dynamic Conversational Benchmarking of Large Language Models
by: Castillo-Bolado, David, et al.
Published: (2024)
Epistemic Diversity and Knowledge Collapse in Large Language Models
by: Wright, Dustin, et al.
Published: (2025)
Personalized Graph-Based Retrieval for Large Language Models
by: Au, Steven, et al.
Published: (2025)
Beyond Benchmarking: A New Paradigm for Evaluation and Assessment of Large Language Models
by: Liu, Jin, et al.
Published: (2024)
PeReGrINE: Evaluating Personalized Review Fidelity with User Item Graph Context
by: Au, Steven, et al.
Published: (2026)
Representations of Fact, Fiction and Forecast in Large Language Models: Epistemics and Attitudes
by: Li, Meng, et al.
Published: (2025)
Beyond Facts: Benchmarking Distributional Reading Comprehension in Large Language Models
by: Guo, Pei-Fu, et al.
Published: (2026)
Beyond Prediction -- Structuring Epistemic Integrity in Artificial Reasoning Systems
by: Wright, Craig Steven
Published: (2025)
Benchmarking and Defending Against Indirect Prompt Injection Attacks on Large Language Models
by: Yi, Jingwei, et al.
Published: (2023)
Beyond Chains of Thought: Benchmarking Latent-Space Reasoning Abilities in Large Language Models
by: Hagendorff, Thilo, et al.
Published: (2025)
JBBQ: Japanese Bias Benchmark for Analyzing Social Biases in Large Language Models
by: Yanaka, Hitomi, et al.
Published: (2024)
Epistemic Observability in Language Models
by: Mason, Tony, et al.
Published: (2026)
The Polite Liar: Epistemic Pathology in Language Models
by: DeVilling, Bentley
Published: (2025)
Human-Level and Beyond: Benchmarking Large Language Models Against Clinical Pharmacists in Prescription Review
by: Yang, Yan, et al.
Published: (2025)
Beyond the Leaderboard: Rethinking Medical Benchmarks for Large Language Models
by: Chen, Wenting, et al.
Published: (2025)
On Epistemic Uncertainty of Visual Tokens for Object Hallucinations in Large Vision-Language Models
by: Seo, Hoigi, et al.
Published: (2025)
TF-Attack: Transferable and Fast Adversarial Attacks on Large Language Models
by: Li, Zelin, et al.
Published: (2024)
SI-Bench: Benchmarking Social Intelligence of Large Language Models in Human-to-Human Conversations
by: Huang, Shuai, et al.
Published: (2025)
SNS-Bench-VL: Benchmarking Multimodal Large Language Models in Social Networking Services
by: Guo, Hongcheng, et al.
Published: (2025)
Jailbreaking Large Language Models with Morality Attacks
by: Su, Ying, et al.
Published: (2026)
Pramana: Fine-Tuning Large Language Models for Epistemic Reasoning through Navya-Nyaya
by: Sathish, Sharath
Published: (2026)
Revisiting Epistemic Markers in Confidence Estimation: Can Markers Accurately Reflect Large Language Models' Uncertainty?
by: Liu, Jiayu, et al.
Published: (2025)
Large Language Models for Medical Forecasting -- Foresight 2
by: Kraljevic, Zeljko, et al.
Published: (2024)
GPT-Fathom: Benchmarking Large Language Models to Decipher the Evolutionary Path towards GPT-4 and Beyond
by: Zheng, Shen, et al.
Published: (2023)
Beyond Context to Cognitive Appraisal: Emotion Reasoning as a Theory of Mind Benchmark for Large Language Models
by: Yeo, Gerard Christopher, et al.
Published: (2025)
MM-Soc: Benchmarking Multimodal Large Language Models in Social Media Platforms
by: Jin, Yiqiao, et al.
Published: (2024)
AttackEval: How to Evaluate the Effectiveness of Jailbreak Attacking on Large Language Models
by: Shu, Dong, et al.
Published: (2024)
Pet-Bench: Benchmarking the Abilities of Large Language Models as E-Pets in Social Network Services
by: Guo, Hongcheng, et al.
Published: (2025)
Beyond Correctness: Benchmarking Multi-dimensional Code Generation for Large Language Models
by: Zheng, Jiasheng, et al.
Published: (2024)
Robustness of Large Language Models Against Adversarial Attacks
by: Tao, Yiyi, et al.
Published: (2024)
EpiK-Eval: Evaluation for Language Models as Epistemic Models
by: Prato, Gabriele, et al.
Published: (2023)
Social Bias Probing: Fairness Benchmarking for Language Models
by: Manerba, Marta Marchiori, et al.
Published: (2023)
Keep Security! Benchmarking Security Policy Preservation in Large Language Model Contexts Against Indirect Attacks in Question Answering
by: Chang, Hwan, et al.
Published: (2025)
BenchmarkCards: Standardized Documentation for Large Language Model Benchmarks
by: Sokol, Anna, et al.
Published: (2024)
TAO-Attack: Toward Advanced Optimization-Based Jailbreak Attacks for Large Language Models
by: Xu, Zhi, et al.
Published: (2026)
HSSBench: Benchmarking Humanities and Social Sciences Ability for Multimodal Large Language Models
by: Kang, Zhaolu, et al.
Published: (2025)