:: Library Catalog

Bibliographic Details
Main Authors:	Eshuijs, Leon; Chaudhury, Archie; McBeth, Alan; Nguyen, Ethan
Format:	Preprint
Published:	2025
Subjects:	Machine Learning; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2505.17760

Similar Items

Beyond Next Word Prediction: Developing Comprehensive Evaluation Frameworks for measuring LLM performance on real world applications
by: Agrawal, Vishakha, et al.
Published: (2025)

Evidential Physics-Informed Neural Networks
by: Tan, Hai Siong, et al.
Published: (2025)

Automatic Evaluation Metrics for Artificially Generated Scientific Research
by: Höpner, Niklas, et al.
Published: (2025)

On the Psychology of GPT-4: Moderately anxious, slightly masculine, honest, and humble
by: Barua, Adrita, et al.
Published: (2024)

Can sparse autoencoders be used to decompose and interpret steering vectors?
by: Mayne, Harry, et al.
Published: (2024)

Constructing Domain-Specific Evaluation Sets for LLM-as-a-judge
by: Raju, Ravi, et al.
Published: (2024)

Alignment is Localized: A Causal Probe into Preference Layers
by: Chaudhury, Archie
Published: (2025)

Generalised Probabilistic Modelling and Improved Uncertainty Estimation in Comparative LLM-as-a-judge
by: Fathullah, Yassir, et al.
Published: (2025)

Preference Leakage: A Contamination Problem in LLM-as-a-judge
by: Li, Dawei, et al.
Published: (2025)

DEQuify your force field: More efficient simulations using deep equilibrium models
by: Burger, Andreas, et al.
Published: (2025)

How important are the genes to explain the outcome - the asymmetric Shapley value as an honest importance metric for high-dimensional features
by: van de Wiel, Mark A., et al.
Published: (2026)

Secret mixtures of experts inside your LLM
by: Boix-Adsera, Enric
Published: (2025)

Deceptive AI systems that give explanations are more convincing than honest AI systems and can amplify belief in misinformation
by: Danry, Valdemar, et al.
Published: (2024)

Didactic to Constructive: Turning Expert Solutions into Learnable Reasoning
by: Mendes, Ethan, et al.
Published: (2026)

Language Models can Self-Improve at State-Value Estimation for Better Search
by: Mendes, Ethan, et al.
Published: (2025)

Uncertainty quantification in neural network-based glucose prediction for diabetes
by: Tan, Hai Siong, et al.
Published: (2026)

Agentic retrieval-augmented reasoning reshapes collective reliability under model variability in radiology question answering
by: Farajiamiri, Mina, et al.
Published: (2026)

Generation Constraint Scaling Can Mitigate Hallucination
by: Kollias, Georgios, et al.
Published: (2024)

Hybrid Quantum Deep Learning Model for Emotion Detection using raw EEG Signal Analysis
by: Chandanwala, Ali Asgar, et al.
Published: (2024)

What's the plan? Metrics for implicit planning in LLMs and their application to rhyme generation and question answering
by: Maar, Jim, et al.
Published: (2026)

Agribot: agriculture-specific question answer system
by: Jain, Naman, et al.
Published: (2025)

Toward universal steering and monitoring of AI models
by: Beaglehole, Daniel, et al.
Published: (2025)

A Practical Guide for Evaluating LLMs and LLM-Reliant Systems
by: Rudd, Ethan M., et al.
Published: (2025)

Multi-step retrieval and reasoning improves radiology question answering with large language models
by: Wind, Sebastian, et al.
Published: (2025)

Low-cost Real-world Implementation of the Swing-up Pendulum for Deep Reinforcement Learning Experiments
by: Böhm, Peter, et al.
Published: (2025)

Watch your steps: Dormant Adversarial Behaviors that Activate upon LLM Finetuning
by: Gloaguen, Thibaud, et al.
Published: (2025)

Data-Prep-Kit: getting your data ready for LLM application development
by: Wood, David, et al.
Published: (2024)

GAN-SLAM: Real-Time GAN Aided Floor Plan Creation Through SLAM
by: Davies, Leon, et al.
Published: (2025)

Failure to Mix: Large language models struggle to answer according to desired probability distributions
by: Yang, Ivy Yuqian, et al.
Published: (2025)

MetaLLMix : An XAI Aided LLM-Meta-learning Based Approach for Hyper-parameters Optimization
by: Bal-Ghaoui, Mohamed, et al.
Published: (2025)

Token-Efficient RL for LLM Reasoning
by: Lee, Alan, et al.
Published: (2025)

Distribution-Aware Reward: Reinforcement Learning over Predictive Distributions for LLM Regression
by: Park, Jungsoo, et al.
Published: (2026)

Attacks and Defenses Against LLM Fingerprinting
by: Kurian, Kevin, et al.
Published: (2025)

Q3R: Quadratic Reweighted Rank Regularizer for Effective Low-Rank Training
by: Ghosh, Ipsita, et al.
Published: (2025)

Tool-Aided Evolutionary LLM for Generative Policy Toward Efficient Resource Management in Wireless Federated Learning
by: Tan, Chongyang, et al.
Published: (2025)

Automatic AI controller that can drive with confidence: steering vehicle with uncertainty knowledge
by: Kumari, Neha, et al.
Published: (2024)

SALAD: Systematic Assessment of Machine Unlearning on LLM-Aided Hardware Design
by: Wang, Zeng, et al.
Published: (2025)

Safety Training Modulates Harmful Misalignment Under On-Policy RL, But Direction Depends on Environment Design
by: Eshuijs, Leon, et al.
Published: (2026)

Short-circuiting Shortcuts: Mechanistic Investigation of Shortcuts in Text Classification
by: Eshuijs, Leon, et al.
Published: (2025)

Anticipatory Evaluation of Language Models
by: Park, Jungsoo, et al.
Published: (2025)