Library Catalog

Bibliographic Details
Main Authors:	Rahman, Md Awsafur, Gabrys, Adam, Kang, Doug, Sun, Jingjing, Tan, Tian, Chandramouli, Ashwin
Format:	Preprint
Published:	2025
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2512.13077

Similar Items

Hyperspectral Trajectory Image for Multi-Month Trajectory Anomaly Detection
by: Rahman, Md Awsafur, et al.
Published: (2026)

Temporally Consistent Dynamic Scene Graphs: An End-to-End Approach for Action Tracklet Generation
by: Ruschel, Raphael, et al.
Published: (2024)

SONICS: Synthetic Or Not -- Identifying Counterfeit Songs
by: Rahman, Md Awsafur, et al.
Published: (2024)

Hallucination Detection in LLMs Using Spectral Features of Attention Maps
by: Binkowski, Jakub, et al.
Published: (2025)

AgentBench: Evaluating LLMs as Agents
by: Liu, Xiao, et al.
Published: (2023)

FactSelfCheck: Fact-Level Black-Box Hallucination Detection for LLMs
by: Sawczyn, Albert, et al.
Published: (2025)

Personalized RewardBench: Evaluating Reward Models with Human Aligned Personalization
by: Ma, Qiyao, et al.
Published: (2026)

LeMo-NADe: Multi-Parameter Neural Architecture Discovery with LLMs
by: Rahman, Md Hafizur, et al.
Published: (2024)

Steering Code LLMs with Activation Directions for Language and Library Control
by: Rahman, Md Mahbubur, et al.
Published: (2026)

Personalized Benchmarking: Evaluating LLMs by Individual Preferences
by: Garbacea, Cristina, et al.
Published: (2026)

The Illusion of Progress: Re-evaluating Hallucination Detection in LLMs
by: Janiak, Denis, et al.
Published: (2025)

OptimalThinkingBench: Evaluating Over and Underthinking in LLMs
by: Aggarwal, Pranjal, et al.
Published: (2025)

DM-Bench: Benchmarking LLMs for Personalized Decision Making in Diabetes Management
by: Cardei, Maria Ana, et al.
Published: (2025)

CausalBench: A Comprehensive Benchmark for Causal Learning Capability of LLMs
by: Zhou, Yu, et al.
Published: (2024)

Learning Under Extreme Data Scarcity: Subject-Level Evaluation of Lightweight CNNs for fMRI-Based Prodromal Parkinsons Detection
by: Rahman, Naimur
Published: (2026)

Can LLMs Reason Like Automated Theorem Provers for Rust Verification? VCoT-Bench: Evaluating via Verification Chain of Thought
by: Xie, Zichen, et al.
Published: (2026)

MirrorBench: A Benchmark to Evaluate Conversational User-Proxy Agents for Human-Likeness
by: Hathidara, Ashutosh, et al.
Published: (2026)

InductionBench: LLMs Fail in the Simplest Complexity Class
by: Hua, Wenyue, et al.
Published: (2025)

Towards Unbiased Evaluation of Time-series Anomaly Detector
by: Bhattacharya, Debarpan, et al.
Published: (2024)

From Solver to Tutor: Evaluating the Pedagogical Intelligence of LLMs with KMP-Bench
by: Shi, Weikang, et al.
Published: (2026)

Comparative Evaluation of Weather Forecasting using Machine Learning Models
by: Rahman, Md Saydur, et al.
Published: (2024)

ParallelBench: Understanding the Trade-offs of Parallel Decoding in Diffusion LLMs
by: Kang, Wonjun, et al.
Published: (2025)

CriticBench: Benchmarking LLMs for Critique-Correct Reasoning
by: Lin, Zicheng, et al.
Published: (2024)

RESOLVE: Relational Reasoning with Symbolic and Object-Level Features Using Vector Symbolic Processing
by: Mejri, Mohamed, et al.
Published: (2024)

LARS-VSA: A Vector Symbolic Architecture For Learning with Abstract Rules
by: Mejri, Mohamed, et al.
Published: (2024)

A Novel Hyperdimensional Computing Framework for Online Time Series Forecasting on the Edge
by: Mejri, Mohamed, et al.
Published: (2024)

Machine Learning Methods for Small Data and Upstream Bioprocessing Applications: A Comprehensive Review
by: Peng, Johnny, et al.
Published: (2025)

Chaining thoughts and LLMs to learn DNA structural biophysics
by: Ross, Tyler D., et al.
Published: (2024)

Do LLMs Recognize Your Preferences? Evaluating Personalized Preference Following in LLMs
by: Zhao, Siyan, et al.
Published: (2025)

Evaluating LLMs' Reasoning Over Ordered Procedural Steps
by: Anika, Adrita, et al.
Published: (2025)

Forget What You Know about LLMs Evaluations -- LLMs are Like a Chameleon
by: Cohen-Inger, Nurit, et al.
Published: (2025)

TrustGLM: Evaluating the Robustness of GraphLLMs Against Prompt, Text, and Structure Attacks
by: Zhang, Qihai, et al.
Published: (2025)

Relative Positioning Based Code Chunking Method For Rich Context Retrieval In Repository Level Code Completion Task With Code Language Model
by: Rahman, Imranur, et al.
Published: (2025)

AQA-Bench: An Interactive Benchmark for Evaluating LLMs' Sequential Reasoning Ability
by: Yang, Siwei, et al.
Published: (2024)

PersonaGym: Evaluating Persona Agents and LLMs
by: Samuel, Vinay, et al.
Published: (2024)

LLM-Guided Co-Training for Text Classification
by: Rahman, Md Mezbaur, et al.
Published: (2025)

SmartBench: Evaluating LLMs in Smart Homes with Anomalous Device States and Behavioral Contexts
by: Zou, Qingsong, et al.
Published: (2026)

Analysis, Identification and Prediction of Parkinson Disease Sub-Types and Progression through Machine Learning
by: Ram, Ashwin
Published: (2023)

The Laminar Flow Hypothesis: Detecting Jailbreaks via Semantic Turbulence in Large Language Models
by: Rahman, Md. Hasib Ur
Published: (2025)

Strategic Fusion Optimizes Transformer Compression
by: Rahman, Md Shoaibur
Published: (2025)