| Main Authors: | Rahman, Md Awsafur, Gabrys, Adam, Kang, Doug, Sun, Jingjing, Tan, Tian, Chandramouli, Ashwin |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2512.13077 |
Similar Items
Hyperspectral Trajectory Image for Multi-Month Trajectory Anomaly Detection
by: Rahman, Md Awsafur, et al.
Published: (2026)
Temporally Consistent Dynamic Scene Graphs: An End-to-End Approach for Action Tracklet Generation
by: Ruschel, Raphael, et al.
Published: (2024)
SONICS: Synthetic Or Not -- Identifying Counterfeit Songs
by: Rahman, Md Awsafur, et al.
Published: (2024)
Hallucination Detection in LLMs Using Spectral Features of Attention Maps
by: Binkowski, Jakub, et al.
Published: (2025)
AgentBench: Evaluating LLMs as Agents
by: Liu, Xiao, et al.
Published: (2023)
FactSelfCheck: Fact-Level Black-Box Hallucination Detection for LLMs
by: Sawczyn, Albert, et al.
Published: (2025)
Personalized RewardBench: Evaluating Reward Models with Human Aligned Personalization
by: Ma, Qiyao, et al.
Published: (2026)
LeMo-NADe: Multi-Parameter Neural Architecture Discovery with LLMs
by: Rahman, Md Hafizur, et al.
Published: (2024)
Steering Code LLMs with Activation Directions for Language and Library Control
by: Rahman, Md Mahbubur, et al.
Published: (2026)
Personalized Benchmarking: Evaluating LLMs by Individual Preferences
by: Garbacea, Cristina, et al.
Published: (2026)
The Illusion of Progress: Re-evaluating Hallucination Detection in LLMs
by: Janiak, Denis, et al.
Published: (2025)
OptimalThinkingBench: Evaluating Over and Underthinking in LLMs
by: Aggarwal, Pranjal, et al.
Published: (2025)
DM-Bench: Benchmarking LLMs for Personalized Decision Making in Diabetes Management
by: Cardei, Maria Ana, et al.
Published: (2025)
CausalBench: A Comprehensive Benchmark for Causal Learning Capability of LLMs
by: Zhou, Yu, et al.
Published: (2024)
Learning Under Extreme Data Scarcity: Subject-Level Evaluation of Lightweight CNNs for fMRI-Based Prodromal Parkinsons Detection
by: Rahman, Naimur
Published: (2026)
Can LLMs Reason Like Automated Theorem Provers for Rust Verification? VCoT-Bench: Evaluating via Verification Chain of Thought
by: Xie, Zichen, et al.
Published: (2026)
MirrorBench: A Benchmark to Evaluate Conversational User-Proxy Agents for Human-Likeness
by: Hathidara, Ashutosh, et al.
Published: (2026)
InductionBench: LLMs Fail in the Simplest Complexity Class
by: Hua, Wenyue, et al.
Published: (2025)
Towards Unbiased Evaluation of Time-series Anomaly Detector
by: Bhattacharya, Debarpan, et al.
Published: (2024)
From Solver to Tutor: Evaluating the Pedagogical Intelligence of LLMs with KMP-Bench
by: Shi, Weikang, et al.
Published: (2026)
Comparative Evaluation of Weather Forecasting using Machine Learning Models
by: Rahman, Md Saydur, et al.
Published: (2024)
ParallelBench: Understanding the Trade-offs of Parallel Decoding in Diffusion LLMs
by: Kang, Wonjun, et al.
Published: (2025)
CriticBench: Benchmarking LLMs for Critique-Correct Reasoning
by: Lin, Zicheng, et al.
Published: (2024)
RESOLVE: Relational Reasoning with Symbolic and Object-Level Features Using Vector Symbolic Processing
by: Mejri, Mohamed, et al.
Published: (2024)
LARS-VSA: A Vector Symbolic Architecture For Learning with Abstract Rules
by: Mejri, Mohamed, et al.
Published: (2024)
A Novel Hyperdimensional Computing Framework for Online Time Series Forecasting on the Edge
by: Mejri, Mohamed, et al.
Published: (2024)
Machine Learning Methods for Small Data and Upstream Bioprocessing Applications: A Comprehensive Review
by: Peng, Johnny, et al.
Published: (2025)
Chaining thoughts and LLMs to learn DNA structural biophysics
by: Ross, Tyler D., et al.
Published: (2024)
Do LLMs Recognize Your Preferences? Evaluating Personalized Preference Following in LLMs
by: Zhao, Siyan, et al.
Published: (2025)
Evaluating LLMs' Reasoning Over Ordered Procedural Steps
by: Anika, Adrita, et al.
Published: (2025)
Forget What You Know about LLMs Evaluations -- LLMs are Like a Chameleon
by: Cohen-Inger, Nurit, et al.
Published: (2025)
TrustGLM: Evaluating the Robustness of GraphLLMs Against Prompt, Text, and Structure Attacks
by: Zhang, Qihai, et al.
Published: (2025)
Relative Positioning Based Code Chunking Method For Rich Context Retrieval In Repository Level Code Completion Task With Code Language Model
by: Rahman, Imranur, et al.
Published: (2025)
AQA-Bench: An Interactive Benchmark for Evaluating LLMs' Sequential Reasoning Ability
by: Yang, Siwei, et al.
Published: (2024)
PersonaGym: Evaluating Persona Agents and LLMs
by: Samuel, Vinay, et al.
Published: (2024)
LLM-Guided Co-Training for Text Classification
by: Rahman, Md Mezbaur, et al.
Published: (2025)
SmartBench: Evaluating LLMs in Smart Homes with Anomalous Device States and Behavioral Contexts
by: Zou, Qingsong, et al.
Published: (2026)
Analysis, Identification and Prediction of Parkinson Disease Sub-Types and Progression through Machine Learning
by: Ram, Ashwin
Published: (2023)
The Laminar Flow Hypothesis: Detecting Jailbreaks via Semantic Turbulence in Large Language Models
by: Rahman, Md. Hasib Ur
Published: (2025)
Strategic Fusion Optimizes Transformer Compression
by: Rahman, Md Shoaibur
Published: (2025)