Library Catalog

Bibliographic Details
Main Authors:	Chaudhary, Isha; Hu, Qian; Kumar, Manoj; Ziyadi, Morteza; Gupta, Rahul; Singh, Gagandeep
Format:	Preprint
Published:	2024
Subjects:	Artificial Intelligence; Machine Learning
Online Access:	https://arxiv.org/abs/2405.18780

Similar Items

Certifying Knowledge Comprehension in LLMs
by: Chaudhary, Isha, et al.
Published: (2024)

How Catastrophic is Your LLM? Certifying Risk in Conversation
by: Wang, Chengxiao, et al.
Published: (2025)

Bypassing the Safety Training of Open-Source LLMs with Priming Attacks
by: Vega, Jason, et al.
Published: (2023)

Revealing Interpretable Failure Modes of VLMs
by: Chaudhary, Isha, et al.
Published: (2026)

C-MORAL: Controllable Multi-Objective Molecular Optimization with Reinforcement Alignment for LLMs
by: Gao, Rui, et al.
Published: (2026)

Formal Synthesis of Certifiably Robust Neural Lyapunov-Barrier Certificates
by: Wang, Chengxiao, et al.
Published: (2026)

Quantitative Certification of Agentic Tool Selection
by: Yeon, Jehyeok, et al.
Published: (2025)

Specification Generation for Neural Networks in Systems
by: Chaudhary, Isha, et al.
Published: (2024)

Data Shifts Hurt CoT: A Theoretical Study
by: Yin, Lang, et al.
Published: (2025)

Probabilistic Trust Intervals for Out of Distribution Detection
by: Singh, Gagandeep, et al.
Published: (2021)

BeyondBench: Contamination-Resistant Evaluation of Reasoning in Language Models
by: Srivastava, Gaurav, et al.
Published: (2025)

Explainable AML Triage with LLMs: Evidence Retrieval and Counterfactual Checks
by: Torres, Dorothy, et al.
Published: (2026)

COMET: Neural Cost Model Explanation Framework
by: Chaudhary, Isha, et al.
Published: (2023)

Mitigating Gender Bias in Depression Detection via Counterfactual Inference
by: Hu, Mingxuan, et al.
Published: (2025)

Quantization Undoes Alignment: Bias Emergence in Compressed LLMs Across Models and Precision Levels
by: Rath, Plawan Kumar, et al.
Published: (2026)

Universal Black-Box Reward Poisoning Attack against Offline Reinforcement Learning
by: Xu, Yinglun, et al.
Published: (2024)

Compression Aware Certified Training
by: Xu, Changming, et al.
Published: (2025)

Neural Breadcrumbs: Membership Inference Attacks on LLMs Through Hidden State and Attention Pattern Analysis
by: Makhija, Disha, et al.
Published: (2025)

Ensuring Equitable Financial Decisions: Leveraging Counterfactual Fairness and Deep Learning for Bias
by: Shinde, Saish
Published: (2024)

Can LLMs Reconcile Knowledge Conflicts in Counterfactual Reasoning
by: Yamin, Khurram, et al.
Published: (2025)

Certified Signed Graph Unlearning
by: Zhao, Junpeng, et al.
Published: (2025)

Understanding Adversarial Transfer: Why Representation-Space Attacks Fail Where Data-Space Attacks Succeed
by: Gupta, Isha, et al.
Published: (2025)

Weight Pruning Amplifies Bias: A Multi-Method Study of Compressed LLMs for Edge AI
by: Rath, Plawan Kumar, et al.
Published: (2026)

Rollout Roulette: A Probabilistic Inference Approach to Inference-Time Scaling of LLMs using Particle-Based Monte Carlo Methods
by: Puri, Isha, et al.
Published: (2025)

Constraint-Anchored Attribution: Feasibility-Certified Counterfactuals and Bonferroni-PAC Sufficient Subsets for Neural CO Policies
by: Lafifi, Sohaib
Published: (2026)

Relative Counterfactual Contrastive Learning for Mitigating Pretrained Stance Bias in Stance Detection
by: Zhang, Jiarui, et al.
Published: (2024)

Stochastic Monkeys at Play: Random Augmentations Cheaply Break LLM Safety Alignment
by: Vega, Jason, et al.
Published: (2024)

SafetyNet: Detecting Harmful Outputs in LLMs by Modeling and Monitoring Deceptive Behaviors
by: Chaudhary, Maheep, et al.
Published: (2025)

Beyond the Known: Decision Making with Counterfactual Reasoning Decision Transformer
by: Nguyen, Minh Hoang, et al.
Published: (2025)

Cer-Eval: Certifiable and Cost-Efficient Evaluation Framework for LLMs
by: Wang, Ganghua, et al.
Published: (2025)

Provably Adaptive Average Reward Reinforcement Learning for Metric Spaces
by: Kar, Avik, et al.
Published: (2024)

ECPO: Evidence-Coupled Policy Optimization for Evidence-Certified Candidate Ranking
by: Hu, Miaobo, et al.
Published: (2026)

Cross-Input Certified Training for Universal Perturbations
by: Xu, Changming, et al.
Published: (2024)

Explaining Fine Tuned LLMs via Counterfactuals A Knowledge Graph Driven Framework
by: Wang, Yucheng, et al.
Published: (2025)

From Narrow Unlearning to Emergent Misalignment: Causes, Consequences, and Containment in LLMs
by: Mushtaq, Erum, et al.
Published: (2025)

GCFX: Generative Counterfactual Explanations for Deep Graph Models at the Model Level
by: Hu, Jinlong, et al.
Published: (2026)

Representer Theorems for Metric and Preference Learning: Geometric Insights and Algorithms
by: Morteza, Peyman
Published: (2023)

CluCERT: Certifying LLM Robustness via Clustering-Guided Denoising Smoothing
by: Wang, Zixia, et al.
Published: (2025)

Intelligent Truck Matching in Full Truckload Shipments using Ping2Hex approach
by: Ramdas, Srinivas Kumar, et al.
Published: (2026)

P2C: Path to Counterfactuals
by: Dasgupta, Sopam, et al.
Published: (2025)