| Main Authors: | Chaudhary, Isha, Hu, Qian, Kumar, Manoj, Ziyadi, Morteza, Gupta, Rahul, Singh, Gagandeep |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2405.18780 |
Similar Items
Certifying Knowledge Comprehension in LLMs
by: Chaudhary, Isha, et al.
Published: (2024)
How Catastrophic is Your LLM? Certifying Risk in Conversation
by: Wang, Chengxiao, et al.
Published: (2025)
Bypassing the Safety Training of Open-Source LLMs with Priming Attacks
by: Vega, Jason, et al.
Published: (2023)
Revealing Interpretable Failure Modes of VLMs
by: Chaudhary, Isha, et al.
Published: (2026)
C-MORAL: Controllable Multi-Objective Molecular Optimization with Reinforcement Alignment for LLMs
by: Gao, Rui, et al.
Published: (2026)
Formal Synthesis of Certifiably Robust Neural Lyapunov-Barrier Certificates
by: Wang, Chengxiao, et al.
Published: (2026)
Quantitative Certification of Agentic Tool Selection
by: Yeon, Jehyeok, et al.
Published: (2025)
Specification Generation for Neural Networks in Systems
by: Chaudhary, Isha, et al.
Published: (2024)
Data Shifts Hurt CoT: A Theoretical Study
by: Yin, Lang, et al.
Published: (2025)
Probabilistic Trust Intervals for Out of Distribution Detection
by: Singh, Gagandeep, et al.
Published: (2021)
BeyondBench: Contamination-Resistant Evaluation of Reasoning in Language Models
by: Srivastava, Gaurav, et al.
Published: (2025)
Explainable AML Triage with LLMs: Evidence Retrieval and Counterfactual Checks
by: Torres, Dorothy, et al.
Published: (2026)
COMET: Neural Cost Model Explanation Framework
by: Chaudhary, Isha, et al.
Published: (2023)
Mitigating Gender Bias in Depression Detection via Counterfactual Inference
by: Hu, Mingxuan, et al.
Published: (2025)
Quantization Undoes Alignment: Bias Emergence in Compressed LLMs Across Models and Precision Levels
by: Rath, Plawan Kumar, et al.
Published: (2026)
Universal Black-Box Reward Poisoning Attack against Offline Reinforcement Learning
by: Xu, Yinglun, et al.
Published: (2024)
Compression Aware Certified Training
by: Xu, Changming, et al.
Published: (2025)
Neural Breadcrumbs: Membership Inference Attacks on LLMs Through Hidden State and Attention Pattern Analysis
by: Makhija, Disha, et al.
Published: (2025)
Ensuring Equitable Financial Decisions: Leveraging Counterfactual Fairness and Deep Learning for Bias
by: Shinde, Saish
Published: (2024)
Can LLMs Reconcile Knowledge Conflicts in Counterfactual Reasoning
by: Yamin, Khurram, et al.
Published: (2025)
Certified Signed Graph Unlearning
by: Zhao, Junpeng, et al.
Published: (2025)
Understanding Adversarial Transfer: Why Representation-Space Attacks Fail Where Data-Space Attacks Succeed
by: Gupta, Isha, et al.
Published: (2025)
Weight Pruning Amplifies Bias: A Multi-Method Study of Compressed LLMs for Edge AI
by: Rath, Plawan Kumar, et al.
Published: (2026)
Rollout Roulette: A Probabilistic Inference Approach to Inference-Time Scaling of LLMs using Particle-Based Monte Carlo Methods
by: Puri, Isha, et al.
Published: (2025)
Constraint-Anchored Attribution: Feasibility-Certified Counterfactuals and Bonferroni-PAC Sufficient Subsets for Neural CO Policies
by: Lafifi, Sohaib
Published: (2026)
Relative Counterfactual Contrastive Learning for Mitigating Pretrained Stance Bias in Stance Detection
by: Zhang, Jiarui, et al.
Published: (2024)
Stochastic Monkeys at Play: Random Augmentations Cheaply Break LLM Safety Alignment
by: Vega, Jason, et al.
Published: (2024)
SafetyNet: Detecting Harmful Outputs in LLMs by Modeling and Monitoring Deceptive Behaviors
by: Chaudhary, Maheep, et al.
Published: (2025)
Beyond the Known: Decision Making with Counterfactual Reasoning Decision Transformer
by: Nguyen, Minh Hoang, et al.
Published: (2025)
Cer-Eval: Certifiable and Cost-Efficient Evaluation Framework for LLMs
by: Wang, Ganghua, et al.
Published: (2025)
Provably Adaptive Average Reward Reinforcement Learning for Metric Spaces
by: Kar, Avik, et al.
Published: (2024)
ECPO: Evidence-Coupled Policy Optimization for Evidence-Certified Candidate Ranking
by: Hu, Miaobo, et al.
Published: (2026)
Cross-Input Certified Training for Universal Perturbations
by: Xu, Changming, et al.
Published: (2024)
Explaining Fine Tuned LLMs via Counterfactuals A Knowledge Graph Driven Framework
by: Wang, Yucheng, et al.
Published: (2025)
From Narrow Unlearning to Emergent Misalignment: Causes, Consequences, and Containment in LLMs
by: Mushtaq, Erum, et al.
Published: (2025)
GCFX: Generative Counterfactual Explanations for Deep Graph Models at the Model Level
by: Hu, Jinlong, et al.
Published: (2026)
Representer Theorems for Metric and Preference Learning: Geometric Insights and Algorithms
by: Morteza, Peyman
Published: (2023)
CluCERT: Certifying LLM Robustness via Clustering-Guided Denoising Smoothing
by: Wang, Zixia, et al.
Published: (2025)
Intelligent Truck Matching in Full Truckload Shipments using Ping2Hex approach
by: Ramdas, Srinivas Kumar, et al.
Published: (2026)
P2C: Path to Counterfactuals
by: Dasgupta, Sopam, et al.
Published: (2025)