:: Library Catalog

Bibliographic Details
Main Authors:	You, Doohee; Chon, Dan
Format:	Preprint
Published:	2024
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2412.02113

Similar Items

Evaluating Deduplication Techniques for Economic Research Paper Titles with a Focus on Semantic Similarity using NLP and LLMs
by: You, Doohee, et al.
Published: (2024)

Safety Instincts: LLMs Learn to Trust Their Internal Compass for Self-Defense
by: Shen, Guobin, et al.
Published: (2025)

Surviving the Unseen: Predictive Defense for Novel Multi-Turn Multimodal Attacks
by: You, Doohee
Published: (2026)

Adjudicator: Correcting Noisy Labels with a KG-Informed Council of LLM Agents
by: You, Doohee, et al.
Published: (2025)

Building Trust: Foundations of Security, Safety and Transparency in AI
by: Sidhpurwala, Huzaifa, et al.
Published: (2024)

Who Do LLMs Trust? Human Experts Matter More Than Other LLMs
by: Bajaj, Anooshka, et al.
Published: (2026)

Promoting Online Safety by Simulating Unsafe Conversations with LLMs
by: Hoffman, Owen, et al.
Published: (2025)

Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression
by: Hong, Junyuan, et al.
Published: (2024)

Understanding and Preserving Safety in Fine-Tuned LLMs
by: Zhang, Jiawen, et al.
Published: (2026)

From Logic to Language: A Trust Index for Problem Solving with LLMs
by: Rug, Tehseen, et al.
Published: (2025)

Mapping the Trust Terrain: LLMs in Software Engineering -- Insights and Perspectives
by: Khati, Dipin, et al.
Published: (2025)

LongSafety: Enhance Safety for Long-Context LLMs
by: Huang, Mianqiu, et al.
Published: (2024)

AgentTrust: Runtime Safety Evaluation and Interception for AI Agent Tool Use
by: Yang, Chenglin
Published: (2026)

Ensuring Safety and Trust: Analyzing the Risks of Large Language Models in Medicine
by: Yang, Yifan, et al.
Published: (2024)

SCANS: Mitigating the Exaggerated Safety for LLMs via Safety-Conscious Activation Steering
by: Cao, Zouying, et al.
Published: (2024)

Enhancing Trust in LLMs: Algorithms for Comparing and Interpreting LLMs
by: Brown, Nik Bear
Published: (2024)

ClawSafety: "Safe" LLMs, Unsafe Agents
by: Wei, Bowen, et al.
Published: (2026)

Relationship-Aware Safety Unlearning for Multimodal LLMs
by: Anilkumar, Vishnu Narayanan, et al.
Published: (2026)

Classifier-free guidance in LLMs Safety
by: Smirnov, Roman
Published: (2024)

FESTA: Functionally Equivalent Sampling for Trust Assessment of Multimodal LLMs
by: Bhattacharya, Debarpan, et al.
Published: (2025)

Enhancing Trust and Safety in Digital Payments: An LLM-Powered Approach
by: Dahiphale, Devendra, et al.
Published: (2024)

Permissive Safety Through Trusted Inference: Verifiable Belief-Space Neural Safety Filters for Assured Interactive Robotics
by: Hu, Haimin
Published: (2026)

Building Trust in the Skies: A Knowledge-Grounded LLM-based Framework for Aviation Safety
by: Iyengar, Anirudh, et al.
Published: (2026)

Attestable Audits: Verifiable AI Safety Benchmarks Using Trusted Execution Environments
by: Schnabl, Christoph, et al.
Published: (2025)

NomicLaw: Emergent Trust and Strategic Argumentation in LLMs During Collaborative Law-Making
by: Hota, Asutosh, et al.
Published: (2025)

Know When to Trust the Skill: Delayed Appraisal and Epistemic Vigilance for Single-Agent LLMs
by: Unlu, Eren
Published: (2026)

Trustful LLMs: Customizing and Grounding Text Generation with Knowledge Bases and Dual Decoders
by: Zhu, Xiaofeng, et al.
Published: (2024)

Decomposed Trust: Privacy, Adversarial Robustness, Ethics, and Fairness in Low-Rank LLMs
by: Asante, Daniel Agyei, et al.
Published: (2025)

Can LLMs Be Trusted for Evaluating RAG Systems? A Survey of Methods and Datasets
by: Brehme, Lorenz, et al.
Published: (2025)

Trusted Weights, Treacherous Optimizations? Optimization-Triggered Backdoor Attacks on LLMs
by: Wang, Yifei, et al.
Published: (2026)

SteeringSafety: A Systematic Safety Evaluation Framework of Representation Steering in LLMs
by: Siu, Vincent, et al.
Published: (2025)

SAGE-Eval: Evaluating LLMs for Systematic Generalizations of Safety Facts
by: Yueh-Han, Chen, et al.
Published: (2025)

TrustGLM: Evaluating the Robustness of GraphLLMs Against Prompt, Text, and Structure Attacks
by: Zhang, Qihai, et al.
Published: (2025)

Benchmarking Source-Sensitive Reasoning in Turkish: Humans and LLMs under Evidential Trust Manipulation
by: Karakaş, Sercan, et al.
Published: (2026)

Efficient Safety Retrofitting Against Jailbreaking for LLMs
by: Garcia-Gasulla, Dario, et al.
Published: (2025)

JT-Safe: Intrinsically Enhancing the Safety and Trustworthiness of LLMs
by: Feng, Junlan, et al.
Published: (2025)

Targeting Alignment: Extracting Safety Classifiers of Aligned LLMs
by: Ferrand, Jean-Charles Noirot, et al.
Published: (2025)

Safer or Luckier? LLMs as Safety Evaluators Are Not Robust to Artifacts
by: Chen, Hongyu, et al.
Published: (2025)

Breaking the Safety-Capability Tradeoff: Reinforcement Learning with Verifiable Rewards Maintains Safety Guardrails in LLMs
by: Cho, Dongkyu Derek, et al.
Published: (2025)

Trust the PRoC3S: Solving Long-Horizon Robotics Problems with LLMs and Constraint Satisfaction
by: Curtis, Aidan, et al.
Published: (2024)