Staff View: Bridging the Safety Gap: A Guardrail Pipeline for Trustworthy LLM Inferences :: Library Catalog

Bibliographic Details
Main Authors:	Han, Shanshan; Avestimehr, Salman; He, Chaoyang
Format:	Preprint
Published:	2025
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2502.08142

_version_	1866909489284775936
author	Han, Shanshan; Avestimehr, Salman; He, Chaoyang
author_facet	Han, Shanshan; Avestimehr, Salman; He, Chaoyang
contents	We present Wildflare GuardRail, a guardrail pipeline designed to enhance the safety and reliability of Large Language Model (LLM) inferences by systematically addressing risks across the entire processing workflow. Wildflare GuardRail integrates four core functional modules: Safety Detector, which identifies unsafe inputs and detects hallucinations in model outputs while generating root-cause explanations; Grounding, which contextualizes user queries with information retrieved from vector databases; Customizer, which adjusts outputs in real time using lightweight, rule-based wrappers; and Repairer, which corrects erroneous LLM outputs using the hallucination explanations provided by Safety Detector. Results show that the unsafe content detection model in Safety Detector achieves performance comparable to the OpenAI API, despite being trained on a small dataset constructed from several public datasets. The lightweight wrappers address malicious URLs in model outputs in 1.06 s per query with 100% accuracy and without costly model calls. The hallucination fixing model reduces hallucinations with an accuracy of 80.7%.
format	Preprint
id	arxiv_https___arxiv_org_abs_2502_08142
institution	arXiv
publishDate	2025
record_format	arxiv
title	Bridging the Safety Gap: A Guardrail Pipeline for Trustworthy LLM Inferences
topic	Artificial Intelligence
url	https://arxiv.org/abs/2502.08142

Similar Items