Bibliographic Details
Main Authors:	Raza, Shaina, Qureshi, Rizwan, Zahid, Anam, Muneer, Amgad, Zafar, Anas, Kamawal, Safiullah, Sadak, Ferhat, Fioresi, Joseph, Saeed, Muhammaed, Sapkota, Ranjan, Jain, Aditya, Hassan, Muneeb Ul, Zafar, Aizan, Maqbool, Hasan, Vayani, Ashmal, Wu, Jia, Shoman, Maged
Format:	Preprint
Published:	2025
Subjects:	Computers and Society
Online Access:	https://arxiv.org/abs/2502.08650

Table of Contents:

Generative AI is rapidly moving from research to deployment, elevating the need for responsible development, evaluation, and governance. We conduct a PRISMA-guided review of 232 studies (November 2022 - December 2025), spanning large language models, vision-language models, diffusion models, and agentic pipelines. We make four contributions: (1) the first survey bridging governance principles, technical evaluation, and domain deployment across all four system types; (2) a ten-criterion rubric (C1-C10) scoring major AI safety benchmarks on risk-surface coverage, paired with a policy crosswalk mapping benchmarks to regulatory requirements; (3) twelve lifecycle KPIs, explainability guidance for foundation models, and a testbed catalogue; and (4) domain-specific analysis across healthcare, finance, education, arts, agriculture, and defense. Three findings emerge: benchmark coverage is dense for bias and toxicity but sparse for privacy, provenance, deepfakes, and system-level failures in agentic settings; evaluations remain largely static and task-local, limiting audit portability; and inconsistent documentation complicates cross-release comparison. We outline a research agenda prioritizing adaptive multimodal evaluation, privacy and provenance testing, deepfake risk assessment, calibration reporting, versioned artifacts, and continuous monitoring. This survey offers a structured path to align generative AI evaluation with governance needs for safe and accountable deployment.
