Bibliographic Details
Main Authors: Raza, Shaina; Qureshi, Rizwan; Zahid, Anam; Muneer, Amgad; Zafar, Anas; Kamawal, Safiullah; Sadak, Ferhat; Fioresi, Joseph; Saeed, Muhammaed; Sapkota, Ranjan; Jain, Aditya; Hassan, Muneeb Ul; Zafar, Aizan; Maqbool, Hasan; Vayani, Ashmal; Wu, Jia; Shoman, Maged
Format: Preprint
Published: 2025
Online Access: https://arxiv.org/abs/2502.08650
Abstract:
  • Generative AI is rapidly moving from research to deployment, elevating the need for responsible development, evaluation, and governance. We conduct a PRISMA-guided review of 232 studies (November 2022 to December 2025) spanning large language models, vision-language models, diffusion models, and agentic pipelines. We make four contributions: (1) the first survey bridging governance principles, technical evaluation, and domain deployment across all four system types; (2) a ten-criterion rubric (C1-C10) that scores major AI safety benchmarks on risk-surface coverage, paired with a policy crosswalk mapping benchmarks to regulatory requirements; (3) twelve lifecycle KPIs, explainability guidance for foundation models, and a testbed catalogue; and (4) domain-specific analysis across healthcare, finance, education, arts, agriculture, and defense. Three findings emerge: benchmark coverage is dense for bias and toxicity but sparse for privacy, provenance, deepfakes, and system-level failures in agentic settings; evaluations remain largely static and task-local, limiting audit portability; and inconsistent documentation complicates cross-release comparison. We outline a research agenda prioritizing adaptive multimodal evaluation, privacy and provenance testing, deepfake risk assessment, calibration reporting, versioned artifacts, and continuous monitoring. This survey offers a structured path for aligning generative AI evaluation with governance needs for safe and accountable deployment.