:: Library Catalog

Bibliographic Details
Main Authors:	Dewis, Zack; Sen, Apratim; Wong, Jeffrey; Zhang, Yujia
Format:	Preprint
Published:	2024
Subjects:	Computers and Society; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2407.21163

Similar Items

LSSF: Safety Alignment for Large Language Models through Low-Rank Safety Subspace Fusion
by: Zhou, Guanghao, et al.
Published: (2026)

PHORECAST: Enabling AI Understanding of Public Health Outreach Across Populations
by: Qadri, Rifaa, et al.
Published: (2025)

Trends in AI Supercomputers
by: Pilz, Konstantin F., et al.
Published: (2025)

From Complexity to Clarity: How AI Enhances Perceptions of Scientists and the Public's Understanding of Science
by: Markowitz, David M.
Published: (2024)

Balancing Safety and Helpfulness in Healthcare AI Assistants through Iterative Preference Alignment
by: Nghiem, Huy, et al.
Published: (2025)

AI Safety is Stuck in Technical Terms -- A System Safety Response to the International AI Safety Report
by: Dobbe, Roel
Published: (2025)

AIR-Bench 2024: A Safety Benchmark Based on Risk Categories from Regulations and Policies
by: Zeng, Yi, et al.
Published: (2024)

Safety Cases: How to Justify the Safety of Advanced AI Systems
by: Clymer, Joshua, et al.
Published: (2024)

Safety Cases: A Scalable Approach to Frontier AI Safety
by: Hilton, Benjamin, et al.
Published: (2025)

The Singapore Consensus on Global AI Safety Research Priorities
by: Bengio, Yoshua, et al.
Published: (2025)

ROK-FORTRESS: Measuring the Effect of Geopolitical Transcreation for National Security and Public Safety
by: Lee, Michael S., et al.
Published: (2026)

LLM Safety for Children
by: Rath, Prasanjit, et al.
Published: (2025)

How Alignment and Jailbreak Work: Explain LLM Safety through Intermediate Hidden States
by: Zhou, Zhenhong, et al.
Published: (2024)

International Agreements on AI Safety: Review and Recommendations for a Conditional AI Safety Treaty
by: Scholefield, Rebecca, et al.
Published: (2025)

Simple Role Assignment is Extraordinarily Effective for Safety Alignment
by: Ziheng, Zhou, et al.
Published: (2026)

Revolutionizing Pharma: Unveiling the AI and LLM Trends in the Pharmaceutical Industry
by: Han, Yu, et al.
Published: (2024)

Trends in Frontier AI Model Count: A Forecast to 2028
by: Kumar, Iyngkarran, et al.
Published: (2025)

Introduction to Artificial Consciousness: History, Current Trends and Ethical Challenges
by: Elamrani, Aïda
Published: (2025)

LLM Agents in Law: Taxonomy, Applications, and Challenges
by: Liu, Shuang, et al.
Published: (2026)

Toward an African Agenda for AI Safety
by: Segun, Samuel T., et al.
Published: (2025)

Concrete Problems in AI Safety, Revisited
by: Raji, Inioluwa Deborah, et al.
Published: (2023)

Public Constitutional AI
by: Abiri, Gilad
Published: (2024)

Chinese Court Simulation with LLM-Based Agent System
by: Zhang, Kaiyuan, et al.
Published: (2025)

AI and the Future of Digital Public Squares
by: Goldberg, Beth, et al.
Published: (2024)

CR4T: Rewrite-Based Guardrails for Adolescent LLM Safety
by: An, Heajun, et al.
Published: (2026)

AI Safety: Necessary, but insufficient and possibly problematic
by: P, Deepak
Published: (2024)

Emerging Practices in Frontier AI Safety Frameworks
by: Buhl, Marie Davidsen, et al.
Published: (2025)

Leveraging Social Media Analytics for Sustainability Trend Detection in Saudi Arabia's Evolving Market
by: Aalijah, Kanwal
Published: (2025)

International Scientific Report on the Safety of Advanced AI (Interim Report)
by: Bengio, Yoshua, et al.
Published: (2024)

Upstream and Downstream AI Safety: Both on the Same River?
by: McDermid, John, et al.
Published: (2024)

Probabilistic Analysis of Copyright Disputes and Generative AI Safety
by: Chiba-Okabe, Hiroaki
Published: (2024)

The Ghost in the Grammar: Methodological Anthropomorphism in AI Safety Evaluations
by: Costa, Mariana Lins
Published: (2026)

Combining Cost-Constrained Runtime Monitors for AI Safety
by: Hua, Tim Tian, et al.
Published: (2025)

Taxonomy and Consistency Analysis of Safety Benchmarks for AI Agents
by: Li, Miles Q., et al.
Published: (2026)

Agentic Microphysics: A Manifesto for Generative AI Safety
by: Pierucci, Federico, et al.
Published: (2026)

Building Effective Safety Guardrails in AI Education Tools
by: Clark, Hannah-Beth, et al.
Published: (2025)

Interoperability in AI Safety Governance: Ethics, Regulations, and Standards
by: Chin, Yik Chan, et al.
Published: (2026)

What Is AI Safety? What Do We Want It to Be?
by: Harding, Jacqueline, et al.
Published: (2025)

RailEstate: An Interactive System for Metro Linked Property Trends
by: Chang, Chen-Wei, et al.
Published: (2025)

"This is not a data problem": Algorithms and Power in Public Higher Education in Canada
by: McConvey, Kelly, et al.
Published: (2024)