| Main Authors: | Dewis, Zack; Sen, Apratim; Wong, Jeffrey; Zhang, Yujia |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2407.21163 |
Similar Items
LSSF: Safety Alignment for Large Language Models through Low-Rank Safety Subspace Fusion
by: Zhou, Guanghao, et al.
Published: (2026)
PHORECAST: Enabling AI Understanding of Public Health Outreach Across Populations
by: Qadri, Rifaa, et al.
Published: (2025)
Trends in AI Supercomputers
by: Pilz, Konstantin F., et al.
Published: (2025)
From Complexity to Clarity: How AI Enhances Perceptions of Scientists and the Public's Understanding of Science
by: Markowitz, David M.
Published: (2024)
Balancing Safety and Helpfulness in Healthcare AI Assistants through Iterative Preference Alignment
by: Nghiem, Huy, et al.
Published: (2025)
AI Safety is Stuck in Technical Terms -- A System Safety Response to the International AI Safety Report
by: Dobbe, Roel
Published: (2025)
AIR-Bench 2024: A Safety Benchmark Based on Risk Categories from Regulations and Policies
by: Zeng, Yi, et al.
Published: (2024)
Safety Cases: How to Justify the Safety of Advanced AI Systems
by: Clymer, Joshua, et al.
Published: (2024)
Safety Cases: A Scalable Approach to Frontier AI Safety
by: Hilton, Benjamin, et al.
Published: (2025)
The Singapore Consensus on Global AI Safety Research Priorities
by: Bengio, Yoshua, et al.
Published: (2025)
ROK-FORTRESS: Measuring the Effect of Geopolitical Transcreation for National Security and Public Safety
by: Lee, Michael S., et al.
Published: (2026)
LLM Safety for Children
by: Rath, Prasanjit, et al.
Published: (2025)
How Alignment and Jailbreak Work: Explain LLM Safety through Intermediate Hidden States
by: Zhou, Zhenhong, et al.
Published: (2024)
International Agreements on AI Safety: Review and Recommendations for a Conditional AI Safety Treaty
by: Scholefield, Rebecca, et al.
Published: (2025)
Simple Role Assignment is Extraordinarily Effective for Safety Alignment
by: Ziheng, Zhou, et al.
Published: (2026)
Revolutionizing Pharma: Unveiling the AI and LLM Trends in the Pharmaceutical Industry
by: Han, Yu, et al.
Published: (2024)
Trends in Frontier AI Model Count: A Forecast to 2028
by: Kumar, Iyngkarran, et al.
Published: (2025)
Introduction to Artificial Consciousness: History, Current Trends and Ethical Challenges
by: Elamrani, Aïda
Published: (2025)
LLM Agents in Law: Taxonomy, Applications, and Challenges
by: Liu, Shuang, et al.
Published: (2026)
Toward an African Agenda for AI Safety
by: Segun, Samuel T., et al.
Published: (2025)
Concrete Problems in AI Safety, Revisited
by: Raji, Inioluwa Deborah, et al.
Published: (2023)
Public Constitutional AI
by: Abiri, Gilad
Published: (2024)
Chinese Court Simulation with LLM-Based Agent System
by: Zhang, Kaiyuan, et al.
Published: (2025)
AI and the Future of Digital Public Squares
by: Goldberg, Beth, et al.
Published: (2024)
CR4T: Rewrite-Based Guardrails for Adolescent LLM Safety
by: An, Heajun, et al.
Published: (2026)
AI Safety: Necessary, but insufficient and possibly problematic
by: P, Deepak
Published: (2024)
Emerging Practices in Frontier AI Safety Frameworks
by: Buhl, Marie Davidsen, et al.
Published: (2025)
Leveraging Social Media Analytics for Sustainability Trend Detection in Saudi Arabia's Evolving Market
by: Aalijah, Kanwal
Published: (2025)
International Scientific Report on the Safety of Advanced AI (Interim Report)
by: Bengio, Yoshua, et al.
Published: (2024)
Upstream and Downstream AI Safety: Both on the Same River?
by: McDermid, John, et al.
Published: (2024)
Probabilistic Analysis of Copyright Disputes and Generative AI Safety
by: Chiba-Okabe, Hiroaki
Published: (2024)
The Ghost in the Grammar: Methodological Anthropomorphism in AI Safety Evaluations
by: Costa, Mariana Lins
Published: (2026)
Combining Cost-Constrained Runtime Monitors for AI Safety
by: Hua, Tim Tian, et al.
Published: (2025)
Taxonomy and Consistency Analysis of Safety Benchmarks for AI Agents
by: Li, Miles Q., et al.
Published: (2026)
Agentic Microphysics: A Manifesto for Generative AI Safety
by: Pierucci, Federico, et al.
Published: (2026)
Building Effective Safety Guardrails in AI Education Tools
by: Clark, Hannah-Beth, et al.
Published: (2025)
Interoperability in AI Safety Governance: Ethics, Regulations, and Standards
by: Chin, Yik Chan, et al.
Published: (2026)
What Is AI Safety? What Do We Want It to Be?
by: Harding, Jacqueline, et al.
Published: (2025)
RailEstate: An Interactive System for Metro Linked Property Trends
by: Chang, Chen-Wei, et al.
Published: (2025)
"This is not a data problem": Algorithms and Power in Public Higher Education in Canada
by: McConvey, Kelly, et al.
Published: (2024)