| Main authors: | Johns, Sydney; Jin, Heng; Zhang, Chaoyu; Hou, Y. Thomas; Lou, Wenjing |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online access: | https://arxiv.org/abs/2605.00245 |
Similar items
Enabling Trustworthy Federated Learning via Remote Attestation for Mitigating Byzantine Threats
by: Zhang, Chaoyu, et al.
Published: (2025)
ProFLingo: A Fingerprinting-based Intellectual Property Protection Scheme for Large Language Models
by: Jin, Heng, et al.
Published: (2024)
CHiSafetyBench: A Chinese Hierarchical Safety Benchmark for Large Language Models
by: Zhang, Wenjing, et al.
Published: (2024)
LongSafety: Evaluating Long-Context Safety of Large Language Models
by: Lu, Yida, et al.
Published: (2025)
When Context Flips, Safety Breaks: Diagnosing Brittle Safety in Aligned Language Models
by: Choi, Dasol, et al.
Published: (2026)
Quantum-Cognitive Tunnelling Neural Networks for Military-Civilian Vehicle Classification and Sentiment Analysis
by: Maksimovic, Milan, et al.
Published: (2025)
On the Military Applications of Large Language Models
by: Johansson, Satu, et al.
Published: (2025)
Safety Evaluation of DeepSeek Models in Chinese Contexts
by: Zhang, Wenjing, et al.
Published: (2025)
Beyond Output Correctness: Benchmarking and Evaluating Large Language Model Reasoning in Coding Tasks
by: Li, Yuangang, et al.
Published: (2026)
SafeSci: Safety Evaluation of Large Language Models in Science Domains and Beyond
by: Zhu, Xiangyang, et al.
Published: (2026)
SPRI: Aligning Large Language Models with Context-Situated Principles
by: Zhan, Hongli, et al.
Published: (2025)
Measuring and Eliminating Refusals in Military Large Language Models
by: FitzGerald, Jack, et al.
Published: (2026)
Safety Layers in Aligned Large Language Models: The Key to LLM Security
by: Li, Shen, et al.
Published: (2024)
TCMBench: A Comprehensive Benchmark for Evaluating Large Language Models in Traditional Chinese Medicine
by: Yue, Wenjing, et al.
Published: (2024)
Evaluating Menu OCR and Translation: A Benchmark for Aligning Human and Automated Evaluations in Large Vision-Language Models
by: Wu, Zhanglin, et al.
Published: (2025)
JMedEthicBench: A Multi-Turn Conversational Benchmark for Evaluating Medical Safety in Japanese Large Language Models
by: Liu, Junyu, et al.
Published: (2026)
Large Language Models for Classical Chinese Poetry Translation: Benchmarking, Evaluating, and Improving
by: Chen, Andong, et al.
Published: (2024)
ARMOR: Empowering Multimodal Understanding Model with Interleaved Multimodal Generation Capability
by: Sun, Jianwen, et al.
Published: (2025)
USB: A Comprehensive and Unified Safety Evaluation Benchmark for Multimodal Large Language Models
by: Zheng, Baolin, et al.
Published: (2025)
MCP-SafetyBench: A Benchmark for Safety Evaluation of Large Language Models with Real-World MCP Servers
by: Zong, Xuanjun, et al.
Published: (2025)
ARMOR: Shielding Unlearnable Examples against Data Augmentation
by: Gong, Xueluan, et al.
Published: (2025)
SeCoKD: Aligning Large Language Models for In-Context Learning with Fewer Shots
by: Wang, Weixing, et al.
Published: (2024)
Robustifying Safety-Aligned Large Language Models through Clean Data Curation
by: Liu, Xiaoqun, et al.
Published: (2024)
Safety Evaluation and Enhancement of DeepSeek Models in Chinese Contexts
by: Zhang, Wenjing, et al.
Published: (2025)
Health-ORSC-Bench: A Benchmark for Measuring Over-Refusal and Safety Completion in Health Context
by: Zhang, Zhihao, et al.
Published: (2026)
Towards Context-Invariant Safety Alignment for Large Language Models
by: Wang, Yixu, et al.
Published: (2026)
A Critical Evaluation of AI Feedback for Aligning Large Language Models
by: Sharma, Archit, et al.
Published: (2024)
AlignBench: Benchmarking Chinese Alignment of Large Language Models
by: Liu, Xiao, et al.
Published: (2023)
Enterprise Large Language Model Evaluation Benchmark
by: Wang, Liya, et al.
Published: (2025)
MSDiagnosis: A Benchmark for Evaluating Large Language Models in Multi-Step Clinical Diagnosis
by: Hou, Ruihui, et al.
Published: (2024)
MedSafetyBench: Evaluating and Improving the Medical Safety of Large Language Models
by: Han, Tessa, et al.
Published: (2024)
Restoring Calibration for Aligned Large Language Models: A Calibration-Aware Fine-Tuning Approach
by: Xiao, Jiancong, et al.
Published: (2025)
CCJA: Context-Coherent Jailbreak Attack for Aligned Large Language Models
by: Zhou, Guanghao, et al.
Published: (2025)
Context-DPO: Aligning Language Models for Context-Faithfulness
by: Bi, Baolong, et al.
Published: (2024)
WARBENCH: A Comprehensive Benchmark for Evaluating LLMs in Military Decision-Making
by: Li, Zongjie, et al.
Published: (2026)
Invasive Context Engineering to Control Large Language Models
by: Rivasseau, Thomas
Published: (2025)
Beyond Labels: Aligning Large Language Models with Human-like Reasoning
by: Kabir, Muhammad Rafsan, et al.
Published: (2024)
Q-resafe: Assessing Safety Risks and Quantization-aware Safety Patching for Quantized Large Language Models
by: Chen, Kejia, et al.
Published: (2025)
Steering Multimodal Large Language Models Decoding for Context-Aware Safety
by: Liu, Zheyuan, et al.
Published: (2025)
Aligned but Fragile: Enhancing LLM Safety Robustness via Zeroth-Order Optimization
by: Liu, Zhihao, et al.
Published: (2026)