| Main Authors: | Bell, Andrew, Fonseca, Joao |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2410.05305 |
Similar Items
Safeguarding Large Language Models in Real-time with Tunable Safety-Performance Trade-offs
by: Fonseca, Joao, et al.
Published: (2025)
Revisiting Catastrophic Forgetting in Large Language Model Tuning
by: Li, Hongyu, et al.
Published: (2024)
AuditWen: An Open-Source Large Language Model for Audit
by: Huang, Jiajia, et al.
Published: (2024)
The Outputs of Large Language Models are Meaningless
by: Hattiangadi, Anandi, et al.
Published: (2025)
Mitigating Catastrophic Forgetting in Large Language Models with Self-Synthesized Rehearsal
by: Huang, Jianheng, et al.
Published: (2024)
Improved Supervised Fine-Tuning for Large Language Models to Mitigate Catastrophic Forgetting
by: Ding, Fei, et al.
Published: (2025)
Towards Automated Regulatory Compliance Verification in Financial Auditing with Large Language Models
by: Berger, Armin, et al.
Published: (2025)
CALM: Curiosity-Driven Auditing for Large Language Models
by: Zheng, Xiang, et al.
Published: (2025)
What Is Missing: Interpretable Ratings for Large Language Model Outputs
by: Stranges, Nicholas, et al.
Published: (2026)
Countering Catastrophic Forgetting of Large Language Models for Better Instruction Following via Weight-Space Model Merging
by: Lyu, Mengxian, et al.
Published: (2026)
PRISM: A Methodology for Auditing Biases in Large Language Models
by: Azzopardi, Leif, et al.
Published: (2024)
SLOT: Structuring the Output of Large Language Models
by: Wang, Darren Yow-Bang, et al.
Published: (2025)
MoE-CT: A Novel Approach For Large Language Models Training With Resistance To Catastrophic Forgetting
by: Li, Tianhao, et al.
Published: (2024)
The Structured Output Benchmark: A Multi-Source Benchmark for Evaluating Structured Output Quality in Large Language Models
by: Singh, Abhinav Kumar, et al.
Published: (2026)
INACIA: Integrating Large Language Models in Brazilian Audit Courts: Opportunities and Challenges
by: Pereira, Jayr, et al.
Published: (2024)
Don't Change My View: Ideological Bias Auditing in Large Language Models
by: Kröger, Paul, et al.
Published: (2025)
Soft Token Attacks Cannot Reliably Audit Unlearning in Large Language Models
by: Chen, Haokun, et al.
Published: (2025)
Detoxification of Large Language Models through Output-layer Fusion with a Calibration Model
by: Tian, Yuanhe, et al.
Published: (2025)
An Evaluation on Large Language Model Outputs: Discourse and Memorization
by: de Wynter, Adrian, et al.
Published: (2023)
Can Large Language Model Summarizers Adapt to Diverse Scientific Communication Goals?
by: Fonseca, Marcio, et al.
Published: (2024)
Large Human Language Models: A Need and the Challenges
by: Soni, Nikita, et al.
Published: (2023)
Preventing Catastrophic Forgetting: Behavior-Aware Sampling for Safer Language Model Fine-Tuning
by: Pham, Anh, et al.
Published: (2025)
Information Suppression in Large Language Models: Auditing, Quantifying, and Characterizing Censorship in DeepSeek
by: Qiu, Peiran, et al.
Published: (2025)
SoK: Large Language Model Copyright Auditing via Fingerprinting
by: Shao, Shuo, et al.
Published: (2025)
StruEdit: Structured Outputs Enable the Fast and Accurate Knowledge Editing for Large Language Models
by: Bi, Baolong, et al.
Published: (2024)
Convergence of Outputs When Two Large Language Models Interact in a Multi-Agentic Setup
by: Maiti, Aniruddha, et al.
Published: (2025)
Mind the Gap: Conformative Decoding to Improve Output Diversity of Instruction-Tuned Large Language Models
by: Peeperkorn, Max, et al.
Published: (2025)
Can Large Language Models Follow Concept Annotation Guidelines? A Case Study on Scientific and Financial Domains
by: Fonseca, Marcio, et al.
Published: (2023)
Group-Aware Reinforcement Learning for Output Diversity in Large Language Models
by: Anschel, Oron, et al.
Published: (2025)
Challenges and Responses in the Practice of Large Language Models
by: Zhu, Hongyin
Published: (2024)
Concept-Level Explainability for Auditing & Steering LLM Responses
by: Amara, Kenza, et al.
Published: (2025)
Political Alignment in Large Language Models: A Multidimensional Audit of Psychometric Identity and Behavioral Bias
by: Sakhawat, Adib, et al.
Published: (2026)
Safe Inputs but Unsafe Output: Benchmarking Cross-modality Safety Alignment of Large Vision-Language Model
by: Wang, Siyin, et al.
Published: (2024)
DynamicRAG: Leveraging Outputs of Large Language Model as Feedback for Dynamic Reranking in Retrieval-Augmented Generation
by: Sun, Jiashuo, et al.
Published: (2025)
Phase Transitions in the Output Distribution of Large Language Models
by: Arnold, Julian, et al.
Published: (2024)
How Independent are Large Language Models? A Statistical Framework for Auditing Behavioral Entanglement and Reweighting Verifier Ensembles
by: Kuai, Chenchen, et al.
Published: (2026)
Large Language Models Produce Responses Perceived to be Empathic
by: Lee, Yoon Kyung, et al.
Published: (2024)
Enhancing Human-Like Responses in Large Language Models
by: Çalık, Ethem Yağız, et al.
Published: (2025)
GlórIA -- A Generative and Open Large Language Model for Portuguese
by: Lopes, Ricardo, et al.
Published: (2024)
UloRL: An Ultra-Long Output Reinforcement Learning Approach for Advancing Large Language Models' Reasoning Abilities
by: Du, Dong, et al.
Published: (2025)