| Main Authors: | Yuan, Chenchen, Ma, Bolei, Zhang, Zheyu, Prenkaj, Bardh, Kreuter, Frauke, Kasneci, Gjergji |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2601.08634 |
Similar Items
Probabilistic Aggregation and Targeted Embedding Optimization for Collective Moral Reasoning in Large Language Models
by: Yuan, Chenchen, et al.
Published: (2025)
From Confidence to Collapse in LLM Factual Robustness
by: Fastowski, Alina, et al.
Published: (2025)
Injecting Falsehoods: Adversarial Man-in-the-Middle Attacks Undermining Factual Recall in LLMs
by: Fastowski, Alina, et al.
Published: (2025)
CURE: Controlled Unlearning for Robust Embeddings -- Mitigating Conceptual Shortcuts in Pre-Trained Language Models
by: Kocak, Aysenur, et al.
Published: (2025)
Where Paths Split: Localized, Calibrated Control of Moral Reasoning in Large Language Models
by: Yuan, Chenchen, et al.
Published: (2026)
Doubling Your Data in Minutes: Ultra-fast Tabular Data Generation via LLM-Induced Dependency Graphs
by: Yang, Shuo, et al.
Published: (2025)
Active Tabular Augmentation via Policy-Guided Diffusion Inpainting
by: Zhang, Zheyu, et al.
Published: (2026)
Not All Features Deserve Attention: Graph-Guided Dependency Learning for Tabular Data Generation with Language Models
by: Zhang, Zheyu, et al.
Published: (2025)
RAZOR: Sharpening Knowledge by Cutting Bias with Unsupervised Text Rewriting
by: Yang, Shuo, et al.
Published: (2024)
Position: Uncertainty Quantification Needs Reassessment for Large-language Model Agents
by: Kirchhof, Michael, et al.
Published: (2025)
Analysing the Safety Pitfalls of Steering Vectors
by: Li, Yuxiao, et al.
Published: (2026)
SAGE: Sparse Adaptive Guidance for Dependency-Aware Tabular Data Generation
by: Yang, Shuo, et al.
Published: (2026)
Is Crowdsourcing Breaking Your Bank? Cost-Effective Fine-Tuning of Pre-trained Language Models with Proximal Policy Optimization
by: Yang, Shuo, et al.
Published: (2024)
MoralBench: Moral Evaluation of LLMs
by: Ji, Jianchao, et al.
Published: (2024)
SCISSOR: Mitigating Semantic Bias through Cluster-Aware Siamese Networks for Robust Classification
by: Yang, Shuo, et al.
Published: (2025)
Reinforcement Unlearning via Group Relative Policy Optimization
by: Zaradoukas, Efstratios, et al.
Published: (2026)
Graph Inverse Style Transfer for Counterfactual Explainability
by: Prenkaj, Bardh, et al.
Published: (2025)
Emergent Abilities in Large Language Models: A Survey
by: Berti, Leonardo, et al.
Published: (2025)
Towards Non-Adversarial Algorithmic Recourse
by: Leemann, Tobias, et al.
Published: (2024)
Political Bias in LLMs: Unaligned Moral Values in Agent-centric Simulations
by: Münker, Simon
Published: (2024)
Sycophancy is an Educational Safety Risk: Why LLM Tutors Need Sycophancy Benchmarks
by: Kasneci, Enkelejda, et al.
Published: (2026)
Adoption of Explainable Natural Language Processing: Perspectives from Industry and Academia on Practices and Challenges
by: Dhaini, Mahdi, et al.
Published: (2025)
Consolidating Rewarded Perturbations for LLM Post-Training
by: Zhang, Zheyu, et al.
Published: (2026)
Attention Mechanisms Don't Learn Additive Models: Rethinking Feature Importance for Transformers
by: Leemann, Tobias, et al.
Published: (2024)
Gender Bias in Explainability: Investigating Performance Disparity in Post-hoc Methods
by: Dhaini, Mahdi, et al.
Published: (2025)
Understanding Jailbreak Success: A Study of Latent Space Dynamics in Large Language Models
by: Ball, Sarah, et al.
Published: (2024)
Can LLM-Generated Textual Explanations Enhance Model Classification Performance? An Empirical Study
by: Dhaini, Mahdi, et al.
Published: (2025)
EvalxNLP: A Framework for Benchmarking Post-Hoc Explainability Methods on NLP Models
by: Dhaini, Mahdi, et al.
Published: (2025)
ProMoral-Bench: Evaluating Prompting Strategies for Moral Reasoning and Safety in LLMs
by: Thomas, Rohan Subramanian, et al.
Published: (2026)
Moral Mazes in the Era of LLMs
by: Nguyen, Dang, et al.
Published: (2026)
AI Conversational Interviewing: Transforming Surveys with LLMs as Adaptive Interviewers
by: Wuttke, Alexander, et al.
Published: (2024)
Toward Understanding the Transferability of Adversarial Suffixes in Large Language Models
by: Ball, Sarah, et al.
Published: (2025)
TabSCM: A practical Framework for Generating Realistic Tabular Data
by: Jacob, Sven, et al.
Published: (2026)
Exploring the psychology of LLMs' Moral and Legal Reasoning
by: Almeida, Guilherme F. C. F., et al.
Published: (2023)
Bias Beyond Borders: Political Ideology Evaluation and Steering in Multilingual LLMs
by: Nadeem, Afrozah, et al.
Published: (2026)
From Ground Truth to Measurement: A Statistical Framework for Human Labeling
by: Chew, Robert, et al.
Published: (2026)
Language, Culture, and Ideology: Personalizing Offensiveness Detection in Political Tweets with Reasoning LLMs
by: Pihulski, Dzmitry, et al.
Published: (2025)
Understanding Knowledge Drift in LLMs through Misinformation
by: Fastowski, Alina, et al.
Published: (2024)
Do Language Models Understand Morality? Towards a Robust Detection of Moral Content
by: Bulla, Luana, et al.
Published: (2024)
Histoires Morales: A French Dataset for Assessing Moral Alignment
by: Leteno, Thibaud, et al.
Published: (2025)