:: Library Catalog

Bibliographic Details
Main Authors:	Yuan, Chenchen; Ma, Bolei; Zhang, Zheyu; Prenkaj, Bardh; Kreuter, Frauke; Kasneci, Gjergji
Format:	Preprint
Published:	2026
Subjects:	Computation and Language; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2601.08634

Similar Items

Probabilistic Aggregation and Targeted Embedding Optimization for Collective Moral Reasoning in Large Language Models
by: Yuan, Chenchen, et al.
Published: (2025)

From Confidence to Collapse in LLM Factual Robustness
by: Fastowski, Alina, et al.
Published: (2025)

Injecting Falsehoods: Adversarial Man-in-the-Middle Attacks Undermining Factual Recall in LLMs
by: Fastowski, Alina, et al.
Published: (2025)

CURE: Controlled Unlearning for Robust Embeddings -- Mitigating Conceptual Shortcuts in Pre-Trained Language Models
by: Kocak, Aysenur, et al.
Published: (2025)

Where Paths Split: Localized, Calibrated Control of Moral Reasoning in Large Language Models
by: Yuan, Chenchen, et al.
Published: (2026)

Doubling Your Data in Minutes: Ultra-fast Tabular Data Generation via LLM-Induced Dependency Graphs
by: Yang, Shuo, et al.
Published: (2025)

Active Tabular Augmentation via Policy-Guided Diffusion Inpainting
by: Zhang, Zheyu, et al.
Published: (2026)

Not All Features Deserve Attention: Graph-Guided Dependency Learning for Tabular Data Generation with Language Models
by: Zhang, Zheyu, et al.
Published: (2025)

RAZOR: Sharpening Knowledge by Cutting Bias with Unsupervised Text Rewriting
by: Yang, Shuo, et al.
Published: (2024)

Position: Uncertainty Quantification Needs Reassessment for Large-language Model Agents
by: Kirchhof, Michael, et al.
Published: (2025)

Analysing the Safety Pitfalls of Steering Vectors
by: Li, Yuxiao, et al.
Published: (2026)

SAGE: Sparse Adaptive Guidance for Dependency-Aware Tabular Data Generation
by: Yang, Shuo, et al.
Published: (2026)

Is Crowdsourcing Breaking Your Bank? Cost-Effective Fine-Tuning of Pre-trained Language Models with Proximal Policy Optimization
by: Yang, Shuo, et al.
Published: (2024)

MoralBench: Moral Evaluation of LLMs
by: Ji, Jianchao, et al.
Published: (2024)

SCISSOR: Mitigating Semantic Bias through Cluster-Aware Siamese Networks for Robust Classification
by: Yang, Shuo, et al.
Published: (2025)

Reinforcement Unlearning via Group Relative Policy Optimization
by: Zaradoukas, Efstratios, et al.
Published: (2026)

Graph Inverse Style Transfer for Counterfactual Explainability
by: Prenkaj, Bardh, et al.
Published: (2025)

Emergent Abilities in Large Language Models: A Survey
by: Berti, Leonardo, et al.
Published: (2025)

Towards Non-Adversarial Algorithmic Recourse
by: Leemann, Tobias, et al.
Published: (2024)

Political Bias in LLMs: Unaligned Moral Values in Agent-centric Simulations
by: Münker, Simon
Published: (2024)

Sycophancy is an Educational Safety Risk: Why LLM Tutors Need Sycophancy Benchmarks
by: Kasneci, Enkelejda, et al.
Published: (2026)

Adoption of Explainable Natural Language Processing: Perspectives from Industry and Academia on Practices and Challenges
by: Dhaini, Mahdi, et al.
Published: (2025)

Consolidating Rewarded Perturbations for LLM Post-Training
by: Zhang, Zheyu, et al.
Published: (2026)

Attention Mechanisms Don't Learn Additive Models: Rethinking Feature Importance for Transformers
by: Leemann, Tobias, et al.
Published: (2024)

Gender Bias in Explainability: Investigating Performance Disparity in Post-hoc Methods
by: Dhaini, Mahdi, et al.
Published: (2025)

Understanding Jailbreak Success: A Study of Latent Space Dynamics in Large Language Models
by: Ball, Sarah, et al.
Published: (2024)

Can LLM-Generated Textual Explanations Enhance Model Classification Performance? An Empirical Study
by: Dhaini, Mahdi, et al.
Published: (2025)

EvalxNLP: A Framework for Benchmarking Post-Hoc Explainability Methods on NLP Models
by: Dhaini, Mahdi, et al.
Published: (2025)

ProMoral-Bench: Evaluating Prompting Strategies for Moral Reasoning and Safety in LLMs
by: Thomas, Rohan Subramanian, et al.
Published: (2026)

Moral Mazes in the Era of LLMs
by: Nguyen, Dang, et al.
Published: (2026)

AI Conversational Interviewing: Transforming Surveys with LLMs as Adaptive Interviewers
by: Wuttke, Alexander, et al.
Published: (2024)

Toward Understanding the Transferability of Adversarial Suffixes in Large Language Models
by: Ball, Sarah, et al.
Published: (2025)

TabSCM: A practical Framework for Generating Realistic Tabular Data
by: Jacob, Sven, et al.
Published: (2026)

Exploring the psychology of LLMs' Moral and Legal Reasoning
by: Almeida, Guilherme F. C. F., et al.
Published: (2023)

Bias Beyond Borders: Political Ideology Evaluation and Steering in Multilingual LLMs
by: Nadeem, Afrozah, et al.
Published: (2026)

From Ground Truth to Measurement: A Statistical Framework for Human Labeling
by: Chew, Robert, et al.
Published: (2026)

Language, Culture, and Ideology: Personalizing Offensiveness Detection in Political Tweets with Reasoning LLMs
by: Pihulski, Dzmitry, et al.
Published: (2025)

Understanding Knowledge Drift in LLMs through Misinformation
by: Fastowski, Alina, et al.
Published: (2024)

Do Language Models Understand Morality? Towards a Robust Detection of Moral Content
by: Bulla, Luana, et al.
Published: (2024)

Histoires Morales: A French Dataset for Assessing Moral Alignment
by: Leteno, Thibaud, et al.
Published: (2025)