:: Library Catalog

Bibliographic Details
Main Authors:	Drechsel, Jonathan; Herbold, Steffen
Format:	Preprint
Published:	2025
Subjects:	Machine Learning; Artificial Intelligence; Computation and Language
Online Access:	https://arxiv.org/abs/2502.01406

Similar Items

The GRADIEND Python Package: An End-to-End System for Gradient-Based Feature Learning
by: Drechsel, Jonathan, et al.
Published: (2026)

Understanding or Memorizing? A Case Study of German Definite Articles in Language Models
by: Drechsel, Jonathan, et al.
Published: (2026)

MAMUT: A Novel Framework for Modifying Mathematical Formulas for the Generation of Specialized Datasets for Language Model Training
by: Drechsel, Jonathan, et al.
Published: (2025)

Large Language Models can impersonate politicians and other public figures
by: Herbold, Steffen, et al.
Published: (2024)

SortBench: Benchmarking LLMs based on their ability to sort lists
by: Herbold, Steffen
Published: (2025)

Semantic similarity prediction is better than other semantic similarity measures
by: Herbold, Steffen
Published: (2023)

FairFlow: Mitigating Dataset Biases through Undecided Learning
by: Cheng, Jiali, et al.
Published: (2025)

A Formal Framework for Uncertainty Analysis of Text Generation with Large Language Models
by: Herbold, Steffen, et al.
Published: (2026)

On the Hidden Objective Biases of Group-based Reinforcement Learning
by: Fontana, Aleksandar, et al.
Published: (2026)

BiasGym: A Simple and Generalizable Framework for Analyzing and Removing Biases through Elicitation
by: Islam, Sekh Mainul, et al.
Published: (2025)

Do Large Language Models Show Biases in Causal Learning?
by: Carro, Maria Victoria, et al.
Published: (2024)

Inductive Biases for Zero-shot Systematic Generalization in Language-informed Reinforcement Learning
by: Dijujin, Negin Hashemi, et al.
Published: (2025)

Demo: Statistically Significant Results On Biases and Errors of LLMs Do Not Guarantee Generalizable Results
by: Liu, Jonathan, et al.
Published: (2025)

Superscopes: Amplifying Internal Feature Representations for Language Model Interpretation
by: Jacobi, Jonathan, et al.
Published: (2025)

Relative Value Biases in Large Language Models
by: Hayes, William M., et al.
Published: (2024)

Large Language Models are Biased Reinforcement Learners
by: Hayes, William M., et al.
Published: (2024)

Empowering Many, Biasing a Few: Generalist Credit Scoring through Large Language Models
by: Feng, Duanyu, et al.
Published: (2023)

Alignment Tampering: How Reinforcement Learning from Human Feedback Is Exploited to Optimize Misaligned Biases
by: Hahm, Dongyoon, et al.
Published: (2026)

Agentifying Patient Dynamics within LLMs through Interacting with Clinical World Model
by: Wu, Minghao, et al.
Published: (2026)

Exploiting Synergistic Cognitive Biases to Bypass Safety in LLMs
by: Yang, Xikang, et al.
Published: (2025)

Self-Speculative Biased Decoding for Faster Re-Translation
by: Zeng, Linxiao, et al.
Published: (2025)

Text Injection for Neural Contextual Biasing
by: Meng, Zhong, et al.
Published: (2024)

Large Language Models are Geographically Biased
by: Manvi, Rohin, et al.
Published: (2024)

Heuristics and Biases in AI Decision-Making: Implications for Responsible AGI
by: Saeedi, Payam, et al.
Published: (2024)

Catalytic Role Of Noise And Necessity Of Inductive Biases In The Emergence Of Compositional Communication
by: Kuciński, Łukasz, et al.
Published: (2021)

Between Circuits and Chomsky: Pre-pretraining on Formal Languages Imparts Linguistic Biases
by: Hu, Michael Y., et al.
Published: (2025)

Mitigating Biases for Instruction-following Language Models via Bias Neurons Elimination
by: Yang, Nakyeong, et al.
Published: (2023)

BiasJailbreak: Analyzing Ethical Biases and Jailbreak Vulnerabilities in Large Language Models
by: Lee, Isack, et al.
Published: (2024)

Long-Short Distance Graph Neural Networks and Improved Curriculum Learning for Emotion Recognition in Conversation
by: Li, Xinran, et al.
Published: (2025)

Reward Models Inherit Value Biases from Pretraining
by: Christian, Brian, et al.
Published: (2026)

Planted in Pretraining, Swayed by Finetuning: A Case Study on the Origins of Cognitive Biases in LLMs
by: Itzhak, Itay, et al.
Published: (2025)

FairPy: A Toolkit for Evaluation of Prediction Biases and their Mitigation in Large Language Models
by: Viswanath, Hrishikesh, et al.
Published: (2023)

Do Language Models Exhibit the Same Cognitive Biases in Problem Solving as Human Learners?
by: Opedal, Andreas, et al.
Published: (2024)

Switchable Decision: Dynamic Neural Generation Networks
by: Zhang, Shujian, et al.
Published: (2024)

Can Small-Scale Data Poisoning Exacerbate Dialect-Linked Biases in Large Language Models?
by: Abbas, Chaymaa, et al.
Published: (2025)

Laissez-Faire Harms: Algorithmic Biases in Generative Language Models
by: Shieh, Evan, et al.
Published: (2024)

Enhancing Rare Codes via Probability-Biased Directed Graph Attention for Long-Tail ICD Coding
by: Chen, Tianlei, et al.
Published: (2025)

Do Robot Snakes Dream like Electric Sheep? Investigating the Effects of Architectural Inductive Biases on Hallucination
by: Huang, Jerry, et al.
Published: (2024)

Interactive Training: Feedback-Driven Neural Network Optimization
by: Zhang, Wentao, et al.
Published: (2025)

Empirical Capacity Model for Self-Attention Neural Networks
by: Härmä, Aki, et al.
Published: (2024)