| Main Authors: | Kamp, Jonathan, Bakker, Roos, Blok, Dominique |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2512.11108 |
Similar Items
The Role of Syntactic Span Preferences in Post-Hoc Explanation Disagreement
by: Kamp, Jonathan, et al.
Published: (2024)
Learning from Sufficient Rationales: Analysing the Relationship Between Explanation Faithfulness and Token-level Regularisation Strategies
by: Kamp, Jonathan, et al.
Published: (2025)
Isolating LLM Lexical Bias: A Curation-Free Triangulated Metric for Preference-Stage Learning
by: Ming, Xiaoyang, et al.
Published: (2026)
Feature Attribution Stability Suite: How Stable Are Post-Hoc Attributions?
by: Subramaniakuppusamy, Kamalasankari, et al.
Published: (2026)
Extracting Lexical Features from Dialects via Interpretable Dialect Classifiers
by: Xie, Roy, et al.
Published: (2024)
Evaluating Evidence Attribution in Generated Fact Checking Explanations
by: Xing, Rui, et al.
Published: (2024)
GraphLSS: Integrating Lexical, Structural, and Semantic Features for Long Document Extractive Summarization
by: Bugueño, Margarita, et al.
Published: (2024)
Self-supervised Attribute-aware Dynamic Preference Ranking Alignment
by: Yang, Hongyu, et al.
Published: (2025)
Who is in the Spotlight: The Hidden Bias Undermining Multimodal Retrieval-Augmented Generation
by: Yao, Jiayu, et al.
Published: (2025)
Finding and Reactivating Post-Trained LLMs' Hidden Safety Mechanisms
by: Li, Mingjie, et al.
Published: (2026)
EvalxNLP: A Framework for Benchmarking Post-Hoc Explainability Methods on NLP Models
by: Dhaini, Mahdi, et al.
Published: (2025)
Mitigation of Gender and Ethnicity Bias in AI-Generated Stories through Model Explanations
by: Dimgba, Martha O., et al.
Published: (2025)
RepreGuard: Detecting LLM-Generated Text by Revealing Hidden Representation Patterns
by: Chen, Xin, et al.
Published: (2025)
Saying the Unsaid: Revealing the Hidden Language of Multimodal Systems Through Telephone Games
by: Zhao, Juntu, et al.
Published: (2025)
Post-edits Are Preferences Too
by: Berger, Nathaniel, et al.
Published: (2024)
Lexicalization Is All You Need: Examining the Impact of Lexical Knowledge in a Compositional QALD System
by: Schmidt, David Maria, et al.
Published: (2024)
Faithfulness Serum: Mitigating the Faithfulness Gap in Textual Explanations of LLM Decisions via Attribution Guidance
by: Alon, Bar, et al.
Published: (2026)
Rate, Explain and Cite (REC): Enhanced Explanation and Attribution in Automatic Evaluation by Large Language Models
by: Hsu, Aliyah R., et al.
Published: (2024)
Not All Preferences are What You Need for Post-Training: Selective Alignment Strategy for Preference Optimization
by: Dong, Zhijin
Published: (2025)
Improving Attributed Text Generation of Large Language Models via Preference Learning
by: Li, Dongfang, et al.
Published: (2024)
Self-Preference Bias in Rubric-Based Evaluation of Large Language Models
by: Pombal, José, et al.
Published: (2026)
Bi-directional Bias Attribution: Debiasing Large Language Models without Modifying Prompts
by: Lin, Yujie, et al.
Published: (2026)
Judging the Judges: A Systematic Study of Position Bias in LLM-as-a-Judge
by: Shi, Lin, et al.
Published: (2024)
Layer-wise Positional Bias in Short-Context Language Modeling
by: Rahimi, Maryam, et al.
Published: (2026)
Language Model Re-rankers are Fooled by Lexical Similarities
by: Hagström, Lovisa, et al.
Published: (2025)
Using Language Models to Disambiguate Lexical Choices in Translation
by: Barua, Josh, et al.
Published: (2024)
Polysemanticity or Polysemy? Lexical Identity Confounds Superposition Metrics
by: Hou, Iyad Ait, et al.
Published: (2026)
Token Homogenization under Positional Bias
by: Yusupov, Viacheslav, et al.
Published: (2025)
Hidden Heroes and Gradient Bloats: Layer-Wise Redundancy Inverts Attribution in Transformers
by: Ye, Donald
Published: (2026)
Post-hoc Reward Calibration: A Case Study on Length Bias
by: Huang, Zeyu, et al.
Published: (2024)
Quantifying and Mitigating Self-Preference Bias of LLM Judges
by: Yang, Jinming, et al.
Published: (2026)
Order Matters: Investigate the Position Bias in Multi-constraint Instruction Following
by: Zeng, Jie, et al.
Published: (2025)
Where is the answer? Investigating Positional Bias in Language Model Knowledge Extraction
by: Saito, Kuniaki, et al.
Published: (2024)
Technical Report: Impact of Position Bias on Language Models in Token Classification
by: Amor, Mehdi Ben, et al.
Published: (2023)
Understanding Position Bias Effects on Fairness in Social Multi-Document Summarization
by: Olabisi, Olubusayo, et al.
Published: (2024)
Unveiling the Lexical Sensitivity of LLMs: Combinatorial Optimization for Prompt Enhancement
by: Zhan, Pengwei, et al.
Published: (2024)
MultiLS: A Multi-task Lexical Simplification Framework
by: North, Kai, et al.
Published: (2024)
VISLA Benchmark: Evaluating Embedding Sensitivity to Semantic and Lexical Alterations
by: Dumpala, Sri Harsha, et al.
Published: (2024)
ALEXSIS-PT: A New Resource for Portuguese Lexical Simplification
by: North, Kai, et al.
Published: (2022)
BARD10: A New Benchmark Reveals Significance of Bangla Stop-Words in Authorship Attribution
by: Moosa, Abdullah Muhammad, et al.
Published: (2025)