Library Catalog

Bibliographic Details
Main Authors:	Qi, Jirui; Chen, Shan; Xiong, Zidi; Fernández, Raquel; Bitterman, Danielle S.; Bisazza, Arianna
Format:	Preprint
Published:	2025
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2505.22888

Similar Items

Cross-Lingual Consistency of Factual Knowledge in Multilingual Language Models
by: Qi, Jirui, et al.
Published: (2023)

On the Consistency of Multilingual Context Utilization in Retrieval-Augmented Generation
by: Qi, Jirui, et al.
Published: (2025)

Model Internals-based Answer Attribution for Trustworthy Retrieval-Augmented Generation
by: Qi, Jirui, et al.
Published: (2024)

Post-Training Language Models for Crosslingual Consistency
by: Liu, Tianyu, et al.
Published: (2026)

The SIFo Benchmark: Investigating the Sequential Instruction Following Ability of Large Language Models
by: Chen, Xinyi, et al.
Published: (2024)

Reading Between the Prompts: How Stereotypes Shape LLM's Implicit Personalization
by: Neplenbroek, Vera, et al.
Published: (2025)

MBBQ: A Dataset for Cross-Lingual Comparison of Stereotypes in Generative LLMs
by: Neplenbroek, Vera, et al.
Published: (2024)

Cross-Lingual Transfer of Debiasing and Detoxification in Multilingual LLMs: An Extensive Investigation
by: Neplenbroek, Vera, et al.
Published: (2024)

Proof of Time: A Benchmark for Evaluating Scientific Idea Judgments
by: Ye, Bingyang, et al.
Published: (2026)

Assessing the Impact of Typological Features on Multilingual Machine Translation in the Age of Large Language Models
by: Hirak, Vitalii, et al.
Published: (2026)

Child-Directed Language Does Not Consistently Boost Syntax Learning in Language Models
by: Padovani, Francesca, et al.
Published: (2025)

Pointwise Mutual Information as a Performance Gauge for Retrieval-Augmented Generation
by: Liu, Tianyu, et al.
Published: (2024)

debiaSAE: Benchmarking and Mitigating Vision-Language Model Bias
by: Sasse, Kuleen, et al.
Published: (2024)

NeLLCom-X: A Comprehensive Neural-Agent Framework to Simulate Language Learning and Group Communication
by: Lian, Yuchen, et al.
Published: (2024)

Non Verbis, Sed Rebus: Large Language Models are Weak Solvers of Italian Rebuses
by: Sarti, Gabriele, et al.
Published: (2024)

A Primer on the Inner Workings of Transformer-based Language Models
by: Ferrando, Javier, et al.
Published: (2024)

A Unified Assessment of the Poverty of the Stimulus Argument for Neural Language Models
by: Yang, Xiulin, et al.
Published: (2026)

KScope: A Framework for Characterizing the Knowledge Status of Language Models
by: Xiao, Yuxin, et al.
Published: (2025)

Is Child-Directed Language Optimized for Word Learning? A Computational Study of Verb Meaning Acquisition
by: Padovani, Francesca, et al.
Published: (2026)

BabyLM Challenge: Exploring the Effect of Variation Sets on Language Model Training Efficiency
by: Haga, Akari, et al.
Published: (2024)

Can Language Models Identify Side Effects of Breast Cancer Radiation Treatments?
by: Seah, Natalie, et al.
Published: (2026)

Steering Large Language Models for Machine Translation Personalization
by: Scalena, Daniel, et al.
Published: (2025)

When Raw Data Prevails: Are Large Language Model Embeddings Effective in Numerical Data Representation for Medical Machine Learning Applications?
by: Gao, Yanjun, et al.
Published: (2024)

Improving Clinical NLP Performance through Language Model-Generated Synthetic Clinical Data
by: Chen, Shan, et al.
Published: (2024)

Simulating the Emergence of Differential Case Marking with Communicating Neural-Network Agents
by: Lian, Yuchen, et al.
Published: (2025)

NeLLCom-Lex: A Neural-agent Framework to Study the Interplay between Lexical Systems and Language Use
by: Zhang, Yuqing, et al.
Published: (2025)

Wait, but Tylenol is Acetaminophen... Investigating and Improving Language Models' Ability to Resist Requests for Misinformation
by: Chen, Shan, et al.
Published: (2024)

Evaluation of ChatGPT Family of Models for Biomedical Reasoning and Classification
by: Chen, Shan, et al.
Published: (2023)

Measuring the Faithfulness of Thinking Drafts in Large Reasoning Models
by: Xiong, Zidi, et al.
Published: (2025)

Language Models are Surprisingly Fragile to Drug Names in Biomedical Benchmarks
by: Gallifant, Jack, et al.
Published: (2024)

TurBLiMP: A Turkish Benchmark of Linguistic Minimal Pairs
by: Başar, Ezgi, et al.
Published: (2025)

MultiBLiMP 1.0: A Massively Multilingual Benchmark of Linguistic Minimal Pairs
by: Jumelet, Jaap, et al.
Published: (2025)

Modeling Human-Like Color Naming Behavior in Context
by: Zhang, Yuqing, et al.
Published: (2026)

Unsupervised Word-level Quality Estimation for Machine Translation Through the Lens of Annotators (Dis)agreement
by: Sarti, Gabriele, et al.
Published: (2025)

ThinkDial: An Open Recipe for Controlling Reasoning Effort in Large Language Models
by: He, Qianyu, et al.
Published: (2025)

Position Paper On Diagnostic Uncertainty Estimation from Large Language Models: Next-Word Probability Is Not Pre-test Probability
by: Gao, Yanjun, et al.
Published: (2024)

ClinicalBench: Can LLMs Beat Traditional ML Models in Clinical Prediction?
by: Chen, Canyu, et al.
Published: (2024)

Your Language Model May Think Too Rigidly: Achieving Reasoning Consistency with Symmetry-Enhanced Training
by: Yao, Yihang, et al.
Published: (2025)

Cognitive Decision Routing in Large Language Models: When to Think Fast, When to Think Slow
by: Du, Y., et al.
Published: (2025)

Sparse Autoencoder Features for Classifications and Transferability
by: Gallifant, Jack, et al.
Published: (2025)