| Main Authors: | Qi, Jirui, Chen, Shan, Xiong, Zidi, Fernández, Raquel, Bitterman, Danielle S., Bisazza, Arianna |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2505.22888 |
Similar Items
Cross-Lingual Consistency of Factual Knowledge in Multilingual Language Models
by: Qi, Jirui, et al.
Published: (2023)
On the Consistency of Multilingual Context Utilization in Retrieval-Augmented Generation
by: Qi, Jirui, et al.
Published: (2025)
Model Internals-based Answer Attribution for Trustworthy Retrieval-Augmented Generation
by: Qi, Jirui, et al.
Published: (2024)
Post-Training Language Models for Crosslingual Consistency
by: Liu, Tianyu, et al.
Published: (2026)
The SIFo Benchmark: Investigating the Sequential Instruction Following Ability of Large Language Models
by: Chen, Xinyi, et al.
Published: (2024)
Reading Between the Prompts: How Stereotypes Shape LLM's Implicit Personalization
by: Neplenbroek, Vera, et al.
Published: (2025)
MBBQ: A Dataset for Cross-Lingual Comparison of Stereotypes in Generative LLMs
by: Neplenbroek, Vera, et al.
Published: (2024)
Cross-Lingual Transfer of Debiasing and Detoxification in Multilingual LLMs: An Extensive Investigation
by: Neplenbroek, Vera, et al.
Published: (2024)
Proof of Time: A Benchmark for Evaluating Scientific Idea Judgments
by: Ye, Bingyang, et al.
Published: (2026)
Assessing the Impact of Typological Features on Multilingual Machine Translation in the Age of Large Language Models
by: Hirak, Vitalii, et al.
Published: (2026)
Child-Directed Language Does Not Consistently Boost Syntax Learning in Language Models
by: Padovani, Francesca, et al.
Published: (2025)
Pointwise Mutual Information as a Performance Gauge for Retrieval-Augmented Generation
by: Liu, Tianyu, et al.
Published: (2024)
debiaSAE: Benchmarking and Mitigating Vision-Language Model Bias
by: Sasse, Kuleen, et al.
Published: (2024)
NeLLCom-X: A Comprehensive Neural-Agent Framework to Simulate Language Learning and Group Communication
by: Lian, Yuchen, et al.
Published: (2024)
Non Verbis, Sed Rebus: Large Language Models are Weak Solvers of Italian Rebuses
by: Sarti, Gabriele, et al.
Published: (2024)
A Primer on the Inner Workings of Transformer-based Language Models
by: Ferrando, Javier, et al.
Published: (2024)
A Unified Assessment of the Poverty of the Stimulus Argument for Neural Language Models
by: Yang, Xiulin, et al.
Published: (2026)
KScope: A Framework for Characterizing the Knowledge Status of Language Models
by: Xiao, Yuxin, et al.
Published: (2025)
Is Child-Directed Language Optimized for Word Learning? A Computational Study of Verb Meaning Acquisition
by: Padovani, Francesca, et al.
Published: (2026)
BabyLM Challenge: Exploring the Effect of Variation Sets on Language Model Training Efficiency
by: Haga, Akari, et al.
Published: (2024)
Can Language Models Identify Side Effects of Breast Cancer Radiation Treatments?
by: Seah, Natalie, et al.
Published: (2026)
Steering Large Language Models for Machine Translation Personalization
by: Scalena, Daniel, et al.
Published: (2025)
When Raw Data Prevails: Are Large Language Model Embeddings Effective in Numerical Data Representation for Medical Machine Learning Applications?
by: Gao, Yanjun, et al.
Published: (2024)
Improving Clinical NLP Performance through Language Model-Generated Synthetic Clinical Data
by: Chen, Shan, et al.
Published: (2024)
Simulating the Emergence of Differential Case Marking with Communicating Neural-Network Agents
by: Lian, Yuchen, et al.
Published: (2025)
NeLLCom-Lex: A Neural-agent Framework to Study the Interplay between Lexical Systems and Language Use
by: Zhang, Yuqing, et al.
Published: (2025)
Wait, but Tylenol is Acetaminophen... Investigating and Improving Language Models' Ability to Resist Requests for Misinformation
by: Chen, Shan, et al.
Published: (2024)
Evaluation of ChatGPT Family of Models for Biomedical Reasoning and Classification
by: Chen, Shan, et al.
Published: (2023)
Measuring the Faithfulness of Thinking Drafts in Large Reasoning Models
by: Xiong, Zidi, et al.
Published: (2025)
Language Models are Surprisingly Fragile to Drug Names in Biomedical Benchmarks
by: Gallifant, Jack, et al.
Published: (2024)
TurBLiMP: A Turkish Benchmark of Linguistic Minimal Pairs
by: Başar, Ezgi, et al.
Published: (2025)
MultiBLiMP 1.0: A Massively Multilingual Benchmark of Linguistic Minimal Pairs
by: Jumelet, Jaap, et al.
Published: (2025)
Modeling Human-Like Color Naming Behavior in Context
by: Zhang, Yuqing, et al.
Published: (2026)
Unsupervised Word-level Quality Estimation for Machine Translation Through the Lens of Annotators (Dis)agreement
by: Sarti, Gabriele, et al.
Published: (2025)
ThinkDial: An Open Recipe for Controlling Reasoning Effort in Large Language Models
by: He, Qianyu, et al.
Published: (2025)
Position Paper On Diagnostic Uncertainty Estimation from Large Language Models: Next-Word Probability Is Not Pre-test Probability
by: Gao, Yanjun, et al.
Published: (2024)
ClinicalBench: Can LLMs Beat Traditional ML Models in Clinical Prediction?
by: Chen, Canyu, et al.
Published: (2024)
Your Language Model May Think Too Rigidly: Achieving Reasoning Consistency with Symmetry-Enhanced Training
by: Yao, Yihang, et al.
Published: (2025)
Cognitive Decision Routing in Large Language Models: When to Think Fast, When to Think Slow
by: Du, Y., et al.
Published: (2025)
Sparse Autoencoder Features for Classifications and Transferability
by: Gallifant, Jack, et al.
Published: (2025)