:: Library Catalog

Bibliographic Details
Main Authors:	Arunasalam, Arjun; Pickering, Madison; Celik, Z. Berkay; Ur, Blase
Format:	Preprint
Published:	2025
Subjects:	Computation and Language; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2510.03384

Similar Items

Rethinking How to Evaluate Language Model Jailbreak
by: Cai, Hongyu, et al.
Published: (2024)

Exploring and Developing a Pre-Model Safeguard with Draft Models
by: Cai, Hongyu, et al.
Published: (2026)

ClawBench: Can AI Agents Complete Everyday Online Tasks?
by: Zhang, Yuxuan, et al.
Published: (2026)

Catch Me If You Can? Not Yet: LLMs Still Struggle to Imitate the Implicit Writing Styles of Everyday Authors
by: Wang, Zhengxiang, et al.
Published: (2025)

QueerGen: How LLMs Reflect Societal Norms on Gender and Sexuality in Sentence Completion Tasks
by: Sosto, Mae, et al.
Published: (2026)

Multi-Task Learning with LLMs for Implicit Sentiment Analysis: Data-level and Task-level Automatic Weight Learning
by: Lai, Wenna, et al.
Published: (2024)

FLANS at SemEval-2026 Task 7: RAG with Open-Sourced Smaller LLMs for Everyday Knowledge Across Diverse Languages and Cultures
by: Bogdanova, Liliia, et al.
Published: (2026)

Teaching Values to Machines: Simulating Human-Like Behavior in LLMs
by: Yehudai, Asaf, et al.
Published: (2026)

International Students and Scams: At Risk Abroad
by: Zhang, Katherine, et al.
Published: (2025)

Growth First, Care Second? Tracing the Landscape of LLM Value Preferences in Everyday Dilemmas
by: Chen, Zhiyi, et al.
Published: (2026)

Implicit Bias in LLMs: A Survey
by: Lin, Xinru, et al.
Published: (2025)

Cash or Comfort? How LLMs Value Your Inconvenience
by: Cedro, Mateusz, et al.
Published: (2025)

How Good Are LLMs for Literary Translation, Really? Literary Translation Evaluation with Humans and LLMs
by: Zhang, Ran, et al.
Published: (2024)

MULTISCRIPT: Multimodal Script Learning for Supporting Open Domain Everyday Tasks
by: Qi, Jingyuan, et al.
Published: (2023)

How Reliable are LLMs as Knowledge Bases? Re-thinking Facutality and Consistency
by: Zheng, Danna, et al.
Published: (2024)

Say Anything but This: When Tokenizer Betrays Reasoning in LLMs
by: Ayoobi, Navid, et al.
Published: (2026)

ValueCompass: A Framework for Measuring Contextual Value Alignment Between Human and LLMs
by: Shen, Hua, et al.
Published: (2024)

BRIDGE: Predicting Human Task Completion Time From Model Performance
by: Liu, Fengyuan, et al.
Published: (2026)

MADIAVE: Multi-Agent Debate for Implicit Attribute Value Extraction
by: Huang, Wei-Chieh, et al.
Published: (2025)

Larger Language Models Don't Care How You Think: Why Chain-of-Thought Prompting Fails in Subjective Tasks
by: Chochlakis, Georgios, et al.
Published: (2024)

Evaluating Computational Accuracy of Large Language Models in Numerical Reasoning Tasks for Healthcare Applications
by: Malghan, Arjun R.
Published: (2025)

ALI-Agent: Assessing LLMs' Alignment with Human Values via Agent-based Evaluation
by: Zheng, Jingnan, et al.
Published: (2024)

Uncovering Implicit Bias in Large Language Models with Concept Learning Dataset
by: Wang, Leroy Z.
Published: (2025)

Towards Semantically Enriched Embeddings for Knowledge Graph Completion
by: Alam, Mehwish, et al.
Published: (2023)

ImplicitAVE: An Open-Source Dataset and Multimodal LLMs Benchmark for Implicit Attribute Value Extraction
by: Zou, Henry Peng, et al.
Published: (2024)

Are the Values of LLMs Structurally Aligned with Humans? A Causal Perspective
by: Kang, Yipeng, et al.
Published: (2024)

Perspective Transition of Large Language Models for Solving Subjective Tasks
by: Wang, Xiaolong, et al.
Published: (2025)

When the Majority is Wrong: Modeling Annotator Disagreement for Subjective Tasks
by: Fleisig, Eve, et al.
Published: (2023)

Assessing LLMs Suitability for Knowledge Graph Completion
by: Iga, Vasile Ionut Remus, et al.
Published: (2024)

How Humans and LLMs Organize Conceptual Knowledge: Exploring Subordinate Categories in Italian
by: Pedrotti, Andrea, et al.
Published: (2025)

How Johnny Can Persuade LLMs to Jailbreak Them: Rethinking Persuasion to Challenge AI Safety by Humanizing LLMs
by: Zeng, Yi, et al.
Published: (2024)

Seeing Through AI's Lens: Enhancing Human Skepticism Towards LLM-Generated Fake News
by: Ayoobi, Navid, et al.
Published: (2024)

Do LLMs Really Think Step-by-step In Implicit Reasoning?
by: Yu, Yijiong
Published: (2024)

Aggregation Artifacts in Subjective Tasks Collapse Large Language Models' Posteriors
by: Chochlakis, Georgios, et al.
Published: (2024)

Exploring the Performance of Large Language Models on Subjective Span Identification Tasks
by: Dmonte, Alphaeus, et al.
Published: (2026)

Do LLMs have Consistent Values?
by: Rozen, Naama, et al.
Published: (2024)

Implicit Humanization in Everyday LLM Moral Judgments
by: Ayad, Hoda, et al.
Published: (2026)

The Grounding Gap: How LLMs Anchor the Meaning of Abstract Concepts Differently from Humans
by: Chlapanis, Odysseas S., et al.
Published: (2026)

Alignment Drift in Multimodal LLMs: A Two-Phase, Longitudinal Evaluation of Harm Across Eight Model Releases
by: Ford, Casey, et al.
Published: (2026)

Location Not Found: Exposing Implicit Local and Global Biases in Multilingual LLMs
by: Mor-Lan, Guy, et al.
Published: (2026)