| Main Authors: | Ford, Casey, Van Doren, Madison, Dix, Emily |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.04739 |
Similar Items
Red Teaming Multimodal Language Models: Evaluating Harm Across Prompt Modalities and Models
by: Van Doren, Madison, et al.
Published: (2025)
Same Model, Different Weakness: How Language and Modality Reshape the Jailbreak Attack Surface in Frontier MLLMs
by: Ford, Casey, et al.
Published: (2026)
CUPID: Evaluating Personalized and Contextualized Alignment of LLMs from Interactions
by: Kim, Tae Soo, et al.
Published: (2025)
Multimodal Transformer Models for Turn-taking Prediction: Effects on Conversational Dynamics of Human-Agent Interaction during Cooperative Gameplay
by: Bae, Young-Ho, et al.
Published: (2025)
Mitigating Harmful Erraticism in LLMs Through Dialectical Behavior Therapy Based De-Escalation Strategies
by: Rangarajan, Pooja, et al.
Published: (2025)
Alignment Drift in Long-Term Human-LLM Interaction: A Mechanism-Oriented Framework
by: Yao, Xintong
Published: (2026)
Mind the Value-Action Gap: Do LLMs Act in Alignment with Their Values?
by: Shen, Hua, et al.
Published: (2025)
The Alignment Floor: How Persona Customization Breaks Safety in Weakly-Aligned LLMs
by: Zhang, Xing, et al.
Published: (2026)
ValueCompass: A Framework for Measuring Contextual Value Alignment Between Human and LLMs
by: Shen, Hua, et al.
Published: (2024)
Evaluating LLMs as Human Surrogates in Controlled Experiments
by: Hoq, Adnan, et al.
Published: (2026)
A Metasemantic-Metapragmatic Framework for Taxonomizing Multimodal Communicative Alignment
by: Ji, Eugene Yu
Published: (2025)
Heterogeneous Value Alignment Evaluation for Large Language Models
by: Zhang, Zhaowei, et al.
Published: (2023)
Detecting and Preventing Harmful Behaviors in AI Companions: Development and Evaluation of the SHIELD Supervisory System
by: Ben-Zion, Ziv, et al.
Published: (2025)
Evaluating AI Alignment in LLMs: Output Analysis of Value Priorities Across 75 Models with Human Benchmarking
by: Lau, Gabriel Rongyang, et al.
Published: (2025)
Meta-Evaluating Local LLMs: Rethinking Performance Metrics for Serious Games
by: Isaza-Giraldo, Andrés, et al.
Published: (2025)
Unpacking Human Preference for LLMs: Demographically Aware Evaluation with the HUMAINE Framework
by: Petrova, Nora, et al.
Published: (2026)
Evaluating Behavioral Alignment in Conflict Dialogue: A Multi-Dimensional Comparison of LLM Agents and Humans
by: Kwon, Deuksin, et al.
Published: (2025)
Model-in-the-Loop (MILO): Accelerating Multimodal AI Data Annotation with LLMs
by: Wang, Yifan, et al.
Published: (2024)
CURE: Cultural Understanding and Reasoning Evaluation - A Framework for "Thick" Culture Alignment Evaluation in LLMs
by: Vo, Truong, et al.
Published: (2025)
Direct Language Model Alignment from Online AI Feedback
by: Guo, Shangmin, et al.
Published: (2024)
Understanding the Dataset Practitioners Behind Large Language Model Development
by: Qian, Crystal, et al.
Published: (2024)
CHBench: A Cognitive Hierarchy Benchmark for Evaluating Strategic Reasoning Capability of LLMs
by: Liu, Hongtao, et al.
Published: (2025)
DAVIS: Planning Agent with Knowledge Graph-Powered Inner Monologue
by: Dinh, Minh Pham, et al.
Published: (2024)
Human-AI Interaction Alignment: Designing, Evaluating, and Evolving Value-Centered AI For Reciprocal Human-AI Futures
by: Shen, Hua, et al.
Published: (2025)
Automatic Histograms: Leveraging Language Models for Text Dataset Exploration
by: Reif, Emily, et al.
Published: (2024)
Multimodal Fusion with LLMs for Engagement Prediction in Natural Conversation
by: Ma, Cheng Charles, et al.
Published: (2024)
Visualization Literacy of Multimodal Large Language Models: A Comparative Study
by: Li, Zhimin, et al.
Published: (2024)
Retrieve, Annotate, Evaluate, Repeat: Leveraging Multimodal LLMs for Large-Scale Product Retrieval Evaluation
by: Hosseini, Kasra, et al.
Published: (2024)
Position: Towards Bidirectional Human-AI Alignment
by: Shen, Hua, et al.
Published: (2024)
The Ultimate Tutorial for AI-driven Scale Development in Generative Psychometrics: Releasing AIGENIE from its Bottle
by: Russell-Lasalandra, Lara, et al.
Published: (2026)
Performance Gains of LLMs With Humans in a World of LLMs Versus Humans
by: McCullum, Lucas, et al.
Published: (2025)
Human Preferences for Constructive Interactions in Language Model Alignment
by: Kyrychenko, Yara, et al.
Published: (2025)
Minion: A Technology Probe to Explore How Users Negotiate Harmful Value Conflicts with AI Companions
by: Fan, Xianzhe, et al.
Published: (2024)
SensorPersona: An LLM-Empowered System for Continual Persona Extraction from Longitudinal Mobile Sensor Streams
by: Yang, Bufang, et al.
Published: (2026)
When AI Writes, Whose Voice Remains? Quantifying Cultural Marker Erasure Across World English Varieties in Large Language Models
by: Navneet, Satyam Kumar, et al.
Published: (2026)
Can LLMs Model Incorrect Student Reasoning? A Case Study on Distractor Generation
by: Zengaffinen, Yanick, et al.
Published: (2026)
An Evaluation of Estimative Uncertainty in Large Language Models
by: Tang, Zhisheng, et al.
Published: (2024)
Evaluating the Prompt Steerability of Large Language Models
by: Miehling, Erik, et al.
Published: (2024)
Understanding the LLM-ification of CHI: Unpacking the Impact of LLMs at CHI through a Systematic Literature Review
by: Pang, Rock Yuren, et al.
Published: (2025)
User-Assistant Bias in LLMs
by: Pan, Xu, et al.
Published: (2025)