| Main Authors: | Hutson, Dylan; Vennemeyer, Daniel; Deshmukh, Aneesh; Zhan, Justin; Jiang, Tianyu |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2509.19593 |
Similar Items
Open-Ended Wargames with Large Language Models
by: Hogan, Daniel P., et al.
Published: (2024)
O$^2$-Searcher: A Searching-based Agent Model for Open-Domain Open-Ended Question Answering
by: Mei, Jianbiao, et al.
Published: (2025)
MIRROR: A Novel Approach for the Automated Evaluation of Open-Ended Question Generation
by: Deroy, Aniket, et al.
Published: (2024)
An Overview and Discussion on Using Large Language Models for Implementation Generation of Solutions to Open-Ended Problems
by: Shaik, Hashmath, et al.
Published: (2024)
Flexible Agent Alignment with Goal Inference from Open-Ended Dialog
by: Ma, Rachel, et al.
Published: (2025)
Sycophancy Is Not One Thing: Causal Separation of Sycophantic Behaviors in LLMs
by: Vennemeyer, Daniel, et al.
Published: (2025)
MCU: An Evaluation Framework for Open-Ended Game Agents
by: Zheng, Xinyue, et al.
Published: (2023)
A Semantic-Sampling Framework for Evaluating Calibration in Open-Ended Question Answering
by: Wang, Zhanliang, et al.
Published: (2026)
Can Large Language Models Play Text Games Well? Current State-of-the-Art and Open Questions
by: Tsai, Chen Feng, et al.
Published: (2023)
From National Curricula to Cultural Awareness: Constructing Open-Ended Culture-Specific Question Answering Dataset
by: Yoo, Haneul, et al.
Published: (2026)
In Search of the Ingredients of Open-Endedness: Replicating Picbreeder with Large Vision-Language Models
by: Earle, Sam, et al.
Published: (2026)
CausalVLBench: Benchmarking Visual Causal Reasoning in Large Vision-Language Models
by: Komanduri, Aneesh, et al.
Published: (2025)
MiRD: Reliable Set-Valued Prediction for Open-Ended Question Answering via Miscoverage Risk Decomposition
by: Hu, Anqi, et al.
Published: (2026)
Enhancing Commentary Strategies for Imperfect Information Card Games: A Study of Large Language Models in Guandan Commentary
by: Tao, Meiling, et al.
Published: (2024)
The Hyperfitting Phenomenon: Sharpening and Stabilizing LLMs for Open-Ended Text Generation
by: Carlsson, Fredrik, et al.
Published: (2024)
Decoding Open-Ended Information Seeking Goals from Eye Movements in Reading
by: Hadar, Cfir Avraham, et al.
Published: (2025)
Reverse-Engineered Reasoning for Open-Ended Generation
by: Wang, Haozhe, et al.
Published: (2025)
PuzzleWorld: A Benchmark for Multimodal, Open-Ended Reasoning in Puzzlehunts
by: Li, Hengzhi, et al.
Published: (2025)
AIn't Nothing But a Survey? Using Large Language Models for Coding German Open-Ended Survey Responses on Survey Motivation
by: von der Heyde, Leah, et al.
Published: (2025)
DIVERGE: Diversity-Enhanced RAG for Open-Ended Information Seeking
by: Hu, Tianyi, et al.
Published: (2026)
Towards Open-Ended Discovery for Low-Resource NLP
by: Dossou, Bonaventure F. P., et al.
Published: (2025)
Answer, Refuse, or Guess? Investigating Risk-Aware Decision Making in Language Models
by: Wu, Cheng-Kuang, et al.
Published: (2025)
Assessing Bias in Metric Models for LLM Open-Ended Generation Bias Benchmarks
by: Demchak, Nathaniel, et al.
Published: (2024)
AuditWen:An Open-Source Large Language Model for Audit
by: Huang, Jiajia, et al.
Published: (2024)
Has It All Been Solved? Open NLP Research Questions Not Solved by Large Language Models
by: Ignat, Oana, et al.
Published: (2023)
Enhancing Large Language Models with Pseudo- and Multisource- Knowledge Graphs for Open-ended Question Answering
by: Liu, Jiaxiang, et al.
Published: (2024)
ChinaTravel: An Open-Ended Travel Planning Benchmark with Compositional Constraint Validation for Language Agents
by: Shao, Jie-Jing, et al.
Published: (2024)
Measuring Stability Beyond Accuracy in Small Open-Source Medical Large Language Models for Pediatric Endocrinology
by: D'Amario, Vanessa, et al.
Published: (2025)
Second Guess: Detecting Uncertainty Through Abstention and Answer Stability in Small Language Models
by: Aravindan, Ashwath Vaithinathan, et al.
Published: (2026)
Generating Planning Feedback for Open-Ended Programming Exercises with LLMs
by: Demirtaş, Mehmet Arif, et al.
Published: (2025)
Rainbow Teaming: Open-Ended Generation of Diverse Adversarial Prompts
by: Samvelyan, Mikayel, et al.
Published: (2024)
Ask Good Questions for Large Language Models
by: Wu, Qi, et al.
Published: (2025)
Generation Space Size: Understanding and Calibrating Open-Endedness of LLM Generations
by: Yu, Sunny, et al.
Published: (2025)
Measuring and Eliminating Refusals in Military Large Language Models
by: FitzGerald, Jack, et al.
Published: (2026)
On the Temporal Question-Answering Capabilities of Large Language Models Over Anonymized Data
by: Ruiz, Alfredo Garrachón, et al.
Published: (2025)
MAC: A Live Benchmark for Multimodal Large Language Models in Scientific Understanding
by: Jiang, Mohan, et al.
Published: (2025)
A Survey on Uncertainty Quantification of Large Language Models: Taxonomy, Open Research Challenges, and Future Directions
by: Shorinwa, Ola, et al.
Published: (2024)
Enhancing Large Language Model Performance To Answer Questions and Extract Information More Accurately
by: Zhang, Liang, et al.
Published: (2024)
Harnessing Multi-Role Capabilities of Large Language Models for Open-Domain Question Answering
by: Sun, Hongda, et al.
Published: (2024)
Self-Rewarding Rubric-Based Reinforcement Learning for Open-Ended Reasoning
by: Ye, Zhiling, et al.
Published: (2025)