| Main Authors: | Xu, Yiming, Jiao, Junfeng |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2508.17527 |
Similar Items
Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation
by: Zhang, Yuhui, et al.
Published: (2025)
Defining and Evaluating Physical Safety for Large Language Models
by: Tang, Yung-Chen, et al.
Published: (2024)
Evaluating Large Language Models for Fair and Reliable Organ Allocation
by: Kim, Brian Hyeongseok, et al.
Published: (2025)
PropensityBench: Evaluating Latent Safety Risks in Large Language Models via an Agentic Approach
by: Sehwag, Udari Madhushani, et al.
Published: (2025)
Free to play: UN Trade and Development's experience with developing its own open-source Retrieval Augmented Generation Large Language Model application
by: Hopp, Daniel
Published: (2024)
Hypothesis Generation with Large Language Models
by: Zhou, Yangqiaoyu, et al.
Published: (2024)
Large Language Models for Travel Behavior Prediction
by: Mo, Baichuan, et al.
Published: (2023)
New Curriculum, New Chance -- Retrieval Augmented Generation for Lesson Planning in Ugandan Secondary Schools. Prototype Quality Evaluation
by: Kloker, Simon, et al.
Published: (2024)
Large Language Models Predict Functional Outcomes after Acute Ischemic Stroke
by: Kapoor, Anjali K., et al.
Published: (2026)
Predicting Learning Performance with Large Language Models: A Study in Adult Literacy
by: Zhang, Liang, et al.
Published: (2024)
From Data to Behavior: Predicting Unintended Model Behaviors Before Training
by: Wang, Mengru, et al.
Published: (2026)
Improving Consistency in Retrieval-Augmented Systems with Group Similarity Rewards
by: Hamman, Faisal, et al.
Published: (2025)
Evaluation of Machine Learning Models in Student Academic Performance Prediction
by: Sandeepa, A. G. R., et al.
Published: (2025)
Richer Output for Richer Countries: Uncovering Geographical Disparities in Generated Stories and Travel Recommendations
by: Bhagat, Kirti, et al.
Published: (2024)
Prediction-Powered Ranking of Large Language Models
by: Chatzi, Ivi, et al.
Published: (2024)
Holistically Evaluating the Environmental Impact of Creating Language Models
by: Morrison, Jacob, et al.
Published: (2025)
Democratizing Diplomacy: A Harness for Evaluating Any Large Language Model on Full-Press Diplomacy
by: Duffy, Alexander, et al.
Published: (2025)
Large Language Models as Urban Residents: An LLM Agent Framework for Personal Mobility Generation
by: Wang, Jiawei, et al.
Published: (2024)
Existential Conversations with Large Language Models: Content, Community, and Culture
by: Shanahan, Murray, et al.
Published: (2024)
SUV: Scalable Large Language Model Copyright Compliance with Regularized Selective Unlearning
by: Xu, Tianyang, et al.
Published: (2025)
Mastery Guided Non-parametric Clustering to Scale-up Strategy Prediction
by: Shakya, Anup, et al.
Published: (2024)
Correlated Errors in Large Language Models
by: Kim, Elliot, et al.
Published: (2025)
Large Language Models are Geographically Biased
by: Manvi, Rohin, et al.
Published: (2024)
Subtle Biases Need Subtler Measures: Dual Metrics for Evaluating Representative and Affinity Bias in Large Language Models
by: Kumar, Abhishek, et al.
Published: (2024)
Leveraging Large Language Models for Tacit Knowledge Discovery in Organizational Contexts
by: Zuin, Gianlucca, et al.
Published: (2025)
Participatory Assessment of Large Language Model Applications in an Academic Medical Center
by: Carra, Giorgia, et al.
Published: (2024)
Exploration of Adolescent Depression Risk Prediction Based on Census Surveys and General Life Issues
by: Li, Qiang, et al.
Published: (2024)
BiasFreeBench: a Benchmark for Mitigating Bias in Large Language Model Responses
by: Xu, Xin, et al.
Published: (2025)
The LLM Has Left The Chat: Evidence of Bail Preferences in Large Language Models
by: Ensign, Danielle, et al.
Published: (2025)
Mechanistic Interpretability with SAEs: Probing Religion, Violence, and Geography in Large Language Models
by: Simbeck, Katharina, et al.
Published: (2025)
Harnessing Large Language Models for Mental Health: Opportunities, Challenges, and Ethical Considerations
by: Pandey, Hari Mohan
Published: (2024)
Psychological Counseling Ability of Large Language Models
by: Peng, Fangyu, et al.
Published: (2025)
Assessing Large Language Models on Climate Information
by: Bulian, Jannis, et al.
Published: (2023)
Adapting Mental Health Prediction Tasks for Cross-lingual Learning via Meta-Training and In-context Learning with Large Language Model
by: Lifelo, Zita, et al.
Published: (2024)
FrontierScience: Evaluating AI's Ability to Perform Expert-Level Scientific Tasks
by: Wang, Miles, et al.
Published: (2026)
Machine Learners Should Acknowledge the Legal Implications of Large Language Models as Personal Data
by: Nolte, Henrik, et al.
Published: (2025)
AI-Augmented Predictions: LLM Assistants Improve Human Forecasting Accuracy
by: Schoenegger, Philipp, et al.
Published: (2024)
Data Augmentation via Diffusion Model to Enhance AI Fairness
by: Blow, Christina Hastings, et al.
Published: (2024)
OpenCity: Open Spatio-Temporal Foundation Models for Traffic Prediction
by: Li, Zhonghang, et al.
Published: (2024)
A Comparative Benchmark of Federated Learning Strategies for Mortality Prediction on Heterogeneous and Imbalanced Clinical Data
by: Tertulino, Rodrigo
Published: (2025)