| Main Authors: | Rammouz, Veronica, Gonzalez, Aaron, Cruzportillo, Carlos, Tan, Adrian, Beebe, Nicole, Rios, Anthony |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2510.09519 |
Similar Items
Telling Speculative Stories to Help Humans Imagine the Harms of Healthcare AI
by: Zhao, Xingmeng, et al.
Published: (2025)
Can We Predict Performance of Large Models across Vision-Language Tasks?
by: Zhao, Qinyu, et al.
Published: (2024)
Can We Trust Machine Learning? The Reliability of Features from Open-Source Speech Analysis Tools for Speech Modeling
by: Chowdhury, Tahiya, et al.
Published: (2025)
Can LLMs Rank the Harmfulness of Smaller LLMs? We are Not There Yet
by: Atil, Berk, et al.
Published: (2025)
Fact or Fiction? Can LLMs be Reliable Annotators for Political Truths?
by: Chatrath, Veronica, et al.
Published: (2024)
Crossing Domains without Labels: Distant Supervision for Term Extraction
by: Senger, Elena, et al.
Published: (2025)
Can We Afford The Perfect Prompt? Balancing Cost and Accuracy with the Economical Prompting Index
by: McDonald, Tyler, et al.
Published: (2024)
Lateral Phishing With Large Language Models: A Large Organization Comparative Study
by: Bethany, Mazal, et al.
Published: (2024)
When Can We Trust LLMs in Mental Health? Large-Scale Benchmarks for Reliable LLM Evaluation
by: Badawi, Abeer, et al.
Published: (2025)
LLM Compression: How Far Can We Go in Balancing Size and Performance?
by: Sk, Sahil, et al.
Published: (2025)
Instruction-tuned Large Language Models for Machine Translation in the Medical Domain
by: Rios, Miguel
Published: (2024)
Bottom-up Domain-specific Superintelligence: A Reliable Knowledge Graph is What We Need
by: Dedhia, Bhishma, et al.
Published: (2025)
How Much of Your Data Can Suck? Thresholds for Domain Performance and Emergent Misalignment in LLMs
by: Ouyang, Jian, et al.
Published: (2025)
Can We Trust the Performance Evaluation of Uncertainty Estimation Methods in Text Summarization?
by: He, Jianfeng, et al.
Published: (2024)
Can We Achieve High-quality Direct Speech-to-Speech Translation without Parallel Speech Data?
by: Fang, Qingkai, et al.
Published: (2024)
Exploring the Performance of ML/DL Architectures on the MNIST-1D Dataset
by: Beebe, Michael, et al.
Published: (2026)
Can Language Models Represent the Past without Anachronism?
by: Underwood, Ted, et al.
Published: (2025)
LLMs as Data Annotators: How Close Are We to Human Performance
by: Haq, Muhammad Uzair Ul, et al.
Published: (2025)
Extracting Biomedical Entities from Noisy Audio Transcripts
by: Ebadi, Nima, et al.
Published: (2024)
How Much Can We Forget about Data Contamination?
by: Bordt, Sebastian, et al.
Published: (2024)
A Comprehensive Study of Gender Bias in Chemical Named Entity Recognition Models
by: Zhao, Xingmeng, et al.
Published: (2022)
Improving Expert Radiology Report Summarization by Prompting Large Language Models with a Layperson Summary
by: Zhao, Xingmeng, et al.
Published: (2024)
Can We Trust LLM Detectors?
by: Sandhan, Jivnesh, et al.
Published: (2026)
Does It Run and Is That Enough? Revisiting Text-to-Chart Generation with a Multi-Agent Approach
by: Ford, James, et al.
Published: (2025)
Unmasking Database Vulnerabilities: Zero-Knowledge Schema Inference Attacks in Text-to-SQL Systems
by: Klisura, Đorđe, et al.
Published: (2024)
Team UTSA-NLP at SemEval 2024 Task 5: Prompt Ensembling for Argument Reasoning in Civil Procedures with GPT4
by: Schumacher, Dan, et al.
Published: (2024)
Based on Data Balancing and Model Improvement for Multi-Label Sentiment Classification Performance Enhancement
by: Su, Zijin, et al.
Published: (2025)
Can We Evaluate Domain Adaptation Models Without Target-Domain Labels?
by: Yang, Jianfei, et al.
Published: (2023)
Two Directions for Clinical Data Generation with Large Language Models: Data-to-Label and Label-to-Data
by: Li, Rumeng, et al.
Published: (2023)
Can We Infer Confidential Properties of Training Data from LLMs?
by: Huang, Pengrun, et al.
Published: (2025)
Ranking Large Language Models without Ground Truth
by: Dhurandhar, Amit, et al.
Published: (2024)
How Far Can We Extract Diverse Perspectives from Large Language Models?
by: Hayati, Shirley Anugrah, et al.
Published: (2023)
Rethinking LLM Evaluation: Can We Evaluate LLMs with 200x Less Data?
by: Wang, Shaobo, et al.
Published: (2025)
Can LLMs Separate Instructions From Data? And What Do We Even Mean By That?
by: Zverev, Egor, et al.
Published: (2024)
LabelCoRank: Revolutionizing Long Tail Multi-Label Classification with Co-Occurrence Reranking
by: Yan, Yan, et al.
Published: (2025)
Editing Arbitrary Propositions in LLMs without Subject Labels
by: Feigenbaum, Itai, et al.
Published: (2024)
Can We Locate and Prevent Stereotypes in LLMs?
by: D'Souza, Alex
Published: (2026)
Generative Pseudo-Labeling for Pre-Ranking with LLMs
by: Bi, Junyu, et al.
Published: (2026)
Leveraging Interview-Informed LLMs to Model Survey Responses: Comparative Insights from AI-Generated and Human Data
by: Zhang, Jihong, et al.
Published: (2025)
Ranking Over Scoring: Towards Reliable and Robust Automated Evaluation of LLM-Generated Medical Explanatory Arguments
by: De la Iglesia, Iker, et al.
Published: (2024)