| Main Authors: | Jian, Mingyue; Siddharth, N. |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2411.01562 |
Similar Items
Phonetic Perturbations Reveal Tokenizer-Rooted Safety Gaps in LLMs
by: Aswal, Darpan, et al.
Published: (2025)
Autonomous Evaluation of LLMs for Truth Maintenance and Reasoning Tasks
by: Karia, Rushang, et al.
Published: (2024)
Does GPT-4 surpass human performance in linguistic pragmatics?
by: Bojic, Ljubisa, et al.
Published: (2023)
Addressing speaker gender bias in large scale speech translation systems
by: Bansal, Shubham, et al.
Published: (2025)
Profiling learners' affective engagement: Emotion AI, intercultural pragmatics, and language learning
by: Godwin-Jones, Robert
Published: (2026)
Assessing the potential of LLM-assisted annotation for corpus-based pragmatics and discourse analysis: The case of apology
by: Yu, Danni, et al.
Published: (2023)
$\forall$uto$\exists$val: Autonomous Assessment of LLMs in Formal Synthesis and Interpretation Tasks
by: Karia, Rushang, et al.
Published: (2024)
Evaluating LLMs for Hardware Design and Test
by: Blocklove, Jason, et al.
Published: (2024)
Benchmarking Multimodal LLMs on Recognition and Understanding over Chemical Tables
by: Zhou, Yitong, et al.
Published: (2025)
Serialized EHR make for good text representations
by: Chou, Zhirong, et al.
Published: (2025)
Rethinking LLM Bias Probing Using Lessons from the Social Sciences
by: Morehouse, Kirsten N., et al.
Published: (2025)
Brotherhood at WMT 2024: Leveraging LLM-Generated Contextual Conversations for Cross-Lingual Image Captioning
by: Betala, Siddharth, et al.
Published: (2024)
HALT-RAG: A Task-Adaptable Framework for Hallucination Detection with Calibrated NLI Ensembles and Abstention
by: Goswami, Saumya, et al.
Published: (2025)
Which LLMs Get the Joke? Probing Non-STEM Reasoning Abilities with HumorBench
by: Narad, Reuben, et al.
Published: (2025)
How good is GPT at writing political speeches for the White House?
by: Savoy, Jacques
Published: (2024)
LLMs on a Budget? Say HOLA
by: Siddiqui, Zohaib Hasan, et al.
Published: (2025)
REAMS: Reasoning Enhanced Algorithm for Maths Solving
by: Singh, Eishkaran, et al.
Published: (2025)
Can we trust AI to detect healthy multilingual English speakers among the cognitively impaired cohort in the UK? An investigation using real-world conversational speech
by: Pahar, Madhurananda, et al.
Published: (2026)
Causal Graphs Meet Thoughts: Enhancing Complex Reasoning in Graph-Augmented LLMs
by: Luo, Hang, et al.
Published: (2025)
Enhancing Public Speaking Skills in Engineering Students Through AI
by: Harsh, Amol, et al.
Published: (2025)
How good is my story? Towards quantitative metrics for evaluating LLM-generated XAI narratives
by: Ichmoukhamedov, Timour, et al.
Published: (2024)
Lived Experience Not Found: LLMs Struggle to Align with Experts on Addressing Adverse Drug Reactions from Psychiatric Medication Use
by: Chandra, Mohit, et al.
Published: (2024)
The Diminishing Returns of Early-Exit Decoding in Modern LLMs
by: Wei, Rui, et al.
Published: (2026)
Hybrid-NL2SVA: Integrating RAG and Finetuning for LLM-based NL2SVA
by: Xiao, Weihua, et al.
Published: (2025)
From Answers to Questions: EQGBench for Evaluating LLMs' Educational Question Generation
by: Zhou, Chengliang, et al.
Published: (2025)
IITK at SemEval-2024 Task 10: Who is the speaker? Improving Emotion Recognition and Flip Reasoning in Conversations via Speaker Embeddings
by: Patel, Shubham, et al.
Published: (2024)
VoxHakka: A Dialectally Diverse Multi-speaker Text-to-Speech System for Taiwanese Hakka
by: Chen, Li-Wei, et al.
Published: (2024)
Parallelograms Strike Back: LLMs Generate Better Analogies than People
by: Liu, Qiawen Ella, et al.
Published: (2026)
Learning Evidence Highlighting for Frozen LLMs
by: Li, Shaoang, et al.
Published: (2026)
Discerning minds or generic tutors? Evaluating instructional guidance capabilities in Socratic LLMs
by: Liu, Ying, et al.
Published: (2025)
HoH: A Dynamic Benchmark for Evaluating the Impact of Outdated Information on Retrieval-Augmented Generation
by: Ouyang, Jie, et al.
Published: (2025)
Training Turn-by-Turn Verifiers for Dialogue Tutoring Agents: The Curious Case of LLMs as Your Coding Tutors
by: Wang, Jian, et al.
Published: (2025)
MedEthicsQA: A Comprehensive Question Answering Benchmark for Medical Ethics Evaluation of LLMs
by: Wei, Jianhui, et al.
Published: (2025)
Psychological Steering in LLMs: An Evaluation of Effectiveness and Trustworthiness
by: Banayeeanzade, Amin, et al.
Published: (2025)
Bridging the Creativity Understanding Gap: Small-Scale Human Alignment Enables Expert-Level Humor Ranking in LLMs
by: Zhou, Kuan Lok, et al.
Published: (2025)
Do LLMs Triage Like Clinicians? A Dynamic Study of Outpatient Referral
by: Liu, Xiaoxiao, et al.
Published: (2025)
Multimodal Forecasting of Sparse Intraoperative Hypotension Events Powered by Language Model
by: Zhang, Jintao, et al.
Published: (2025)
Iterative Nash Policy Optimization: Aligning LLMs with General Preferences via No-Regret Learning
by: Zhang, Yuheng, et al.
Published: (2024)
Where is this coming from? Making groundedness count in the evaluation of Document VQA models
by: Nourbakhsh, Armineh, et al.
Published: (2025)
ClinBench-HPB: A Clinical Benchmark for Evaluating LLMs in Hepato-Pancreato-Biliary Diseases
by: Li, Yuchong, et al.
Published: (2025)