:: Library Catalog

Bibliographic Details
Main Authors:	Jian, Mingyue; Siddharth, N.
Format:	Preprint
Published:	2024
Subjects:	Computation and Language; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2411.01562

Similar Items

Phonetic Perturbations Reveal Tokenizer-Rooted Safety Gaps in LLMs
by: Aswal, Darpan, et al.
Published: (2025)

Autonomous Evaluation of LLMs for Truth Maintenance and Reasoning Tasks
by: Karia, Rushang, et al.
Published: (2024)

Does GPT-4 surpass human performance in linguistic pragmatics?
by: Bojic, Ljubisa, et al.
Published: (2023)

Addressing speaker gender bias in large scale speech translation systems
by: Bansal, Shubham, et al.
Published: (2025)

Profiling learners' affective engagement: Emotion AI, intercultural pragmatics, and language learning
by: Godwin-Jones, Robert
Published: (2026)

Assessing the potential of LLM-assisted annotation for corpus-based pragmatics and discourse analysis: The case of apology
by: Yu, Danni, et al.
Published: (2023)

$\forall$uto$\exists$val: Autonomous Assessment of LLMs in Formal Synthesis and Interpretation Tasks
by: Karia, Rushang, et al.
Published: (2024)

Evaluating LLMs for Hardware Design and Test
by: Blocklove, Jason, et al.
Published: (2024)

Benchmarking Multimodal LLMs on Recognition and Understanding over Chemical Tables
by: Zhou, Yitong, et al.
Published: (2025)

Serialized EHR make for good text representations
by: Chou, Zhirong, et al.
Published: (2025)

Rethinking LLM Bias Probing Using Lessons from the Social Sciences
by: Morehouse, Kirsten N., et al.
Published: (2025)

Brotherhood at WMT 2024: Leveraging LLM-Generated Contextual Conversations for Cross-Lingual Image Captioning
by: Betala, Siddharth, et al.
Published: (2024)

HALT-RAG: A Task-Adaptable Framework for Hallucination Detection with Calibrated NLI Ensembles and Abstention
by: Goswami, Saumya, et al.
Published: (2025)

Which LLMs Get the Joke? Probing Non-STEM Reasoning Abilities with HumorBench
by: Narad, Reuben, et al.
Published: (2025)

How good is GPT at writing political speeches for the White House?
by: Savoy, Jacques
Published: (2024)

LLMs on a Budget? Say HOLA
by: Siddiqui, Zohaib Hasan, et al.
Published: (2025)

REAMS: Reasoning Enhanced Algorithm for Maths Solving
by: Singh, Eishkaran, et al.
Published: (2025)

Can we trust AI to detect healthy multilingual English speakers among the cognitively impaired cohort in the UK? An investigation using real-world conversational speech
by: Pahar, Madhurananda, et al.
Published: (2026)

Causal Graphs Meet Thoughts: Enhancing Complex Reasoning in Graph-Augmented LLMs
by: Luo, Hang, et al.
Published: (2025)

Enhancing Public Speaking Skills in Engineering Students Through AI
by: Harsh, Amol, et al.
Published: (2025)

How good is my story? Towards quantitative metrics for evaluating LLM-generated XAI narratives
by: Ichmoukhamedov, Timour, et al.
Published: (2024)

Lived Experience Not Found: LLMs Struggle to Align with Experts on Addressing Adverse Drug Reactions from Psychiatric Medication Use
by: Chandra, Mohit, et al.
Published: (2024)

The Diminishing Returns of Early-Exit Decoding in Modern LLMs
by: Wei, Rui, et al.
Published: (2026)

Hybrid-NL2SVA: Integrating RAG and Finetuning for LLM-based NL2SVA
by: Xiao, Weihua, et al.
Published: (2025)

From Answers to Questions: EQGBench for Evaluating LLMs' Educational Question Generation
by: Zhou, Chengliang, et al.
Published: (2025)

IITK at SemEval-2024 Task 10: Who is the speaker? Improving Emotion Recognition and Flip Reasoning in Conversations via Speaker Embeddings
by: Patel, Shubham, et al.
Published: (2024)

VoxHakka: A Dialectally Diverse Multi-speaker Text-to-Speech System for Taiwanese Hakka
by: Chen, Li-Wei, et al.
Published: (2024)

Parallelograms Strike Back: LLMs Generate Better Analogies than People
by: Liu, Qiawen Ella, et al.
Published: (2026)

Learning Evidence Highlighting for Frozen LLMs
by: Li, Shaoang, et al.
Published: (2026)

Discerning minds or generic tutors? Evaluating instructional guidance capabilities in Socratic LLMs
by: Liu, Ying, et al.
Published: (2025)

HoH: A Dynamic Benchmark for Evaluating the Impact of Outdated Information on Retrieval-Augmented Generation
by: Ouyang, Jie, et al.
Published: (2025)

Training Turn-by-Turn Verifiers for Dialogue Tutoring Agents: The Curious Case of LLMs as Your Coding Tutors
by: Wang, Jian, et al.
Published: (2025)

MedEthicsQA: A Comprehensive Question Answering Benchmark for Medical Ethics Evaluation of LLMs
by: Wei, Jianhui, et al.
Published: (2025)

Psychological Steering in LLMs: An Evaluation of Effectiveness and Trustworthiness
by: Banayeeanzade, Amin, et al.
Published: (2025)

Bridging the Creativity Understanding Gap: Small-Scale Human Alignment Enables Expert-Level Humor Ranking in LLMs
by: Zhou, Kuan Lok, et al.
Published: (2025)

Do LLMs Triage Like Clinicians? A Dynamic Study of Outpatient Referral
by: Liu, Xiaoxiao, et al.
Published: (2025)

Multimodal Forecasting of Sparse Intraoperative Hypotension Events Powered by Language Model
by: Zhang, Jintao, et al.
Published: (2025)

Iterative Nash Policy Optimization: Aligning LLMs with General Preferences via No-Regret Learning
by: Zhang, Yuheng, et al.
Published: (2024)

Where is this coming from? Making groundedness count in the evaluation of Document VQA models
by: Nourbakhsh, Armineh, et al.
Published: (2025)

ClinBench-HPB: A Clinical Benchmark for Evaluating LLMs in Hepato-Pancreato-Biliary Diseases
by: Li, Yuchong, et al.
Published: (2025)