:: Library Catalog

Bibliographic Details
Main Authors:	Yang, Cai; Dou, Yao; Heineman, David; Wu, Xiaofeng; Xu, Wei
Format:	Preprint
Published:	2025
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2508.10421

Similar Items

Improving Minimum Bayes Risk Decoding with Multi-Prompt
by: Heineman, David, et al.
Published: (2024)

Evaluating Large Language Models on Urdu Idiom Translation
by: Khan, Muhammad Farmal, et al.
Published: (2025)

Gavel: Agent Meets Checklist for Evaluating LLMs on Long-Context Legal Summarization
by: Dou, Yao, et al.
Published: (2026)

Creative and Context-Aware Translation of East Asian Idioms with GPT-4
by: Tang, Kenan, et al.
Published: (2024)

It's Not a Walk in the Park! Challenges of Idiom Translation in Speech-to-text Systems
by: Zaitova, Iuliia, et al.
Published: (2025)

The Impact of Visual Information in Chinese Characters: Evaluating Large Models' Ability to Recognize and Utilize Radicals
by: Wu, Xiaofeng, et al.
Published: (2024)

Memorization or Reasoning? Exploring the Idiom Understanding of LLMs
by: Kim, Jisu, et al.
Published: (2025)

Readability-guided Idiom-aware Sentence Simplification (RISS) for Chinese
by: Zhang, Jingshen, et al.
Published: (2024)

Chengyu-Bench: Benchmarking Large Language Models for Chinese Idiom Understanding and Use
by: Fu, Yicheng, et al.
Published: (2025)

Large Language Models for Persian ↔ English Idiom Translation
by: Rezaeimanesh, Sara, et al.
Published: (2024)

Towards a Path Dependent Account of Category Fluency
by: Heineman, David, et al.
Published: (2024)

Idiom Understanding as a Tool to Measure the Dialect Gap
by: Beauchemin, David, et al.
Published: (2025)

Idiom Detection in Sorani Kurdish Texts
by: Omer, Skala Kamaran, et al.
Published: (2025)

A Rising Tide Lifts All Boats: MTQE Rewards for Idioms Improve General Translation Quality
by: Agarwal, Ishika, et al.
Published: (2026)

DualCoTs: Dual Chain-of-Thoughts Prompting for Sentiment Lexicon Expansion of Idioms
by: Niu, Fuqiang, et al.
Published: (2024)

Tabular Data Understanding with LLMs: A Survey of Recent Advances and Challenges
by: Wu, Xiaofeng, et al.
Published: (2025)

How Good Are LLMs for Literary Translation, Really? Literary Translation Evaluation with Humans and LLMs
by: Zhang, Ran, et al.
Published: (2024)

NLP Datasets for Idiom and Figurative Language Tasks
by: Matheny, Blake, et al.
Published: (2025)

A Survey of Idiom Datasets for Psycholinguistic and Computational Research
by: Flor, Michael, et al.
Published: (2025)

Killing Two Flies with One Stone: An Attempt to Break LLMs Using English->Icelandic Idioms and Proper Names
by: Ármannsson, Bjarki, et al.
Published: (2024)

DMDTEval: An Evaluation and Analysis of LLMs on Disambiguation in Multi-domain Translation
by: Man, Zhibo, et al.
Published: (2025)

Exploring Safety Alignment Evaluation of LLMs in Chinese Mental Health Dialogues via LLM-as-Judge
by: Cai, Yunna, et al.
Published: (2025)

ClinConsensus: A Physician-Calibrated Benchmark for Evaluating Clinical Rubric Coverage in Chinese Medical LLMs
by: Zheng, Xiang, et al.
Published: (2026)

Comparative Study of Multilingual Idioms and Similes in Large Language Models
by: Khoshtab, Paria, et al.
Published: (2024)

Can LLMs Act as Historians? Evaluating Historical Research Capabilities of LLMs via the Chinese Imperial Examination
by: Gao, Lirong, et al.
Published: (2026)

Visual Puns from Idioms: An Iterative LLM-T2IM-MLLM Framework
by: Xiao, Kelaiti, et al.
Published: (2025)

AC-EVAL: Evaluating Ancient Chinese Language Understanding in Large Language Models
by: Wei, Yuting, et al.
Published: (2024)

Benchmarking Machine Translation on Chinese Social Media Texts
by: Zhao, Kaiyan, et al.
Published: (2026)

Machine Translation Evaluation Benchmark for Wu Chinese: Workflow and Analysis
by: Yu, Hongjian, et al.
Published: (2024)

CTourLLM: Enhancing LLMs with Chinese Tourism Knowledge
by: Wei, Qikai, et al.
Published: (2024)

General2Specialized LLMs Translation for E-commerce
by: Chen, Kaidi, et al.
Published: (2024)

Large Language Models for Classical Chinese Poetry Translation: Benchmarking, Evaluating, and Improving
by: Chen, Andong, et al.
Published: (2024)

SimulatorArena: Are User Simulators Reliable Proxies for Multi-Turn Evaluation of AI Assistants?
by: Dou, Yao, et al.
Published: (2025)

Signal and Noise: A Framework for Reducing Uncertainty in Language Model Evaluation
by: Heineman, David, et al.
Published: (2025)

Benchmarking Chinese Commonsense Reasoning of LLMs: From Chinese-Specifics to Reasoning-Memorization Correlations
by: Sun, Jiaxing, et al.
Published: (2024)

Benchmarking the Detection of LLMs-Generated Modern Chinese Poetry
by: Wang, Shanshan, et al.
Published: (2025)

When Words Don't Mean What They Say: Figurative Understanding in Bengali Idioms
by: Sakhawat, Adib, et al.
Published: (2026)

A Hard Nut to Crack: Idiom Detection with Conversational Large Language Models
by: Fornaciari, Francesca De Luca, et al.
Published: (2024)

Anatomy of an Idiom: Tracing Non-Compositionality in Language Models
by: Gomes, Andrew
Published: (2025)

Unveiling the Competitive Dynamics: A Comparative Evaluation of American and Chinese LLMs
by: Jiang, Zhenhui, et al.
Published: (2024)