:: Library Catalog

Bibliographic Details
Main Authors:	Zheng, Jonathan; Ritter, Alan; Xu, Wei
Format:	Preprint
Published:	2024
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2402.12261

Similar Items

STEER-BENCH: A Benchmark for Evaluating the Steerability of Large Language Models
by: Chen, Kai, et al.
Published: (2025)

From 124 Million Tokens to 1,021 Neologisms: A Large-Scale Pipeline for Automatic Neologism Detection
by: Rossini, Diego, et al.
Published: (2026)

Probabilistic Reasoning with LLMs for k-anonymity Estimation
by: Zheng, Jonathan, et al.
Published: (2025)

Having Beer after Prayer? Measuring Cultural Bias in Large Language Models
by: Naous, Tarek, et al.
Published: (2023)

Neologism Learning for Controllability and Self-Verbalization
by: Hewitt, John, et al.
Published: (2025)

MaterialBENCH: Evaluating College-Level Materials Science Problem-Solving Abilities of Large Language Models
by: Yoshitake, Michiko, et al.
Published: (2024)

Meta-Tuning LLMs to Leverage Lexical Knowledge for Generalizable Language Style Understanding
by: Guo, Ruohao, et al.
Published: (2023)

DICE-BENCH: Evaluating the Tool-Use Capabilities of Large Language Models in Multi-Round, Multi-Party Dialogues
by: Jang, Kyochul, et al.
Published: (2025)

Anticipatory Evaluation of Language Models
by: Park, Jungsoo, et al.
Published: (2025)

EffiVLM-BENCH: A Comprehensive Benchmark for Evaluating Training-Free Acceleration in Large Vision-Language Models
by: Wang, Zekun, et al.
Published: (2025)

CONDESION-BENCH: Conditional Decision-Making of Large Language Models in Compositional Action Space
by: Hwang, Yeonjun, et al.
Published: (2026)

DI-BENCH: Benchmarking Large Language Models on Dependency Inference with Testable Repositories at Scale
by: Zhang, Linghao, et al.
Published: (2025)

Reheat Nachos for Dinner? Evaluating AI Support for Cross-Cultural Communication of Neologisms
by: Ki, Dayeon, et al.
Published: (2026)

UNIKIE-BENCH: Benchmarking Large Multimodal Models for Key Information Extraction in Visual Documents
by: Ji, Yifan, et al.
Published: (2026)

What are Foundation Models Cooking in the Post-Soviet World?
by: Lavrouk, Anton, et al.
Published: (2025)

CaT-BENCH: Benchmarking Language Model Understanding of Causal and Temporal Dependencies in Plans
by: Lal, Yash Kumar, et al.
Published: (2024)

Neologism Learning as a Parameter-Efficient Alternative to Fine-Tuning for Model Steering
by: Park, Sungjoon, et al.
Published: (2025)

LLM-NEO: Parameter Efficient Knowledge Distillation for Large Language Models
by: Yang, Runming, et al.
Published: (2024)

Learning to Route Languages for Multilingual Policy Optimization
by: Guo, Geyang, et al.
Published: (2026)

Language Models can Self-Improve at State-Value Estimation for Better Search
by: Mendes, Ethan, et al.
Published: (2025)

KOCO-BENCH: Can Large Language Models Leverage Domain Knowledge in Software Development?
by: Jiang, Xue, et al.
Published: (2026)

Granular Privacy Control for Geolocation with Vision Language Models
by: Mendes, Ethan, et al.
Published: (2024)

How to Protect Yourself from 5G Radiation? Investigating LLM Responses to Implicit Misinformation
by: Guo, Ruohao, et al.
Published: (2025)

Investigating and Alleviating Harm Amplification in LLM Interactions
by: Guo, Ruohao, et al.
Published: (2026)

Stanceosaurus 2.0: Classifying Stance Towards Russian and Spanish Misinformation
by: Lavrouk, Anton, et al.
Published: (2024)

Tabular Data Understanding with LLMs: A Survey of Recent Advances and Challenges
by: Wu, Xiaofeng, et al.
Published: (2025)

Reducing Privacy Risks in Online Self-Disclosures with Language Models
by: Dou, Yao, et al.
Published: (2023)

Do LLMs Know What Luxembourgish Borrows? Probing Lexical Neology in Low-Resource Multilingual Models
by: Hosseini-Kivanani, Nina
Published: (2026)

NeoAMT: Neologism-Aware Agentic Machine Translation with Reinforcement Learning
by: Miao, Zhongtao, et al.
Published: (2026)

UNIDOC-BENCH: A Unified Benchmark for Document-Centric Multimodal RAG
by: Peng, Xiangyu, et al.
Published: (2025)

Frustratingly Easy Label Projection for Cross-lingual Transfer
by: Chen, Yang, et al.
Published: (2022)

NeoN: A Tool for Automated Detection, Linguistic and LLM-Driven Analysis of Neologisms in Polish
by: Tomaszewska, Aleksandra, et al.
Published: (2025)

Cognitive LLMs: Towards Integrating Cognitive Architectures and Large Language Models for Manufacturing Decision-making
by: Wu, Siyu, et al.
Published: (2024)

GSR-BENCH: A Benchmark for Grounded Spatial Reasoning Evaluation via Multimodal LLMs
by: Rajabi, Navid, et al.
Published: (2024)

ReasonBENCH: Benchmarking the (In)Stability of LLM Reasoning
by: Potamitis, Nearchos, et al.
Published: (2025)

Constrained Decoding for Cross-lingual Label Projection
by: Le, Duong Minh, et al.
Published: (2024)

Evaluating the Retrieval Robustness of Large Language Models
by: Cao, Shuyang, et al.
Published: (2025)

Contrastive Knowledge Transfer and Robust Optimization for Secure Alignment of Large Language Models
by: Zheng, Jiasen, et al.
Published: (2025)

Self-Specialization: Uncovering Latent Expertise within Large Language Models
by: Kang, Junmo, et al.
Published: (2023)

Lost in Execution: On the Multilingual Robustness of Tool Calling in Large Language Models
by: Luo, Zheng, et al.
Published: (2026)