| Main Authors: | Zheng, Jonathan; Ritter, Alan; Xu, Wei |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2402.12261 |
Similar Items
STEER-BENCH: A Benchmark for Evaluating the Steerability of Large Language Models
by: Chen, Kai, et al.
Published: (2025)
From 124 Million Tokens to 1,021 Neologisms: A Large-Scale Pipeline for Automatic Neologism Detection
by: Rossini, Diego, et al.
Published: (2026)
Probabilistic Reasoning with LLMs for k-anonymity Estimation
by: Zheng, Jonathan, et al.
Published: (2025)
Having Beer after Prayer? Measuring Cultural Bias in Large Language Models
by: Naous, Tarek, et al.
Published: (2023)
Neologism Learning for Controllability and Self-Verbalization
by: Hewitt, John, et al.
Published: (2025)
MaterialBENCH: Evaluating College-Level Materials Science Problem-Solving Abilities of Large Language Models
by: Yoshitake, Michiko, et al.
Published: (2024)
Meta-Tuning LLMs to Leverage Lexical Knowledge for Generalizable Language Style Understanding
by: Guo, Ruohao, et al.
Published: (2023)
DICE-BENCH: Evaluating the Tool-Use Capabilities of Large Language Models in Multi-Round, Multi-Party Dialogues
by: Jang, Kyochul, et al.
Published: (2025)
Anticipatory Evaluation of Language Models
by: Park, Jungsoo, et al.
Published: (2025)
EffiVLM-BENCH: A Comprehensive Benchmark for Evaluating Training-Free Acceleration in Large Vision-Language Models
by: Wang, Zekun, et al.
Published: (2025)
CONDESION-BENCH: Conditional Decision-Making of Large Language Models in Compositional Action Space
by: Hwang, Yeonjun, et al.
Published: (2026)
DI-BENCH: Benchmarking Large Language Models on Dependency Inference with Testable Repositories at Scale
by: Zhang, Linghao, et al.
Published: (2025)
Reheat Nachos for Dinner? Evaluating AI Support for Cross-Cultural Communication of Neologisms
by: Ki, Dayeon, et al.
Published: (2026)
UNIKIE-BENCH: Benchmarking Large Multimodal Models for Key Information Extraction in Visual Documents
by: Ji, Yifan, et al.
Published: (2026)
What are Foundation Models Cooking in the Post-Soviet World?
by: Lavrouk, Anton, et al.
Published: (2025)
CaT-BENCH: Benchmarking Language Model Understanding of Causal and Temporal Dependencies in Plans
by: Lal, Yash Kumar, et al.
Published: (2024)
Neologism Learning as a Parameter-Efficient Alternative to Fine-Tuning for Model Steering
by: Park, Sungjoon, et al.
Published: (2025)
LLM-NEO: Parameter Efficient Knowledge Distillation for Large Language Models
by: Yang, Runming, et al.
Published: (2024)
Learning to Route Languages for Multilingual Policy Optimization
by: Guo, Geyang, et al.
Published: (2026)
Language Models can Self-Improve at State-Value Estimation for Better Search
by: Mendes, Ethan, et al.
Published: (2025)
KOCO-BENCH: Can Large Language Models Leverage Domain Knowledge in Software Development?
by: Jiang, Xue, et al.
Published: (2026)
Granular Privacy Control for Geolocation with Vision Language Models
by: Mendes, Ethan, et al.
Published: (2024)
How to Protect Yourself from 5G Radiation? Investigating LLM Responses to Implicit Misinformation
by: Guo, Ruohao, et al.
Published: (2025)
Investigating and Alleviating Harm Amplification in LLM Interactions
by: Guo, Ruohao, et al.
Published: (2026)
Stanceosaurus 2.0: Classifying Stance Towards Russian and Spanish Misinformation
by: Lavrouk, Anton, et al.
Published: (2024)
Tabular Data Understanding with LLMs: A Survey of Recent Advances and Challenges
by: Wu, Xiaofeng, et al.
Published: (2025)
Reducing Privacy Risks in Online Self-Disclosures with Language Models
by: Dou, Yao, et al.
Published: (2023)
Do LLMs Know What Luxembourgish Borrows? Probing Lexical Neology in Low-Resource Multilingual Models
by: Hosseini-Kivanani, Nina
Published: (2026)
NeoAMT: Neologism-Aware Agentic Machine Translation with Reinforcement Learning
by: Miao, Zhongtao, et al.
Published: (2026)
UNIDOC-BENCH: A Unified Benchmark for Document-Centric Multimodal RAG
by: Peng, Xiangyu, et al.
Published: (2025)
Frustratingly Easy Label Projection for Cross-lingual Transfer
by: Chen, Yang, et al.
Published: (2022)
NeoN: A Tool for Automated Detection, Linguistic and LLM-Driven Analysis of Neologisms in Polish
by: Tomaszewska, Aleksandra, et al.
Published: (2025)
Cognitive LLMs: Towards Integrating Cognitive Architectures and Large Language Models for Manufacturing Decision-making
by: Wu, Siyu, et al.
Published: (2024)
GSR-BENCH: A Benchmark for Grounded Spatial Reasoning Evaluation via Multimodal LLMs
by: Rajabi, Navid, et al.
Published: (2024)
ReasonBENCH: Benchmarking the (In)Stability of LLM Reasoning
by: Potamitis, Nearchos, et al.
Published: (2025)
Constrained Decoding for Cross-lingual Label Projection
by: Le, Duong Minh, et al.
Published: (2024)
Evaluating the Retrieval Robustness of Large Language Models
by: Cao, Shuyang, et al.
Published: (2025)
Contrastive Knowledge Transfer and Robust Optimization for Secure Alignment of Large Language Models
by: Zheng, Jiasen, et al.
Published: (2025)
Self-Specialization: Uncovering Latent Expertise within Large Language Models
by: Kang, Junmo, et al.
Published: (2023)
Lost in Execution: On the Multilingual Robustness of Tool Calling in Large Language Models
by: Luo, Zheng, et al.
Published: (2026)