:: Library Catalog

Bibliographic Details
Main Authors:	Hua, Andong; Tang, Kenan; Gu, Chenhe; Gu, Jindong; Wong, Eric; Qin, Yao
Format:	Preprint
Published:	2025
Subjects:	Computation and Language; Artificial Intelligence; Machine Learning
Online Access:	https://arxiv.org/abs/2509.01790

Similar Items

Improving Adversarial Transferability in MLLMs via Dynamic Vision-Language Alignment Attack
by: Gu, Chenhe, et al.
Published: (2025)

Banana100: Breaking NR-IQA Metrics by 100 Iterative Image Replications with Nano Banana Pro
by: Tang, Kenan, et al.
Published: (2026)

PromptBench: A Unified Library for Evaluation of Large Language Models
by: Zhu, Kaijie, et al.
Published: (2023)

Keeping LLMs Aligned After Fine-tuning: The Crucial Role of Prompt Templates
by: Lyu, Kaifeng, et al.
Published: (2024)

Interesting Scientific Idea Generation using Knowledge Graphs and LLMs: Evaluations with 100 Research Group Leaders
by: Gu, Xuemei, et al.
Published: (2024)

Special Characters Attack: Toward Scalable Training Data Extraction From Large Language Models
by: Bai, Yang, et al.
Published: (2024)

Architectural Flaw Detection in Civil Engineering Using GPT-4
by: Kumar, Saket, et al.
Published: (2024)

Selective Prompting Tuning for Personalized Conversations with LLMs
by: Huang, Qiushi, et al.
Published: (2024)

AgentBench: Evaluating LLMs as Agents
by: Liu, Xiao, et al.
Published: (2023)

LLMs Should Express Uncertainty Explicitly
by: Guo, Junyu, et al.
Published: (2026)

On Meta-Prompting
by: de Wynter, Adrian, et al.
Published: (2023)

Batch Calibration: Rethinking Calibration for In-Context Learning and Prompt Engineering
by: Zhou, Han, et al.
Published: (2023)

Multiplex Thinking: Reasoning via Token-wise Branch-and-Merge
by: Tang, Yao, et al.
Published: (2026)

Dynamic Evaluation of Large Language Models by Meta Probing Agents
by: Zhu, Kaijie, et al.
Published: (2024)

Higher-order Linear Attention
by: Zhang, Yifan, et al.
Published: (2025)

Evaluating the Generalization Ability of Quantized LLMs: Benchmark, Analysis, and Toolbox
by: Liu, Yijun, et al.
Published: (2024)

Revisiting Prompt Sensitivity in Large Language Models for Text Classification: The Role of Prompt Underspecification
by: Pecher, Branislav, et al.
Published: (2026)

Does Machine Unlearning Truly Remove Knowledge?
by: Chen, Haokun, et al.
Published: (2025)

KIEval: A Knowledge-grounded Interactive Evaluation Framework for Large Language Models
by: Yu, Zhuohao, et al.
Published: (2024)

Safe, or Simply Incapable? Rethinking Safety Evaluation for Phone-Use Agents
by: Tang, Zhengyang, et al.
Published: (2026)

Evaluating Prompt Engineering Techniques for Accuracy and Confidence Elicitation in Medical LLMs
by: Naderi, Nariman, et al.
Published: (2025)

Rethinking the Potential of Multimodality in Collaborative Problem Solving Diagnosis with Large Language Models
by: Wong, K., et al.
Published: (2025)

How Susceptible are LLMs to Influence in Prompts?
by: Anagnostidis, Sotiris, et al.
Published: (2024)

Beyond Confidence: Rethinking Self-Assessments for Performance Prediction in LLMs
by: Bhattacharyya, Sree, et al.
Published: (2026)

Relative Preference Optimization: Enhancing LLM Alignment through Contrasting Responses across Identical and Diverse Prompts
by: Yin, Yueqin, et al.
Published: (2024)

Tactic: Adaptive Sparse Attention with Clustering and Distribution Fitting for Long-Context LLMs
by: Zhu, Kan, et al.
Published: (2025)

Learning and Enforcing Context-Sensitive Control for LLMs
by: Albinhassan, Mohammad, et al.
Published: (2026)

POSIX: A Prompt Sensitivity Index For Large Language Models
by: Chatterjee, Anwoy, et al.
Published: (2024)

RewardAnything: Generalizable Principle-Following Reward Models
by: Yu, Zhuohao, et al.
Published: (2025)

Tensor Product Attention Is All You Need
by: Zhang, Yifan, et al.
Published: (2025)

Prompt Repetition Improves Non-Reasoning LLMs
by: Leviathan, Yaniv, et al.
Published: (2025)

Tabular Transfer Learning via Prompting LLMs
by: Nam, Jaehyun, et al.
Published: (2024)

DyVal: Dynamic Evaluation of Large Language Models for Reasoning Tasks
by: Zhu, Kaijie, et al.
Published: (2023)

UltraEdit: Training-, Subject-, and Memory-Free Lifelong Editing in Language Models
by: Gu, Xiaojie, et al.
Published: (2025)

StyleBench: Evaluating thinking styles in Large Language Models
by: Guo, Junyu, et al.
Published: (2025)

Rethinking the Role of Prompting Strategies in LLM Test-Time Scaling: A Perspective of Probability Theory
by: Liu, Yexiang, et al.
Published: (2025)

Rethinking Time Series Forecasting with LLMs via Nearest Neighbor Contrastive Learning
by: Bogahawatte, Jayanie, et al.
Published: (2024)

Reasoning Through Execution: Unifying Process and Outcome Rewards for Code Generation
by: Yu, Zhuohao, et al.
Published: (2024)

An Evaluation on Large Language Model Outputs: Discourse and Memorization
by: de Wynter, Adrian, et al.
Published: (2023)

Understanding and Mitigating Bias Inheritance in LLM-based Data Augmentation on Downstream Tasks
by: Li, Miaomiao, et al.
Published: (2025)