Bibliographic Details
Main Authors:	Saadat, Mohammadreza; Nemzer, Steve
Format:	Preprint
Published:	2026
Subjects:	Computation and Language; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2603.03330

Similar Items

Prolonged Reasoning Is Not All You Need: Certainty-Based Adaptive Routing for Efficient LLM/MLLM Reasoning
by: Lu, Jinghui, et al.
Published: (2025)

Fact-Checking with Large Language Models via Probabilistic Certainty and Consistency
by: Wang, Haoran, et al.
Published: (2026)

Sequence-Level Certainty Reduces Hallucination In Knowledge-Grounded Dialogue Generation
by: Wan, Yixin, et al.
Published: (2023)

DayDreamer at CQs-Gen 2025: Generating Critical Questions through Argument Scheme Completion
by: Zhou, Wendi, et al.
Published: (2025)

Prompt-prompted Adaptive Structured Pruning for Efficient LLM Generation
by: Dong, Harry, et al.
Published: (2024)

Certainty-Guided Reasoning in Large Language Models: A Dynamic Thinking Budget Approach
by: Nogueira, João Paulo, et al.
Published: (2025)

Soft-prompt Tuning for Large Language Models to Evaluate Bias
by: Tian, Jacob-Junqi, et al.
Published: (2023)

Auto prompting without training labels: An LLM cascade for product quality assessment in e-commerce catalogs
by: Satyadharma, Soham, et al.
Published: (2025)

Meaningless is better: hashing bias-inducing words in LLM prompts improves performance in logical reasoning and statistical learning
by: Chadimová, Milena, et al.
Published: (2024)

Knowledge prompt chaining for semantic modeling
by: Ding, Ning Pei, et al.
Published: (2025)

StateAct: Enhancing LLM Base Agents via Self-prompting and State-tracking
by: Rozanov, Nikolai, et al.
Published: (2024)

Towards Understanding the Robustness of LLM-based Evaluations under Perturbations
by: Chaudhary, Manav, et al.
Published: (2024)

Scalable Best-of-N Selection for Large Language Models via Self-Certainty
by: Kang, Zhewei, et al.
Published: (2025)

Fooling LLM graders into giving better grades through neural activity guided adversarial prompting
by: Yamamura, Atsushi, et al.
Published: (2024)

MOSLIM: Align with diverse preferences in prompts through reward classification
by: Zhang, Yu, et al.
Published: (2025)

Efficient multi-prompt evaluation of LLMs
by: Polo, Felipe Maia, et al.
Published: (2024)

Efficient Reasoning for Large Reasoning Language Models via Certainty-Guided Reflection Suppression
by: Huang, Jiameng, et al.
Published: (2025)

Mis-prompt: Benchmarking Large Language Models for Proactive Error Handling
by: Zeng, Jiayi, et al.
Published: (2025)

A Comprehensive Evaluation of LLM Unlearning Robustness under Multi-Turn Interaction
by: Pan, Ruihao, et al.
Published: (2026)

Improving Complex Reasoning with Dynamic Prompt Corruption: A soft prompt Optimization Approach
by: Fan, Sinan, et al.
Published: (2025)

Trusting CHATGPT: how minor tweaks in the prompts lead to major differences in sentiment classification
by: Cuellar, Jaime E., et al.
Published: (2025)

Language hooks: a modular framework for augmenting LLM reasoning that decouples tool usage from the model and its prompt
by: de Mijolla, Damien, et al.
Published: (2024)

Judge's Verdict: A Comprehensive Analysis of LLM Judge Capability Through Human Agreement
by: Han, Steve, et al.
Published: (2025)

Do LLMs Align with My Task? Evaluating Text-to-SQL via Dataset Alignment
by: Rafiei, Davood, et al.
Published: (2025)

RTLC -- Research, Teach-to-Learn, Critique: A three-stage prompting paradigm inspired by the Feynman Learning Technique that lifts LLM-as-judge accuracy on JudgeBench with no fine-tuning
by: Morandi, Andrea
Published: (2026)

Zero-shot prompt-based classification: topic labeling in times of foundation models in German Tweets
by: Münker, Simon, et al.
Published: (2024)

CoT-Self-Instruct: Building high-quality synthetic prompts for reasoning and non-reasoning tasks
by: Yu, Ping, et al.
Published: (2025)

Evil twins are not that evil: Qualitative insights into machine-generated prompts
by: Rakotonirina, Nathanaël Carraz, et al.
Published: (2024)

RETUYT-INCO at BEA 2026 Shared Task 2: Meta-prompting in Rubric-based Scoring for German
by: Sastre, Ignacio, et al.
Published: (2026)

Metaphor identification using large language models: A comparison of RAG, prompt engineering, and fine-tuning
by: Fuoli, Matteo, et al.
Published: (2025)

PII-Compass: Guiding LLM training data extraction prompts towards the target PII via grounding
by: Nakka, Krishna Kanth, et al.
Published: (2024)

Unleashing the potential of prompt engineering for large language models
by: Chen, Banghao, et al.
Published: (2023)

Evaluate Summarization in Fine-Granularity: Auto Evaluation with LLM
by: Yuan, Dong, et al.
Published: (2024)

GreenTEA: Gradient Descent with Topic-modeling and Evolutionary Auto-prompting
by: Dong, Zheng, et al.
Published: (2025)

SQL-Exchange: Transforming SQL Queries Across Domains
by: Daviran, Mohammadreza, et al.
Published: (2025)

DaVinci at SemEval-2024 Task 9: Few-shot prompting GPT-3.5 for Unconventional Reasoning
by: Mathur, Suyash Vardhan, et al.
Published: (2024)

Emergent misalignment as prompt sensitivity: A research note
by: Wyse, Tim, et al.
Published: (2025)

LLM Prompt Evaluation for Educational Applications
by: Holmes, Langdon, et al.
Published: (2026)

Evaluating Metrics for Safety with LLM-as-Judges
by: Clegg, Kester, et al.
Published: (2025)

Three Models of RLHF Annotation: Extension, Evidence, and Authority
by: Coyne, Steve
Published: (2026)