| Main Authors: | Saadat, Mohammadreza, Nemzer, Steve |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2603.03330 |
Similar Items
Prolonged Reasoning Is Not All You Need: Certainty-Based Adaptive Routing for Efficient LLM/MLLM Reasoning
by: Lu, Jinghui, et al.
Published: (2025)
Fact-Checking with Large Language Models via Probabilistic Certainty and Consistency
by: Wang, Haoran, et al.
Published: (2026)
Sequence-Level Certainty Reduces Hallucination In Knowledge-Grounded Dialogue Generation
by: Wan, Yixin, et al.
Published: (2023)
DayDreamer at CQs-Gen 2025: Generating Critical Questions through Argument Scheme Completion
by: Zhou, Wendi, et al.
Published: (2025)
Prompt-prompted Adaptive Structured Pruning for Efficient LLM Generation
by: Dong, Harry, et al.
Published: (2024)
Certainty-Guided Reasoning in Large Language Models: A Dynamic Thinking Budget Approach
by: Nogueira, João Paulo, et al.
Published: (2025)
Soft-prompt Tuning for Large Language Models to Evaluate Bias
by: Tian, Jacob-Junqi, et al.
Published: (2023)
Auto prompting without training labels: An LLM cascade for product quality assessment in e-commerce catalogs
by: Satyadharma, Soham, et al.
Published: (2025)
Meaningless is better: hashing bias-inducing words in LLM prompts improves performance in logical reasoning and statistical learning
by: Chadimová, Milena, et al.
Published: (2024)
Knowledge prompt chaining for semantic modeling
by: Ding, Ning Pei, et al.
Published: (2025)
StateAct: Enhancing LLM Base Agents via Self-prompting and State-tracking
by: Rozanov, Nikolai, et al.
Published: (2024)
Towards Understanding the Robustness of LLM-based Evaluations under Perturbations
by: Chaudhary, Manav, et al.
Published: (2024)
Scalable Best-of-N Selection for Large Language Models via Self-Certainty
by: Kang, Zhewei, et al.
Published: (2025)
Fooling LLM graders into giving better grades through neural activity guided adversarial prompting
by: Yamamura, Atsushi, et al.
Published: (2024)
MOSLIM: Align with diverse preferences in prompts through reward classification
by: Zhang, Yu, et al.
Published: (2025)
Efficient multi-prompt evaluation of LLMs
by: Polo, Felipe Maia, et al.
Published: (2024)
Efficient Reasoning for Large Reasoning Language Models via Certainty-Guided Reflection Suppression
by: Huang, Jiameng, et al.
Published: (2025)
Mis-prompt: Benchmarking Large Language Models for Proactive Error Handling
by: Zeng, Jiayi, et al.
Published: (2025)
A Comprehensive Evaluation of LLM Unlearning Robustness under Multi-Turn Interaction
by: Pan, Ruihao, et al.
Published: (2026)
Improving Complex Reasoning with Dynamic Prompt Corruption: A soft prompt Optimization Approach
by: Fan, Sinan, et al.
Published: (2025)
Trusting CHATGPT: how minor tweaks in the prompts lead to major differences in sentiment classification
by: Cuellar, Jaime E., et al.
Published: (2025)
Language hooks: a modular framework for augmenting LLM reasoning that decouples tool usage from the model and its prompt
by: de Mijolla, Damien, et al.
Published: (2024)
Judge's Verdict: A Comprehensive Analysis of LLM Judge Capability Through Human Agreement
by: Han, Steve, et al.
Published: (2025)
Do LLMs Align with My Task? Evaluating Text-to-SQL via Dataset Alignment
by: Rafiei, Davood, et al.
Published: (2025)
RTLC -- Research, Teach-to-Learn, Critique: A three-stage prompting paradigm inspired by the Feynman Learning Technique that lifts LLM-as-judge accuracy on JudgeBench with no fine-tuning
by: Morandi, Andrea
Published: (2026)
Zero-shot prompt-based classification: topic labeling in times of foundation models in German Tweets
by: Münker, Simon, et al.
Published: (2024)
CoT-Self-Instruct: Building high-quality synthetic prompts for reasoning and non-reasoning tasks
by: Yu, Ping, et al.
Published: (2025)
Evil twins are not that evil: Qualitative insights into machine-generated prompts
by: Rakotonirina, Nathanaël Carraz, et al.
Published: (2024)
RETUYT-INCO at BEA 2026 Shared Task 2: Meta-prompting in Rubric-based Scoring for German
by: Sastre, Ignacio, et al.
Published: (2026)
Metaphor identification using large language models: A comparison of RAG, prompt engineering, and fine-tuning
by: Fuoli, Matteo, et al.
Published: (2025)
PII-Compass: Guiding LLM training data extraction prompts towards the target PII via grounding
by: Nakka, Krishna Kanth, et al.
Published: (2024)
Unleashing the potential of prompt engineering for large language models
by: Chen, Banghao, et al.
Published: (2023)
Evaluate Summarization in Fine-Granularity: Auto Evaluation with LLM
by: Yuan, Dong, et al.
Published: (2024)
GreenTEA: Gradient Descent with Topic-modeling and Evolutionary Auto-prompting
by: Dong, Zheng, et al.
Published: (2025)
SQL-Exchange: Transforming SQL Queries Across Domains
by: Daviran, Mohammadreza, et al.
Published: (2025)
DaVinci at SemEval-2024 Task 9: Few-shot prompting GPT-3.5 for Unconventional Reasoning
by: Mathur, Suyash Vardhan, et al.
Published: (2024)
Emergent misalignment as prompt sensitivity: A research note
by: Wyse, Tim, et al.
Published: (2025)
LLM Prompt Evaluation for Educational Applications
by: Holmes, Langdon, et al.
Published: (2026)
Evaluating Metrics for Safety with LLM-as-Judges
by: Clegg, Kester, et al.
Published: (2025)
Three Models of RLHF Annotation: Extension, Evidence, and Authority
by: Coyne, Steve
Published: (2026)