:: Library Catalog

Bibliographic Details
Main Authors:	Liu, Guangliang, Mao, Haitao, Cao, Bochuan, Xue, Zhiyu, Zhang, Xitong, Wang, Rongrong, Tang, Jiliang, Johnson, Kristen
Format:	Preprint
Published:	2024
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2406.02378

Similar Items

On the Convergence of Moral Self-Correction in Large Language Models
by: Liu, Guangliang, et al.
Published: (2025)

Smaller Large Language Models Can Do Moral Self-Correction
by: Liu, Guangliang, et al.
Published: (2024)

Intrinsic Self-correction for Enhanced Morality: An Analysis of Internal Mechanisms and the Superficial Hypothesis
by: Liu, Guangliang, et al.
Published: (2024)

A Survey to Recent Progress Towards Understanding In-Context Learning
by: Mao, Haitao, et al.
Published: (2024)

Towards Understanding Task-agnostic Debiasing Through the Lenses of Intrinsic Bias and Forgetfulness
by: Liu, Guangliang, et al.
Published: (2024)

Discourse Heuristics For Paradoxically Moral Self-Correction
by: Liu, Guangliang, et al.
Published: (2025)

Self-correction is Not An Innate Capability in Language Models
by: Liu, Guangliang, et al.
Published: (2024)

Learning to Diagnose and Correct Errors: Towards Moral Sensitivity Acquisition in Large Language Models
by: Chen, Bocheng, et al.
Published: (2026)

Pragmatic Inference for Moral Reasoning Acquisition: Generalization via Metapragmatic Links
by: Liu, Guangliang, et al.
Published: (2025)

Diagnosing Moral Reasoning Acquisition in Language Models: Pragmatics and Generalization
by: Liu, Guangliang, et al.
Published: (2025)

Diagnosing the Performance Trade-off in Moral Alignment: A Case Study on Gender Stereotypes
by: Liu, Guangliang, et al.
Published: (2025)

Improving Generalization of Complex Models under Unbounded Loss Using PAC-Bayes Bounds
by: Zhang, Xitong, et al.
Published: (2023)

Understanding the Dark Side of LLMs' Intrinsic Self-Correction
by: Zhang, Qingjie, et al.
Published: (2024)

Automate Knowledge Concept Tagging on Math Questions with LLMs
by: Li, Hang, et al.
Published: (2024)

TruthFlow: Truthful LLM Generation via Representation Flow Correction
by: Wang, Hanyu, et al.
Published: (2025)

Confidence Matters: Revisiting Intrinsic Self-Correction Capabilities of Large Language Models
by: Li, Loka, et al.
Published: (2024)

Confidence v.s. Critique: A Decomposition of Self-Correction Capability for LLMs
by: Yang, Zhe, et al.
Published: (2024)

Meeseeks: A Feedback-Driven, Iterative Self-Correction Benchmark evaluating LLMs' Instruction Following Capability
by: Wang, Jiaming, et al.
Published: (2025)

You Can't Steal Nothing: Mitigating Prompt Leakages in LLMs via System Vectors
by: Cao, Bochuan, et al.
Published: (2025)

Stealthy and Persistent Unalignment on Large Language Models via Backdoor Injections
by: Cao, Yuanpu, et al.
Published: (2023)

No Free Lunch for Defending Against Prefilling Attack by In-Context Learning
by: Xue, Zhiyu, et al.
Published: (2024)

Monitoring Decoding: Mitigating Hallucination via Evaluating the Factuality of Partial Response during Generation
by: Chang, Yurui, et al.
Published: (2025)

Label-free Node Classification on Graphs with Large Language Models (LLMS)
by: Chen, Zhikai, et al.
Published: (2023)

Improving Latent Reasoning in LLMs via Soft Concept Mixing
by: Wang, Kang, et al.
Published: (2025)

Intrinsic Self-Correction in LLMs: Towards Explainable Prompting via Mechanistic Interpretability
by: Lee, Yu-Ting, et al.
Published: (2025)

Defending Against Alignment-Breaking Attacks via Robustly Aligned LLM
by: Cao, Bochuan, et al.
Published: (2023)

Reasoning Under Uncertainty: Exploring Probabilistic Reasoning Capabilities of LLMs
by: Pournemat, Mobina, et al.
Published: (2025)

SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs
by: Gao, Yizhao, et al.
Published: (2024)

JoPA: Explaining Large Language Model's Generation via Joint Prompt Attribution
by: Chang, Yurui, et al.
Published: (2024)

Knowledge Tagging System on Math Questions via LLMs with Flexible Demonstration Retriever
by: Li, Hang, et al.
Published: (2024)

Large Language Models have Intrinsic Self-Correction Ability
by: Liu, Dancheng, et al.
Published: (2024)

Towards Self-Robust LLMs: Intrinsic Prompt Noise Resistance via CoIPO
by: Yang, Xin, et al.
Published: (2026)

Geometric Uncertainty for Detecting and Correcting Hallucinations in LLMs
by: Phillips, Edward, et al.
Published: (2025)

Are Your LLMs Capable of Stable Reasoning?
by: Liu, Junnan, et al.
Published: (2024)

Can LLMs Correct Themselves? A Benchmark of Self-Correction in LLMs
by: Tie, Guiyao, et al.
Published: (2025)

When Can LLMs Actually Correct Their Own Mistakes? A Critical Survey of Self-Correction of LLMs
by: Kamoi, Ryo, et al.
Published: (2024)

Iterative Self-Tuning LLMs for Enhanced Jailbreaking Capabilities
by: Sun, Chung-En, et al.
Published: (2024)

Time Series Forecasting with LLMs: Understanding and Enhancing Model Capabilities
by: Tang, Hua, et al.
Published: (2024)

Self-Correction Makes LLMs Better Parsers
by: Zhang, Ziyan, et al.
Published: (2025)

Self-Tuning: Instructing LLMs to Effectively Acquire New Knowledge through Self-Teaching
by: Zhang, Xiaoying, et al.
Published: (2024)