| Main Authors: | Liu, Guangliang, Mao, Haitao, Cao, Bochuan, Xue, Zhiyu, Zhang, Xitong, Wang, Rongrong, Tang, Jiliang, Johnson, Kristen |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2406.02378 |
Similar Items
On the Convergence of Moral Self-Correction in Large Language Models
by: Liu, Guangliang, et al.
Published: (2025)
Smaller Large Language Models Can Do Moral Self-Correction
by: Liu, Guangliang, et al.
Published: (2024)
Intrinsic Self-correction for Enhanced Morality: An Analysis of Internal Mechanisms and the Superficial Hypothesis
by: Liu, Guangliang, et al.
Published: (2024)
A Survey to Recent Progress Towards Understanding In-Context Learning
by: Mao, Haitao, et al.
Published: (2024)
Towards Understanding Task-agnostic Debiasing Through the Lenses of Intrinsic Bias and Forgetfulness
by: Liu, Guangliang, et al.
Published: (2024)
Discourse Heuristics For Paradoxically Moral Self-Correction
by: Liu, Guangliang, et al.
Published: (2025)
Self-correction is Not An Innate Capability in Language Models
by: Liu, Guangliang, et al.
Published: (2024)
Learning to Diagnose and Correct Errors: Towards Moral Sensitivity Acquisition in Large Language Models
by: Chen, Bocheng, et al.
Published: (2026)
Pragmatic Inference for Moral Reasoning Acquisition: Generalization via Metapragmatic Links
by: Liu, Guangliang, et al.
Published: (2025)
Diagnosing Moral Reasoning Acquisition in Language Models: Pragmatics and Generalization
by: Liu, Guangliang, et al.
Published: (2025)
Diagnosing the Performance Trade-off in Moral Alignment: A Case Study on Gender Stereotypes
by: Liu, Guangliang, et al.
Published: (2025)
Improving Generalization of Complex Models under Unbounded Loss Using PAC-Bayes Bounds
by: Zhang, Xitong, et al.
Published: (2023)
Understanding the Dark Side of LLMs' Intrinsic Self-Correction
by: Zhang, Qingjie, et al.
Published: (2024)
Automate Knowledge Concept Tagging on Math Questions with LLMs
by: Li, Hang, et al.
Published: (2024)
TruthFlow: Truthful LLM Generation via Representation Flow Correction
by: Wang, Hanyu, et al.
Published: (2025)
Confidence Matters: Revisiting Intrinsic Self-Correction Capabilities of Large Language Models
by: Li, Loka, et al.
Published: (2024)
Confidence v.s. Critique: A Decomposition of Self-Correction Capability for LLMs
by: Yang, Zhe, et al.
Published: (2024)
Meeseeks: A Feedback-Driven, Iterative Self-Correction Benchmark evaluating LLMs' Instruction Following Capability
by: Wang, Jiaming, et al.
Published: (2025)
You Can't Steal Nothing: Mitigating Prompt Leakages in LLMs via System Vectors
by: Cao, Bochuan, et al.
Published: (2025)
Stealthy and Persistent Unalignment on Large Language Models via Backdoor Injections
by: Cao, Yuanpu, et al.
Published: (2023)
No Free Lunch for Defending Against Prefilling Attack by In-Context Learning
by: Xue, Zhiyu, et al.
Published: (2024)
Monitoring Decoding: Mitigating Hallucination via Evaluating the Factuality of Partial Response during Generation
by: Chang, Yurui, et al.
Published: (2025)
Label-free Node Classification on Graphs with Large Language Models (LLMS)
by: Chen, Zhikai, et al.
Published: (2023)
Improving Latent Reasoning in LLMs via Soft Concept Mixing
by: Wang, Kang, et al.
Published: (2025)
Intrinsic Self-Correction in LLMs: Towards Explainable Prompting via Mechanistic Interpretability
by: Lee, Yu-Ting, et al.
Published: (2025)
Defending Against Alignment-Breaking Attacks via Robustly Aligned LLM
by: Cao, Bochuan, et al.
Published: (2023)
Reasoning Under Uncertainty: Exploring Probabilistic Reasoning Capabilities of LLMs
by: Pournemat, Mobina, et al.
Published: (2025)
SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs
by: Gao, Yizhao, et al.
Published: (2024)
JoPA: Explaining Large Language Model's Generation via Joint Prompt Attribution
by: Chang, Yurui, et al.
Published: (2024)
Knowledge Tagging System on Math Questions via LLMs with Flexible Demonstration Retriever
by: Li, Hang, et al.
Published: (2024)
Large Language Models have Intrinsic Self-Correction Ability
by: Liu, Dancheng, et al.
Published: (2024)
Towards Self-Robust LLMs: Intrinsic Prompt Noise Resistance via CoIPO
by: Yang, Xin, et al.
Published: (2026)
Geometric Uncertainty for Detecting and Correcting Hallucinations in LLMs
by: Phillips, Edward, et al.
Published: (2025)
Are Your LLMs Capable of Stable Reasoning?
by: Liu, Junnan, et al.
Published: (2024)
Can LLMs Correct Themselves? A Benchmark of Self-Correction in LLMs
by: Tie, Guiyao, et al.
Published: (2025)
When Can LLMs Actually Correct Their Own Mistakes? A Critical Survey of Self-Correction of LLMs
by: Kamoi, Ryo, et al.
Published: (2024)
Iterative Self-Tuning LLMs for Enhanced Jailbreaking Capabilities
by: Sun, Chung-En, et al.
Published: (2024)
Time Series Forecasting with LLMs: Understanding and Enhancing Model Capabilities
by: Tang, Hua, et al.
Published: (2024)
Self-Correction Makes LLMs Better Parsers
by: Zhang, Ziyan, et al.
Published: (2025)
Self-Tuning: Instructing LLMs to Effectively Acquire New Knowledge through Self-Teaching
by: Zhang, Xiaoying, et al.
Published: (2024)