:: Library Catalog

Bibliographic Details
Main Authors:	Zhang, Mian; Jin, Lifeng; Song, Linfeng; Mi, Haitao; Yu, Dong
Format:	Preprint
Published:	2024
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2401.10353

Similar Items

Entropy Guided Extrapolative Decoding to Improve Factuality in Large Language Models
by: Das, Souvik, et al.
Published: (2024)

A Knowledge Plug-and-Play Test Bed for Open-domain Dialogue Generation
by: Li, Xiangci, et al.
Published: (2024)

Collaborative decoding of critical tokens for boosting factuality of large language models
by: Jin, Lifeng, et al.
Published: (2024)

Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing
by: Tian, Ye, et al.
Published: (2024)

Self-Consistency Boosts Calibration for Math Reasoning
by: Wang, Ante, et al.
Published: (2024)

Fine-Grained Self-Endorsement Improves Factuality and Reasoning
by: Wang, Ante, et al.
Published: (2024)

Self-Alignment for Factuality: Mitigating Hallucinations in LLMs via Self-Evaluation
by: Zhang, Xiaoying, et al.
Published: (2024)

SIaM: Self-Improving Code-Assisted Mathematical Reasoning of Large Language Models
by: Yu, Dian, et al.
Published: (2024)

Every Question Has Its Own Value: Reinforcement Learning with Explicit Human Values
by: Yu, Dian, et al.
Published: (2025)

Teaching LLMs to Refine with Tools
by: Yu, Dian, et al.
Published: (2024)

Crossing the Reward Bridge: Expanding RL with Verifiable Rewards Across Diverse Domains
by: Su, Yi, et al.
Published: (2025)

CLUE: Non-parametric Verification from Experience via Hidden-State Clustering
by: Liang, Zhenwen, et al.
Published: (2025)

Verified Critical Step Optimization for LLM Agents
by: Li, Mukai, et al.
Published: (2026)

Improving LLM General Preference Alignment via Optimistic Online Mirror Descent
by: Zhang, Yuheng, et al.
Published: (2025)

Towards Self-Improvement of LLMs via MCTS: Leveraging Stepwise Knowledge with Curriculum Preference Learning
by: Wang, Xiyao, et al.
Published: (2024)

Don't Get Lost in the Trees: Streamlining LLM Reasoning by Overcoming Tree Search Exploration Pitfalls
by: Wang, Ante, et al.
Published: (2025)

HunyuanProver: A Scalable Data Synthesis Framework and Guided Tree Search for Automated Theorem Proving
by: Li, Yang, et al.
Published: (2024)

EconProver: Towards More Economical Test-Time Scaling for Automated Theorem Proving
by: Li, Mukai, et al.
Published: (2025)

LiteSearch: Efficacious Tree Search for LLM
by: Wang, Ante, et al.
Published: (2024)

Research on emotionally intelligent dialogue generation based on automatic dialogue system
by: Wang, Jin, et al.
Published: (2024)

Evolving Language Models without Labels: Majority Drives Selection, Novelty Promotes Variation
by: Zhou, Yujun, et al.
Published: (2025)

HDFlow: Enhancing LLM Complex Problem-Solving with Hybrid Thinking and Dynamic Workflows
by: Yao, Wenlin, et al.
Published: (2024)

Iterative Nash Policy Optimization: Aligning LLMs with General Preferences via No-Regret Learning
by: Zhang, Yuheng, et al.
Published: (2024)

Strong hallucinations from negation and how to fix them
by: Asher, Nicholas, et al.
Published: (2024)

Recall with Reasoning: Chain-of-Thought Distillation for Mamba's Long-Context Memory and Extrapolation
by: Ma, Junyu, et al.
Published: (2025)

Dual-Uncertainty Guided Policy Learning for Multimodal Reasoning
by: Liu, Rui, et al.
Published: (2025)

What are human values, and how do we align AI to them?
by: Klingefjord, Oliver, et al.
Published: (2024)

WebEvolver: Enhancing Web Agent Self-Improvement with Coevolving World Model
by: Fang, Tianqing, et al.
Published: (2025)

CDE: Curiosity-Driven Exploration for Efficient Reinforcement Learning in Large Language Models
by: Dai, Runpeng, et al.
Published: (2025)

Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs
by: Chen, Xingyu, et al.
Published: (2024)

Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs
by: Wang, Yue, et al.
Published: (2025)

DeepTheorem: Advancing LLM Reasoning for Theorem Proving Through Natural Language and Reinforcement Learning
by: Zhang, Ziyin, et al.
Published: (2025)

Group Distributionally Robust Optimization-Driven Reinforcement Learning for LLM Reasoning
by: Panaganti, Kishan, et al.
Published: (2026)

Low-Bit Quantization Favors Undertrained LLMs: Scaling Laws for Quantized LLMs with 100T Training Tokens
by: Ouyang, Xu, et al.
Published: (2024)

WebRollback: Enhancing Web Agents with Explicit Rollback Mechanisms
by: Zhang, Zhisong, et al.
Published: (2025)

Locas: Your Models are Principled Initializers of Locally-Supported Parametric Memories
by: Lu, Sidi, et al.
Published: (2026)

Scaling Synthetic Data Creation with 1,000,000,000 Personas
by: Ge, Tao, et al.
Published: (2024)

VScan: Rethinking Visual Token Reduction for Efficient Large Vision-Language Models
by: Zhang, Ce, et al.
Published: (2025)

DeepMath-103K: A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning
by: He, Zhiwei, et al.
Published: (2025)

DeepCompress: A Dual Reward Strategy for Dynamically Exploring and Compressing Reasoning Chains
by: Liang, Tian, et al.
Published: (2025)