:: Library Catalog

Bibliographic Details
Main authors:	Yu, Zhaoning; Su, Will; Tao, Leitian; Wang, Haozhu; Singh, Aashu; Yu, Hanchao; Wang, Jianyu; Gao, Hongyang; Yuan, Weizhe; Weston, Jason; Yu, Ping; Xu, Jing
Format:	Preprint
Published:	2025
Subjects:	Computation and Language
Online access:	https://arxiv.org/abs/2510.02172

Similar items

Hybrid Reinforcement: When Reward Is Sparse, It's Better to Be Dense
by: Tao, Leitian, et al.
Published: (2025)

MAGE: Model-Level Graph Neural Networks Explanations via Motif-based Graph Generation
by: Yu, Zhaoning, et al.
Published: (2024)

CoT-Self-Instruct: Building high-quality synthetic prompts for reasoning and non-reasoning tasks
by: Yu, Ping, et al.
Published: (2025)

The Era of Real-World Human Interaction: RL from User Conversations
by: Jin, Chuanyang, et al.
Published: (2025)

Self-Taught Evaluators
by: Wang, Tianlu, et al.
Published: (2024)

G2T-LLM: Graph-to-Tree Text Encoding for Molecule Generation with Fine-Tuned Large Language Models
by: Yu, Zhaoning, et al.
Published: (2024)

Self-Consistency Preference Optimization
by: Prasad, Archiki, et al.
Published: (2024)

Self-Rewarding Language Models
by: Yuan, Weizhe, et al.
Published: (2024)

Following Length Constraints in Instructions
by: Yuan, Weizhe, et al.
Published: (2024)

R.I.P.: Better Models by Survival of the Fittest Prompts
by: Yu, Ping, et al.
Published: (2025)

HOW TO RESTRAIN SADDAM
Published: (1995)

Distilling System 2 into System 1
by: Yu, Ping, et al.
Published: (2024)

System-Level Natural Language Feedback
by: Yuan, Weizhe, et al.
Published: (2023)

Optimizing Recall or Relevance? A Multi-Task Multi-Head Approach for Item-to-Item Retrieval in Recommendation
by: Zhang, Jiang, et al.
Published: (2025)

Your Weak LLM is Secretly a Strong Teacher for Alignment
by: Tao, Leitian, et al.
Published: (2024)

Meta-Rewarding Language Models: Self-Improving Alignment with LLM-as-a-Meta-Judge
by: Wu, Tianhao, et al.
Published: (2024)

Self-Alignment with Instruction Backtranslation
by: Li, Xian, et al.
Published: (2023)

Verifiable Reasoning for LLM-based Generative Recommendation
by: Lin, Xinyu, et al.
Published: (2026)

Limited Preference Data? Learning Better Reward Model with Latent Space Synthesis
by: Tao, Leitian, et al.
Published: (2025)

SPICE: Self-Play In Corpus Environments Improves Reasoning
by: Liu, Bo, et al.
Published: (2025)

Mitigating Spurious Correlations for Self-supervised Recommendation
by: Lin, Xinyu, et al.
Published: (2022)

RESTRAIN: Reinforcement Learning-Based Secure Framework for Trigger-Action IoT Environment
by: Alam, Md Morshed, et al.
Published: (2025)

CompCap: Improving Multimodal Large Language Models with Composite Captions
by: Chen, Xiaohui, et al.
Published: (2024)

TOOLVERIFIER: Generalization to New Tools via Self-Verification
by: Mekala, Dheeraj, et al.
Published: (2024)

AnyTool: Self-Reflective, Hierarchical Agents for Large-Scale API Calls
by: Du, Yu, et al.
Published: (2024)

Self-Improving Pretraining: using post-trained models to pretrain better models
by: Tan, Ellen Xiaoqing, et al.
Published: (2026)

Bridging Offline and Online Reinforcement Learning for LLMs
by: Lanchantin, Jack, et al.
Published: (2025)

Understanding and Mitigating Spurious Signal Amplification in Test-Time Reinforcement Learning for Math Reasoning
by: Yu, Yongcan, et al.
Published: (2026)

Self-Challenging Language Model Agents
by: Zhou, Yifei, et al.
Published: (2025)

Hierarchical Prompt Decision Transformer: Improving Few-Shot Policy Generalization with Global and Adaptive Guidance
by: Wang, Zhe, et al.
Published: (2024)

MotionHint: Self-Supervised Monocular Visual Odometry with Motion Constraints
by: Wang, Cong, et al.
Published: (2021)

Learning Critically: Selective Self Distillation in Federated Learning on Non-IID Data
by: He, Yuting, et al.
Published: (2025)

MAD-Spear: A Conformity-Driven Prompt Injection Attack on Multi-Agent Debate Systems
by: Cui, Yu, et al.
Published: (2025)

CodeLutra: Boosting LLM Code Generation via Preference-Guided Refinement
by: Tao, Leitian, et al.
Published: (2024)

Save the Good Prefix: Precise Error Penalization via Process-Supervised RL to Enhance LLM Reasoning
by: Liu, Haolin, et al.
Published: (2026)

LLM-Based SQL Generation: Prompting, Self-Refinement, and Adaptive Weighted Majority Voting
by: Yang, Yu-Jie, et al.
Published: (2026)

Quantitative estimates of the singular values of random i.i.d. matrices
by: Dai, Guozheng, et al.
Published: (2024)

Quantitative estimates of the spectral norm of random matrices with independent columns
by: Dai, Guozheng, et al.
Published: (2023)

J1: Incentivizing Thinking in LLM-as-a-Judge via Reinforcement Learning
by: Whitehouse, Chenxi, et al.
Published: (2025)

Comments on “On the significance of peak dose in normal tissue toxicity in spatially fractionated radiotherapy: The case of proton minibeam radiation therapy”
by: Wang, Zhaoning, et al.
Published: (2025)